Redemption: A Prototype for Automated Repair of Static Analysis Alerts

Heuristic static analysis (SA) tools are a critical component of software development. These tools use pattern matching and other heuristic techniques to analyze a program’s source code and alert users to potential errors and vulnerabilities. Unfortunately, SA tools produce a high volume of alerts, many of them false positives: they can produce one alert for every three lines of code. By our analysis, it would take a user more than 15 person-years to manually repair all the alerts in a typical large codebase of two million lines of code. Currently, most software engineers filter alerts and fix only the ones they deem most critical, but this approach risks overlooking real issues. False positives create a barrier to the adoption and utility of heuristic SA tools, increasing the possibility of security vulnerabilities.

Our new open source tool Redemption leverages automated code repair (ACR) technology to automatically repair SA alerts in C/C++ source code. We estimate that, by reducing the number of false positives, organizations can save around seven and one-half person-years in identifying and repairing security alerts.

In this post, I give an overview of how Redemption uses ACR technology, the kinds of errors Redemption can fix, how the tool works, and what’s next for its development.

Redemption: An Overview

Automated Code Repair

The SEI has longstanding research interests in ACR and its applications. You can think of ACR for static alerts like a programmer’s spell checker: the ACR identifies errors and offers a possible repair. The developer can then choose whether or not to implement the suggestion.

In our use of ACR in Redemption, we have followed three basic development principles. First, in contrast to an SA tool, Redemption does not detect alerts of its own; it simply parses the alerts that SA tools produce. Second, even if an alert is a false positive, repairing the alert should not break the code, such as by causing the program to crash or fail a valid test case. Third, Redemption is idempotent. That is, the tool doesn’t modify code it has already repaired. We follow these principles to ensure that Redemption produces sound fixes and doesn’t break good code.
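
To make the idempotence principle concrete, here is a minimal Python sketch of a repair that guards a pointer dereference with a null check and does nothing on a second run. It is a toy that works on raw text lines of C code, not Redemption's actual implementation; the function name `add_null_check` and its parameters are hypothetical.

```python
def add_null_check(lines, line_no, pointer):
    """Guard the statement on 1-based line `line_no` with a null check.

    Returns a new list of lines. If the guard is already present, the
    lines come back unchanged, so repeating the repair is a no-op.
    """
    idx = line_no - 1
    guard = f"if ({pointer} != NULL) {{"
    # Already repaired: the alert line is the guard itself (stale line
    # number) or the statement sits directly under the guard.
    on_guard = lines[idx].strip() == guard
    under_guard = idx > 0 and lines[idx - 1].strip() == guard
    if on_guard or under_guard:
        return list(lines)
    stmt = lines[idx]
    indent = stmt[: len(stmt) - len(stmt.lstrip())]
    wrapped = [indent + guard, indent + "    " + stmt.lstrip(), indent + "}"]
    return lines[:idx] + wrapped + lines[idx + 1:]


# Hypothetical C function with an unchecked dereference on line 2.
source = ["void reset(int *p) {", "    *p = 0;", "}"]
once = add_null_check(source, 2, "p")
twice = add_null_check(once, 2, "p")
print("\n".join(once))
print("second run changed the code:", once != twice)
```

A real repair has to reason about scope and control flow (for example, wrapping a declaration in a block would hide it from later code), which is one reason to work on a parsed representation of the program rather than on text.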

Static Analysis Tools and Error Categories

Redemption is not an SA tool; you need to have a separate SA program in your workflow to use Redemption. Currently, Redemption works with three SA tools: clang-tidy, Cppcheck, and rosecheckers. We’d like to add more tools as we develop Redemption further.

As we began to work on Redemption, we needed to narrow down the alert categories to focus on first, since SA alerts are so numerous. We ran SA tools on the open source projects Git and Zeek to determine which errors were most prominent. Our testing generated more than 110,000 SA alerts for the two projects, giving us a broad sample to analyze. We chose three common alert categories to start, and we intend to expand to additional categories in the future. The three initial categories are:

null pointer—a pointer that doesn’t refer to a valid object
uninitialized value—a resource has not yet been initialized
dead code—code that can never be executed

Code weaknesses that fall into these categories can be security vulnerabilities and may cause the program to crash or behave unexpectedly. Of the 110,000 alerts, approximately 15,000 were in these three categories. Our initial goal is to repair 80 percent of alerts in these categories.

Continuous Integration Workflows

A top priority for our DoD collaborators is integrating Redemption into their continuous integration (CI) pipelines. A CI server automatically and frequently builds, tests, and merges software, immediately reporting build failures and test regressions. This process makes it easier for teams to catch errors quickly and prevents major merge conflicts. CI workflows typically include automated testing, and SA checks are often part of it.

To integrate Redemption into a CI pipeline, we added the tool as a plugin to an instance of GitLab. Redemption reads the output of an SA tool, produces possible fixes, and creates a merge request (MR), GitLab’s term for a pull request. The developer can then choose to merge the request and implement the suggestions, modify the MR, or reject the proposed fixes.

By bringing Redemption into a CI pipeline, teams can integrate the tool with SA software they’re already using and create safer, cleaner code.

Figure 1: An automatic repair tool in a CI pipeline

Testing Redemption

Before making Redemption available to our collaborators and the wider public, we needed to make sure the tool was viable and behaving as expected. We tested it throughout the development process, including the following:

regression testing—checks that each improvement to the tool doesn’t break previously working test cases
stumble-through testing—verifies that the repair tool doesn’t crash or hang. The tool was tested on all alerts in all codebases, and the test failed if the tool crashed, hung, or threw exceptions.
sample alert testing—ensures repairs are satisfactory, verified by developers. Since we generated more than 15,000 alerts, we had to choose random samples of alerts to check repairs.
integration testing—checks that the repairs didn’t change the code behavior, such as causing the code to crash or fail a valid test case
performance testing—ensures repairs don’t significantly impede time or memory performance
recurrence testing—verifies that repaired alerts aren’t re-reported or re-repaired
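
As an illustration of the sample alert testing described above, the following Python sketch draws a reproducible random sample of alerts for developers to review. Sampling per category, the fixed seed, and the `category` field name are assumptions made for this example, not a description of how we actually drew our samples.

```python
import random
from collections import defaultdict


def sample_alerts(alerts, per_category, seed=0):
    """Pick up to `per_category` alerts from each category, reproducibly."""
    rng = random.Random(seed)  # fixed seed so reviewers see the same sample
    by_category = defaultdict(list)
    for alert in alerts:
        by_category[alert["category"]].append(alert)
    picked = []
    for category in sorted(by_category):
        group = by_category[category]
        picked.extend(rng.sample(group, min(per_category, len(group))))
    return picked
```

Sampling within each category keeps a rare category from being crowded out of the review set by a common one.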

This testing ensured that the tool performed reliably and safely for our collaborators and broader user base. Now that we’re confident that Redemption can meet these standards, we’ve begun to work with our collaborators to integrate it into their software development workflows.

Redemption in Action

To see Redemption in action, you can view or fork the code available in our GitHub repository. (Note that, in addition to an SA tool, Redemption requires Docker because the code runs inside a container.)

Figure 2: A diagram of Redemption's workflow

At a high level, Redemption works by following these steps:

1. An SA tool checks the code for any potential errors. A file is generated containing the SA alerts.
2. The file is converted to a JSON format that Redemption can read.
3. Redemption’s “Ear” module parses the code into an Abstract Syntax Tree (AST).
4. Redemption’s “Brain” module identifies which repairs to make.
5. Redemption’s “Hand” module turns these repair plans into patches.
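
The flow above can be sketched in a few lines of Python. This is a simplified illustration, not Redemption's code: the JSON field names, the function names, and the assumed alert format (a compiler-style line `file:line:column: severity: message [check-id]` whose message ends with the variable name) are all assumptions for the example. The sketch also skips the AST and works on raw text lines, so it only stands in for the roles the Brain and Hand modules play.

```python
import difflib
import json
import re

# Assumed alert format: file:line:column: severity: message [check-id]
ALERT_RE = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<severity>\w+): (?P<message>.*) \[(?P<checker>[^\]]+)\]$"
)


def alert_to_json(text):
    """Alert conversion: turn one diagnostic line into a JSON record."""
    match = ALERT_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized alert: {text!r}")
    record = match.groupdict()
    record["line"] = int(record["line"])
    record["column"] = int(record["column"])
    return json.dumps(record)


def plan_repair(alert, lines):
    """Brain stand-in: plan an initialization for an uninitialized int."""
    name = alert["message"].rsplit(": ", 1)[-1]
    decl_re = re.compile(rf"(\s*)int {re.escape(name)};")
    # Walk upward from the reported use to find the bare declaration.
    for idx in range(alert["line"] - 1, -1, -1):
        decl = decl_re.fullmatch(lines[idx])
        if decl:
            return {"line": idx + 1, "new_text": f"{decl.group(1)}int {name} = 0;"}
    return None  # nothing to repair, so leave the code untouched


def make_patch(path, lines, plan):
    """Hand stand-in: render the repair plan as a unified diff."""
    repaired = list(lines)
    repaired[plan["line"] - 1] = plan["new_text"]
    diff = difflib.unified_diff(lines, repaired, "a/" + path, "b/" + path, lineterm="")
    return "\n".join(diff)


# Hypothetical input: a small C function and one alert about it.
source = ["int sum(void) {", "    int total;", "    return total;", "}"]
raw = "demo.c:3:12: error: Uninitialized variable: total [uninitvar]"
alert = json.loads(alert_to_json(raw))
plan = plan_repair(alert, source)
print(make_patch(alert["file"], source, plan))
```

Because the planner only matches a declaration that has no initializer, running it again on the repaired code yields no plan, which is consistent with the idempotence principle described earlier.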

The image below shows the difference between the initial output from an SA tool in red and the repairs from Redemption in green. In this case, Redemption has added checks for a null pointer to repair potential null pointer dereference errors. Redemption has also initialized some uninitialized variables. From here, a developer can choose to apply or reject these patches.

Figure 3: Repaired code after running Redemption

Expanding Redemption to Additional CI Pipelines

What’s next for Redemption? As we move into the next phases, we have identified several areas for further development. As I noted above, we would like to add support for additional SA tools, and we plan to increase the number of repair categories from three to ten, including repairs of integer overflows and ignored function return values. As we expand the repair categories, we can also repair more types of defects, like indentation errors.

We also see potential to support additional tools in CI workflows. For example, future development could include support for more IDEs. Redemption currently works with GitLab, but additional CI pipelines could be included. If you’d like to help with any of this work, we welcome code repairs and other contributions to the Redemption codebase on GitHub.

Software Engineering Institute

SEI Blog