
Heartbleed and Goto Fail: Two Case Studies for Predicting Software Assurance Using Quality and Reliability Measures

Carol Woody and Bill Nichols

MITRE's Top 25 Most Dangerous Software Errors is a list that details quality problems as well as security problems. The list aims to help software developers "prevent the kinds of vulnerabilities that plague the software industry, by identifying and avoiding all-too-common mistakes that occur before software is even shipped." These errors often result in software that does not function as intended, presenting attackers with an opportunity to compromise a system.

This blog post highlights our research in examining techniques used for addressing software defects in general and how those can be applied to improve security detection and management.

The errors on MITRE's Top 25 list are both quality problems and potential security problems. Software security generally shares many of the same challenges as software quality and reliability. Consider Heartbleed, a vulnerability in an open source implementation of the secure socket layer (SSL) protocol. At the time Heartbleed was discovered, available software assurance tools were not set up to detect this vulnerability. As we discuss later in this post, however, Heartbleed could have been found through thorough code inspection.

Assurance and Finding Defects

We define software assurance as demonstrating that software functions as intended and only as intended. Vulnerabilities permit unintended use of software that violates security and are therefore defects. Unfortunately, all software begins with defects, and we have no means of proving that any software is totally free of defects or vulnerabilities. There will therefore always be a risk that software will not function as intended at some later time. This risk led us to explore whether techniques used for addressing defects in general can be used to improve security defect detection and management.

We began by reviewing detailed size, defect, and process data for more than 100 software development projects amassed through the SEI's Team Software Process (TSP) work. The projects span a wide range of application domains, including medical devices, banking systems, and U.S. federal legacy system replacement. This data supports potential benchmarks for ranges of quality performance metrics (e.g., defect injection rates, removal rates, and test yields) that establish a context for determining very high quality products. We have the following types of information available for each project:

  • summary data, including project duration, development team size, cost (effort) variance, schedule variance, and defects found
  • detailed data (planned and actual) for each project component, including size (added and modified lines of code [LOC]), effort by development phase, defects injected and removed in each development phase, and date of lifecycle phase completion

We analyzed this data to identify baselines for the expected numbers of defects of various types and the effectiveness of defect removal techniques. Before discussing our findings, it is important to note that injecting defects is an inevitable byproduct of software development: developers introduce defects as a natural consequence of building and evolving software. Removing those defects, however, proves more difficult.

It is hard to eliminate all defects from software. The data we analyzed indicated that defects similar to many known security vulnerabilities were injected during the implementation phase. A few of these software products (five) not only had low levels of defects when released, but also were used in domains that required safety or security. The additional testing or analysis performed on these projects gave us a measured sample of escaped safety or security defects.

We found there is no silver bullet for addressing defects or security vulnerabilities. Among the cases we examined, however, the most effective approach was the application of standard quality techniques, such as documented designs, review of the designs against requirements, and code inspections, all performed prior to testing. The projects most effective at defect removal did not rely solely on testing or static analysis to discover defects; testing and tools were used to verify completeness. In cases where early removal techniques had not been applied effectively, developers were often overwhelmed with warnings from their static analysis tools. Conversely, projects that applied strong early quality assurance techniques received substantially fewer warnings when using tools, making the follow-up easier to manage.

The five projects selected from the SEI's TSP database demonstrated that producing products with few safety or security operational issues requires an integration of quality reviews for defect removal with security or safety-critical reviews. Examinations were done for both code quality and security/safety considerations at each review point in the lifecycle beginning with early design.

As detailed in our technical note, Predicting Software Assurance Using Quality and Reliability Measures, which I co-authored along with Robert Ellison and Bill Nichols, assuring that a software component has few defects also depends on assuring our capability to find those defects. Positive results from security testing and static code analysis are often provided as evidence that security vulnerabilities have been reduced. The recent "goto fail" and Heartbleed vulnerabilities demonstrate, however, that it is a mistake to rely on these approaches as the primary means of identifying defects. As we learned by examining these two vulnerabilities, the omission of quality practices, such as inspections, can lead to defects that exceed the capabilities of existing code analysis tools. More information on each of these cases is included below.

Case Study: "goto fail" Vulnerability

In 2014, Apple fixed a critical security vulnerability that was likely caused by the careless use of "cut and paste" during editing. The programmer embedded a duplicate line of code that caused the software to bypass a block of code that verifies the authenticity of access credentials. Researchers discovered this security flaw in iPhones and iPads; Apple confirmed that it also appeared in notebook and desktop machines using the Mac OS X operating system. The vulnerability is described in the National Vulnerability Database as follows:

  • Impact: An attacker with a privileged network position may capture or modify data in sessions protected by SSL/TLS.
  • Description: Secure Transport failed to validate the authenticity of the connection. This issue was addressed by restoring missing validation steps.

This vulnerability allowed attackers to use invalid credentials to gain access to any information on the targeted device, such as email, financial data, and access credentials to other systems and devices. A variety of standard quality techniques, such as a personal review by the developer or a more formal peer review, should have identified the defect for removal. A number of structured development techniques, if applied consistently, could also have identified, and possibly prevented, the security coding defect that led to the vulnerability, including

  • designing code to minimize branching and make it more predictable and testable
  • making the design specification the basis for verifying code through activities including, but not limited to, requirements analysis and test

While these techniques are excellent strategic recommendations to improve quality in general, they cannot prevent careless mistakes. The same caveat would apply to recommendations such as (1) provide better training for the coders, (2) use line-of-code coverage test cases, or (3) use path-coverage test cases.

Using static analysis to identify dead code could have flagged this defect, but not all such mistakes result in truly dead code. It is better to find and remove these problems during a personal review, an informal peer code review, or a formal peer code inspection.

Case Study: Heartbleed Vulnerability

The Heartbleed vulnerability occurred in OpenSSL's heartbeat feature, which is used to verify that the OpenSSL server is live. OpenSSL is an open-source implementation of the secure socket layer (SSL) and transport layer security (TLS) protocols used for securing web communications. The heartbeat function sends a request with two parameters: a content string (payload) and an integer value that represents the length of the payload it is sending. If the OpenSSL connection is available, the expected response is a return of the content string for the length specified.

The protocol assumes that the requested length of the payload returned is less than 65,535 bytes and less than or equal to the actual payload length, but the responding function never verifies those assumptions. If either limit is violated, the request triggers a data leak: not a buffer overflow, but what is called a buffer over-read. The security risk is that the additional data read from the server's memory could contain passwords, user identification information, and other confidential information.

The defect appears to have been accidentally introduced by an update in December 2011. OpenSSL is a widely used and free tool. At the disclosure of Heartbleed, approximately 500,000 of the internet's secure web servers certified by trusted authorities were believed to be vulnerable to the attack. The new OpenSSL version repaired this vulnerability by including a bounds check to ensure that the payload length specified by a developer is not longer than the data that is actually sent. Unfortunately, that check is only the start of an implemented correction because elimination of the vulnerability requires the 500,000 users of this software to upgrade to the new version. In addition, because this problem is related to security certificates, protecting systems from attacks that exploit the Heartbleed vulnerability requires that companies revoke old SSL certificates, generate new keys, and issue new certificates.

IEEE's Security and Privacy article Heartbleed 101 provides an excellent summary of why this vulnerability was not found sooner, even with the use of static analysis tools. The designer of each static analysis tool has to make trade-offs among the time required for the analysis, the expert help required to support the tool's analysis, and the completeness of the analysis. Most static analysis tools use heuristics to identify likely vulnerabilities and to allow completion of their analysis within useful times. The article goes on to explain that while static analysis tools can be effective for finding some types of vulnerabilities, the complexity of OpenSSL (including multiple levels of indirection and other issues) exceeded the capabilities of existing tools to find this type of vulnerability.

While the OpenSSL program is complex, the cause of the vulnerability is simple. The software never verified the design assumption that the length of the content to be returned to the caller was less than or equal to the length of the payload sent. Verifying that the input data meets its specifications is a standard activity performed for quality, not just for security.

The software errors that led to the "goto fail" and Heartbleed vulnerabilities should have been identified during development. These two examples highlight the fact that quality practices, such as inspections and reviews of engineering decisions, are essential for security: testing and code analyzers must be augmented by disciplined quality approaches.

Improve Security with Quality

Our research suggests that implementing systems with effective operational security requires incorporating both quality and security considerations throughout the lifecycle. Predicting effective operational security requires quality evidence and security expert analysis at each step in the lifecycle. Our data suggest that when defects are measured, 1 percent to 5 percent of them should be considered vulnerabilities; likewise, when security vulnerabilities are measured, overall code quality can be estimated by treating those vulnerabilities as 1 percent to 5 percent of the expected defects.

Further evaluation of systems is needed to see if the patterns suggested by our analysis continue to hold. We explored several options to expand our sample size, but found limited data about defects and vulnerabilities assembled in a form that could be readily analyzed. This analysis must be done for each unique version of a software product. At this time, evaluation of each software product requires careful review of each software change and reported vulnerability. The review not only matches up defects with source code versions, but also reviews each vulnerability reported against the product suite to identify defects specific to the selected product version by parsing available description information and identifying operational results for the same source code. Collection of data about each product in a form that supports automation of this analysis would greatly speed confirmation.

We welcome your feedback on our research. Please leave comments below.

Additional Resources

To read our technical report, Predicting Software Assurance Using Quality and Reliability Measures, please click here.

