Data-Driven Software Assurance

As recent news headlines about Shellshock, Sony, Anthem, and Target have demonstrated, software vulnerabilities are on the rise. The U.S. Government Accountability Office reported in 2013 that "operational vulnerabilities have increased 780 percent over the past six years." These vulnerabilities can be hard and expensive to eradicate, especially if introduced during the design phase. One issue is that design defects exist at a deeper architectural level and thus can be hard to find and address. Although coding-related vulnerabilities are preventable and detectable, until recently scant attention has been paid to vulnerabilities arising from requirements and design defects.

In 2014, the IEEE Computer Society Center for Secure Design was established to "shift some of the focus in security from finding bugs to identifying common design flaws--all in the hope that software architects can learn from others' mistakes." "We believe that if organizations design secure systems, which avoid such flaws, they can significantly reduce the number and impact of security breaches," the center states in its report Avoiding the Top 10 Software Security Design Flaws. On a separate front, a group of researchers from various disciplines within the Carnegie Mellon University Software Engineering Institute recently came together to explore the implications of design-related vulnerabilities and quantify their effects on system cost and quality. This post highlights key issues and findings of our work.

Foundations of Our Work

According to a report issued by the National Institute of Standards and Technology (NIST), "the cost benefits of finding and addressing defects early are staggering. For every $1 spent on addressing defects during the coding phase of development, it will cost an organization $30 to address if detected in production."

The economic consequences of vulnerabilities generally fall into two types:

Harm caused. Breaches are costly and cause loss of security, mission failures, theft of resources (including intellectual property and personal information), and hard-to-recover consumer confidence and trust.
Fixing the problem. The time and cost expended to address known vulnerabilities and recover from breaches continue to increase faster than our ability to recruit and develop individuals with the necessary cybersecurity expertise. Growth trends indicate that unless steps are taken to address this issue, there will be a dearth of staff with the skills needed to identify vulnerabilities and deploy needed patches in the future.

The SEI team worked to identify root causes in the requirements and design phases of the software development lifecycle. The team included researchers from two separate divisions within the SEI: one focused on software engineering and acquisition practice (the Software Solutions Division), the other on cyber threats and vulnerability analyses in the operational environment (the CERT Division). These two disciplines are frequently disconnected from each other during development, which is one of the contributing factors that cause vulnerabilities to be overlooked early in the lifecycle. For example, while software developers typically focus on defects, the operations team homes in on vulnerabilities.

The software development side of our team included William Nichols, an expert in the Team Software Process (TSP) and process measurement. Julia L. Mullaney, of CERT, is likewise a TSP expert. We also worked with two vulnerability analysts from CERT: Michael Orlando and Art Manion. Andrew Moore, a researcher in the CERT Insider Threat Center and an expert on system dynamics, also contributed to our effort.

The team wanted to highlight sound requirements-gathering and design practices regarding security. Such practices enable software developers to make more-informed decisions early in the software development lifecycle and thereby reduce the level of vulnerabilities released into production where they are much more costly to address.

Our research pursued three objectives:

gain a better understanding of the state of research on vulnerabilities originating in software requirements and design
leverage the extensive data collected by the TSP team indicating where in the lifecycle defects were inserted and what methods and practices were being used
develop an economic model demonstrating the impact of vulnerabilities introduced during the requirements and design phases

Validating our Premise

Early in our research, we reviewed key published literature on predicting security vulnerabilities in software. We focused on research into early indicators of vulnerabilities: what is known about potential vulnerabilities, and when it is known, that might be actionable.

We decided to conduct a systematic mapping study, which is a study of all the studies that exist on a topic. Mapping studies typically consist of the following four stages:

identify the primary studies that may contain relevant research results
conduct a second evaluation to identify the appropriate studies for further evaluation
where appropriate, perform a quality assessment (examining for such issues as bias and validity) of the selected studies
summarize results along a dimension of interest

We found that, with few exceptions, there has been little coordinated or sustained effort to study design- or requirements-oriented vulnerabilities.

As detailed in the SEI technical report on this project, Data-Driven Software Assurance: A Research Study, our team of researchers first wanted to validate our premise that some vulnerabilities originate during requirements and design activities (more precisely, during requirements elicitation and analysis; and during architecture, design, and analysis). Our team also wanted to verify that these vulnerabilities were as serious as some of the more common coding-based vulnerabilities and that they had significant economic impact. In 2012, our team of researchers investigated vulnerabilities collected in the CERT vulnerability database, which, at the time, contained more than 40,000 cases. Specifically, we created a heuristic based on recurring keywords to eliminate coding-related vulnerabilities:

VulNoteInitialDate is after 01/01/1970 and
field Name does not contain overflow and
field Name does not contain XSS and
field Name does not contain SQL and
field Name does not contain default and
field Name does not contain cross and
field Name does not contain injection and
field Name does not contain buffer and
field Name does not contain traversal
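
The heuristic above can be read as a simple record filter. The following is a minimal Python sketch, assuming each vulnerability note is a dictionary with a Name string and a VulNoteInitialDate date. The record layout, the case-insensitive matching, and the sample records are assumptions made for illustration; they are not the actual CERT database schema or query.

```python
from datetime import date

# Keywords taken from the heuristic; a match in the Name field
# suggests a coding-related cause, so the note is excluded.
CODING_KEYWORDS = (
    "overflow", "XSS", "SQL", "default", "cross",
    "injection", "buffer", "traversal",
)
EARLIEST_DATE = date(1970, 1, 1)


def passes_heuristic(note):
    """Return True if the note survives the date and keyword exclusions."""
    if note["VulNoteInitialDate"] <= EARLIEST_DATE:
        return False
    name = note["Name"].lower()  # case-insensitive matching is an assumption
    return not any(keyword.lower() in name for keyword in CODING_KEYWORDS)


# Hypothetical sample records (names and dates invented for illustration).
notes = [
    {"Name": "Example product stack buffer overflow",
     "VulNoteInitialDate": date(2011, 5, 2)},
    {"Name": "Example protocol allows session downgrade",
     "VulNoteInitialDate": date(2012, 3, 14)},
    {"Name": "Example web application SQL injection",
     "VulNoteInitialDate": date(2010, 8, 30)},
]

candidates = [n for n in notes if passes_heuristic(n)]
for n in candidates:
    print(n["Name"])
```

A filter like this only narrows the candidate set. As the next paragraph describes, the surviving notes still needed manual review and root cause analysis.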

From the resulting vulnerabilities, we next excluded reports that lacked sufficient information to determine a cause or had strong indications of implementation-related vulnerabilities. Of those that remained, the team completed an initial root cause analysis on each of the vulnerabilities to confirm that they were, in fact, likely to have been caused by requirements or design defects.

From that list, we selected three vulnerabilities on which to conduct a detailed analysis. What follows is a brief analysis of one of the requirements- or design-related vulnerabilities that we identified from the CERT database, Vulnerability Note VU#649219 SYSRET 64-bit operating system privilege escalation vulnerability on Intel CPU hardware.

What follows is the original CERT description of the vulnerability and its impact:

Description. Some 64-bit operating systems and virtualization software running on INTEL CPU hardware are vulnerable to a local privilege escalation attack. The vulnerability may be exploited for local privilege escalation or a guest-to-host virtual machine escape. A ring3 attacker may be able to specifically craft a stack frame to be executed by ring0 (kernel) after a general protection exception (#GP). The fault will be handled before the stack switch, which means the exception handler will be run at ring0 with an attacker's chosen RSP, causing a privilege escalation.
Impact. This security vulnerability affects 64-bit operating systems or virtual machine hypervisors running on Intel x86-64 CPUs. The vulnerability means that an attacker might be able to execute code at the same privilege level as the operating system or hypervisor.

When running a standard operating system, such as Linux or Windows, or a virtual machine hypervisor, such as Xen, a mechanism is needed to rapidly switch back and forth from an application, which runs with limited privileges, to the operating system or hypervisor, which typically has no restrictions. The most commonly used mechanism on the x86-64 platform uses a pair of instructions, SYSCALL and SYSRET. The SYSCALL instruction does the following:

• copies the instruction pointer register (RIP) to the RCX register
• changes the code segment selector to the operating system or hypervisor value

A SYSRET instruction does the reverse; that is, it restores the execution context of the application. There is more saving and restoring to be done--of the stack pointer, for example--but that is the responsibility of the operating system or hypervisor.

The difficulty arises in part because the x86-64 architecture does not use 64-bit addresses; rather, it uses 48-bit addresses, which gives a 256 terabyte virtual address space that is considerably more than is used today. The processor has 64-bit registers, but a value to be used as an address must be in a canonical form; attempting to use a value not in canonical form results in a general protection (#GP) fault.
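
To make "canonical form" concrete: with 48-bit virtual addresses, bits 48 through 63 of a 64-bit value must all be copies of bit 47. The following minimal Python sketch checks that condition and reproduces the 256 terabyte figure. The function name and sample values are illustrative choices, not taken from any vendor documentation.

```python
ADDRESS_BITS = 48  # implemented virtual-address width discussed above


def is_canonical(value):
    """Return True if bits 63..47 of a 64-bit value are all 0 or all 1."""
    value &= (1 << 64) - 1                  # treat input as an unsigned 64-bit register
    upper = value >> (ADDRESS_BITS - 1)     # bits 63..47, 17 bits in total
    all_ones = (1 << (64 - ADDRESS_BITS + 1)) - 1
    return upper == 0 or upper == all_ones


# Size of a 48-bit address space in tebibytes (2**40 bytes each).
print(2 ** ADDRESS_BITS // 2 ** 40)         # 256

print(is_canonical(0x00007FFFFFFFFFFF))     # top of the lower half: True
print(is_canonical(0xFFFF800000000000))     # bottom of the upper half: True
print(is_canonical(0x0000800000000000))     # one past the lower half: False
```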

The implementation of SYSRET in AMD processors effectively changes the privilege level back to the application level before it loads the application RIP. Thus, if a #GP fault occurs because the restored RIP is not in canonical form, the CPU is in application state, so the operating system or hypervisor can handle the fault in the normal way. However, Intel's implementation effectively restores the RIP first; if the value is not in canonical form, the #GP fault will occur while the CPU is still in the privileged state. A clever attacker could use this to run code with the same privilege level as the operating system.
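
The difference in ordering can be illustrated with a toy model. The Python sketch below is not a processor emulator. It encodes only the two orderings described above (privilege change before or after loading RIP) and reports the privilege ring in which a #GP fault would be raised. All names are hypothetical.

```python
KERNEL_RING, USER_RING = 0, 3


def sysret_fault_ring(rip_is_canonical, drop_privilege_first):
    """Return the ring in which #GP is raised, or None if SYSRET completes.

    drop_privilege_first=True models the ordering the text attributes to AMD;
    drop_privilege_first=False models the ordering attributed to Intel.
    """
    ring = KERNEL_RING
    steps = ["drop_privilege", "load_rip"]
    if not drop_privilege_first:
        steps.reverse()
    for step in steps:
        if step == "drop_privilege":
            ring = USER_RING
        elif not rip_is_canonical:
            return ring  # fault is raised at whatever privilege is current
    return None


print(sysret_fault_ring(rip_is_canonical=False, drop_privilege_first=True))
print(sysret_fault_ring(rip_is_canonical=False, drop_privilege_first=False))
```

In this model the first call reports ring 3, where the fault can be handled in the normal way, and the second reports ring 0, which is the situation an attacker could exploit.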

Intel stated that this is not a flaw in its CPU since it works according to its written spec. However, the whole point of the implementation was to be compatible with the architecture as defined originally by AMD. Quoting from Rafal Wojtczuk, "The [proximate] root cause of the vulnerability is: on some 64 bit OS, untrusted ring3 code can force the kernel to execute SYSRET instruction that would return to a non-canonical address. On Intel CPUs, this results in an exception raised while still in Ring0. This exception cannot be handled safely." (Edited to clarify that this is an attribution of a more immediate (or proximate) root cause.)

Clearly, many operating system and hypervisor vendors with considerable market presence were affected. Multiple parties could have prevented the vulnerability because Intel's Software Developer's Manual (SDM) is very clear on the behavior of SYSRET (and not every x86-64-based operating system or hypervisor was affected). For example, they could have adopted a safer transition back to the application following a SYSCALL. Although the vulnerability was noted and reported by the Linux community back in 2006, it was characterized as a Linux-specific issue and easily dismissed. Also from Wojtczuk, "This is likely the reason why developers of other operating systems have not noticed the issue, and they remained exploitable for six years." Intel could also have prevented the vulnerability by not introducing a dangerous re-interpretation of how to return from a rapid system call.

Solution

The references above, coupled with the short time window for designing, implementing, and releasing a resolution to the vulnerability (from April to June 2012), might give the impression that the software community easily found an alternative, safer way to handle SYSRET (e.g., return other than through SYSRET or check for a canonical address). Implementing a safer method, however, was not so straightforward. The different ways in which the vulnerability can be exploited on different operating systems suggest that the same patch or approach might not work for all of them. So, each vendor must conduct its own careful analysis of which computing assets are at risk or can be leveraged for an exploit, and must carefully redesign and recode system calls and returns to ensure a safe transition from application to system and back again. Also, SYSCALL/SYSRET is intended to be reserved for operating system-only tasks for which execution performance is critical (e.g., by minimizing saving off registers, except for those actually needed by the system function being called). Thus, the operating system-specific patches need to be designed and coded for execution speed as well as safe transition.

One of the vendors, Xen, has been particularly forthcoming about the considerable difficulties it encountered in working with select stakeholders to diagnose, design, code, and test patches for VU#649219. Xen provided a detailed timeline that describes an enormous amount of behind-the-scenes coordination and analysis, which no doubt gave rise to enormous frustration.

A detailed analysis of the three vulnerabilities is included in the appendices of our report.

Developing a System Dynamics Economic Model

After conducting a detailed analysis of the vulnerabilities, we next drew on our knowledge of the Team Software Process (TSP). Created by Watts Humphrey, TSP guides engineering teams that are developing software-intensive products to establish a mature and disciplined engineering practice that produces secure, reliable software in less time and at lower costs. Our aim in constructing an economic model was to allow people to study systems with many interrelated factors using stocks and flows (dynamic simulation). In creating a simulation model, we first wanted to represent the normal behavior of the system and then change a few assumptions to see how the model's responses change.

Creating an economic model using the system dynamics method, which is detailed in Business Dynamics: Systems Thinking and Modeling for a Complex World by John D. Sterman, enables analysts to model and analyze critical behavior as it evolves over time within socio-technical domains. A key tenet of this method is that the dynamic complexity of critical behavior can be captured by the underlying feedback structure of that behavior.

Using Vensim, a dynamic simulation tool, we created a model that represents the design vulnerability lifecycle and includes variables representing key design and defect-related parameters gleaned from the literature search, the detailed vulnerability analysis, and experience with the TSP process.
It is important to note that we did not calibrate the model with any one organization's specific data. To make the most use of the economic model, one would need to calibrate it with an organization's specific data. This model is not usable (transitionable) as is, except to make a hypothetical argument as to why design practice is important.
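
For readers unfamiliar with stocks and flows, the following minimal Python sketch shows the general shape of such a simulation. It is not our Vensim model: the two stocks, the flow equations, and every parameter value are hypothetical, and the 1:30 cost ratio merely echoes the NIST figure quoted earlier as a placeholder.

```python
def simulate(review_effectiveness, weeks=52, dt=1.0):
    """Euler-integrate a two-stock model; all parameter values are hypothetical."""
    injection_rate = 2.0     # design defects introduced per week
    release_fraction = 0.1   # share of latent defects shipped per week
    early_fix_cost = 1.0     # cost units to fix a defect caught at design review
    late_fix_cost = 30.0     # cost units to fix a defect found in production

    latent = 0.0    # stock: design defects still in the unreleased product
    fielded = 0.0   # stock: defects released into production
    cost = 0.0
    for _ in range(int(weeks / dt)):
        removed = review_effectiveness * injection_rate   # flow: caught at review
        escaping = release_fraction * latent               # flow: shipped to the field
        latent += (injection_rate - removed - escaping) * dt
        fielded += escaping * dt
        cost += removed * early_fix_cost * dt
    cost += fielded * late_fix_cost   # fielded defects are fixed at the late cost
    return fielded, cost


# Change one assumption (review effectiveness) and compare the responses.
fielded_low, cost_low = simulate(review_effectiveness=0.2)
fielded_high, cost_high = simulate(review_effectiveness=0.8)
print(f"low review effort:  fielded={fielded_low:.1f}, cost={cost_low:.1f}")
print(f"high review effort: fielded={fielded_high:.1f}, cost={cost_high:.1f}")
```

Running the same structure under two assumptions mirrors the approach described above: represent the normal behavior first, then change a few assumptions and observe how the model's responses change.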

Wrapping Up and Looking Ahead

Our research confirmed that the current ship-then-fix approach to software quality is sub-optimal and in the long term untenable. Our analyses of vulnerabilities included examples in which vulnerabilities could never be fully eradicated from the user community once the product was distributed.
Moreover, the system dynamics model that we developed showed that even at the level of a single development increment, the economics often favor earlier attention to security-related requirements and design, as well as ongoing validation. In other words, it is often not necessary to consider longer time scales to experience benefits that exceed the costs, for all major stakeholders.

Looking ahead, we would be interested in piloting and calibrating our economic model with an organization that has quality data on its defects including where they originated. If you are an organization interested in piloting this economic model, please send an email to info@sei.cmu.edu.

We welcome your feedback on our research in the comments section below.

Additional Resources

To read the SEI technical report, Data-Driven Software Assurance: A Research Study, please visit
https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=90086.

To read the recently released Avoiding the Top 10 Software Security Design Flaws, published by the IEEE Computer Society Center for Secure Design, please visit
https://www.spinellis.gr/pubs/tr/2014-CSD-Flaws/html/CSD-avoid-top10-flaws.pdf.

To read the paper, Matching Attack Patterns to Security Vulnerabilities in Software-Intensive System Designs, by Dr. Laurie Williams and Michael Gegick, please visit
https://collaboration.csc.ncsu.edu/laurie/Papers/ICSE_Final_MCG_LW.pdf.

Software Engineering Institute

SEI Blog