Posted in Systems Verification and Validation
By Sam Procter
Software Solutions Division
As computers become more powerful and ubiquitous, software and software-based systems are increasingly relied on for business, governmental, and even personal tasks. While many of these devices and apps simply increase the convenience of our lives, some--known as critical systems--perform business- or life-preserving functionality. As they become more prevalent, securing critical systems from accidental and malicious threats has become both more important and more difficult. In addition to classic safety problems, such as ensuring hardware reliability, protection from natural phenomena, etc., modern critical systems are so interconnected that security threats from malicious adversaries must also be considered. This blog post is adapted from a new paper two colleagues (Eugene Vasserman and John Hatcliff, both at Kansas State University) and I wrote that proposes a theoretical basis for simultaneously analyzing both the safety and security of a critical system.
Why is this an issue now?
One common way of determining the safety of a critical system is to perform what's called a hazard analysis. There are a number of traditional hazard-analysis techniques; two of the most popular are failure modes and effects analysis (FMEA) and fault tree analysis (FTA). The development of these traditional techniques predates modern levels of interconnectivity, so most versions of the techniques do not explicitly address security concerns. This omission has been a problem in a number of domains ranging from industrial control systems, attacked by Stuxnet; to the smart power grid, which faces a number of challenges; to even personal medical devices that face attacks now that they expose functionality over wireless radios and the Internet.
What's being done about it?
This rise in security threats has led some researchers to adapt traditional hazard-analysis techniques to include or focus on security-related issues. Christoph Schmittner and his colleagues showed how FMEA can be used for security analysis, and Philip Brooke and Richard Paige have demonstrated the use of FTA for secure system design. But the field of hazard-analysis research is moving in other directions as well. In addition to the inclusion or exclusion of security concerns, a second dimension of hazard analysis is the incorporation of systems theory.
Nancy Leveson from the Massachusetts Institute of Technology is perhaps the biggest proponent of the use of systems theory, which advocates a more holistic approach. Systems theory--as opposed to the analytic-reduction style of analysis used in the traditional scientific method--has been integrated into a new causality model, the Systems-Theoretic Accident Model and Processes (STAMP), and a hazard-analysis technique built on it, System-Theoretic Process Analysis (STPA).
Others are working in this area as well: Friedberg et al. developed their own security-and-safety derivative of Leveson's technique called STPA-SafeSec. And, as part of my Ph.D. research before I joined the SEI, I created a refinement of STPA that's focused on hardware- and software-based subsystems called the Systematic Analysis of Faults and Errors (SAFE).
What's new in our approach?
For this paper, our approach differed from previous efforts in that we were not attempting to prescribe an exact series of steps for analysts to follow when analyzing their systems. Rather, we examined the basis of one of the key elements shared by most analysis techniques: their use of guidewords. Guidewords are high-level concepts or terms that guide analysts to consider particular ways that a system can fail. These exist in both safety-analysis techniques--STPA has its Step 1 and Step 2 terms, FMEA has failure modes, and HAZOP is based around finding deviations from design intent using terms like Late and More--and security-analysis techniques (STRIDE is centered around the terms that make up its eponymous acronym). When most of these techniques were created, though, their guidewords were chosen in an ad hoc manner rather than being directly traceable to existing literature.
Our work showed how SAFE, which is a guideword-agnostic analysis technique, could be used with a set of terms derived from one of the classic adversary models used in security. That is, guidewords can be supplied to SAFE at runtime as parameters (a concept we refer to in the paper as parametricity), rather than being ad hoc and essentially inseparable from the technique. The classic security model we used is the one proposed in 1983 by Danny Dolev and Andrew C. Yao, which describes the actions an adversary could potentially take if the analyzed system communicates over an untrusted network. For many systems, even those that do not use the Internet, this is a reasonable assumption: keeping an entire network perfectly secure is often prohibitively hard. What's more, the attack types that arise from the Dolev-Yao model are so foundational that they map cleanly (if informally, in this work) to concepts from both system safety and network security. Table 2 shows this mapping:
The adversary described by Dolev and Yao's model controls a compromised component on a network. It can read any message, modify messages before they are received by their intended recipient, delay those messages (possibly indefinitely, effectively dropping the message), and craft/send custom messages to impersonate legitimate users of the network.
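To make the adversary's capabilities and the idea of parametricity a little more concrete, here is a minimal Python sketch. It is an illustration only, not SAFE's implementation: the `Message` type, the function names, the port names, and the `candidate_deviations` helper are all hypothetical names invented for this example. The four functions mirror the actions described above (read, modify, delay or drop, and forge), and the helper shows one way a guideword set could be passed in as an ordinary parameter.

```python
from dataclasses import dataclass, replace
from itertools import product
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Message:
    """A message in transit on an untrusted network (hypothetical type)."""
    sender: str
    recipient: str
    payload: str


def read(msg: Message) -> str:
    """Eavesdrop: the adversary learns the payload."""
    return msg.payload


def modify(msg: Message, new_payload: str) -> Message:
    """Tamper with a message before its intended recipient receives it."""
    return replace(msg, payload=new_payload)


def delay(msg: Message, forever: bool = False) -> Optional[Message]:
    """Hold a message back; an indefinite delay amounts to dropping it."""
    return None if forever else msg


def forge(claimed_sender: str, recipient: str, payload: str) -> Message:
    """Craft a message that impersonates a legitimate sender."""
    return Message(claimed_sender, recipient, payload)


# Guidewords derived from the adversary's actions (names assumed for this sketch).
DOLEV_YAO_GUIDEWORDS = ("read", "modify", "delay", "forge")


def candidate_deviations(
    ports: Sequence[str], guidewords: Sequence[str]
) -> List[Tuple[str, str]]:
    """Pair every port of a component with every guideword.

    The guideword set is a parameter, so the same loop works with terms
    from any source; each pair is a prompt for a human analyst, not a result.
    """
    return list(product(ports, guidewords))
```

Calling `candidate_deviations(["command_in", "status_out"], DOLEV_YAO_GUIDEWORDS)` yields eight (port, guideword) pairs for an analyst to work through. Swapping in a different tuple of terms, such as HAZOP's Late and More, changes the prompts without touching the analysis loop.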
We believe that there are a number of benefits to a guideword-based, safety- and security-aware component-focused analysis like SAFE.
In the initial presentation of SAFE, the evaluation was based on an analysis of hazards in a system of interconnected medical devices and governing software. The motivation and details of the distributed medical application aren't germane to this blog post, but a high-level overview is provided in Figure 2. For this work, we adapted the previous analysis from my dissertation: we selected a single element of the system and repeatedly re-analyzed it using SAFE with different guideword sets derived from a range of sources. These included the following:
Our evaluation was based on the likelihood that an analyst, following the process of SAFE, would detect hazards leading to various design improvements. These possible improvements include
Of course, none of these design improvements are particularly novel, but this exercise wasn't intended to come up with clever or unintuitive solutions to subtle problems in the medical system's design. Rather, we were interested in finding a set of guidewords that would consistently suggest the broadest set of improvements.
Guidewords aren't used in a vacuum, and hazard analysis isn't a computerized process. Our evaluation was thus necessarily somewhat subjective--see Table 3 for the full result. We rated whether an analysis would be likely to suggest an improvement (denoted with a "✓"), might suggest the improvement (denoted with a "?"), or would likely not suggest the improvement (denoted with a "✗"). Of course, a particularly skilled or experienced system designer/analyst might come up with the design improvements regardless of the guideword set used; the terms are used only to guide analysts to think about particular classes of errors.
Table 3: Evaluation
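The three-level rating scheme described above could be tallied along the following lines. This is a rough sketch under stated assumptions, not the method used in the paper, whose evaluation was a subjective judgment rather than a numeric score. The half-credit weight for "?" is an assumption made here, and the guideword-set names, improvement names, and ratings are invented placeholders, not the values from Table 3.

```python
from enum import Enum
from typing import Dict


class Rating(Enum):
    LIKELY = "✓"    # analysis would likely suggest the improvement
    MAYBE = "?"     # analysis might suggest the improvement
    UNLIKELY = "✗"  # analysis would likely not suggest the improvement


# Assumed weights: full credit for LIKELY, half for MAYBE, none for UNLIKELY.
WEIGHTS = {Rating.LIKELY: 1.0, Rating.MAYBE: 0.5, Rating.UNLIKELY: 0.0}


def coverage(ratings: Dict[str, Rating]) -> float:
    """Sum the weights of one guideword set's ratings across improvements."""
    return sum(WEIGHTS[rating] for rating in ratings.values())


def broadest(table: Dict[str, Dict[str, Rating]]) -> str:
    """Return the name of the guideword set with the highest coverage."""
    return max(table, key=lambda name: coverage(table[name]))


# Placeholder data for illustration only; NOT the ratings from the paper.
EXAMPLE_TABLE = {
    "guideword_set_A": {
        "improvement_1": Rating.LIKELY,
        "improvement_2": Rating.MAYBE,
        "improvement_3": Rating.UNLIKELY,
    },
    "guideword_set_B": {
        "improvement_1": Rating.LIKELY,
        "improvement_2": Rating.LIKELY,
        "improvement_3": Rating.MAYBE,
    },
}
```

With these placeholder ratings, `coverage` gives 1.5 for `guideword_set_A` and 2.5 for `guideword_set_B`, so `broadest(EXAMPLE_TABLE)` returns `"guideword_set_B"`. A different weighting would be just as defensible; the point is only that "broadest set of improvements" can be made explicit once the ratings are written down.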
In the next few months, we'll explore how some of the foundational ideas from this work can integrate with ongoing projects here at the SEI. One promising direction is the integration of hazard/security analysis and semiformal architecture models such as those built in the SEI's popular Architecture Analysis and Design Language (AADL). Not only does SAFE have an AADL-based implementation, but the SEI has the Architecture-Led Safety Analysis (ALSA) technique. David Gluch, a fellow SEI researcher, and I are looking at how this technique might be adapted to also address security concerns; we expect to produce a technical note here in a few months that describes what we've learned so far.
I'm also particularly interested in automating analyses, so that domain experts can apply their own expertise efficiently without having to learn a lot of computer science or hazard-analysis theory. To that end, I think the links between SAFE's style of backwards-chaining analysis and Rushby's assumption synthesis are particularly promising, and I want to continue exploring overlaps in that area as well.