Accelerating Safety Assurance

The safety of software-reliant systems becomes critical when they interact with the physical world, leading to potentially life-threatening consequences. Aircraft, cars, trains, nuclear reactors, certain medical devices, and other physical objects rely on or are controlled by software. Systems in which software components closely interact with physical processes are known as cyber-physical systems. In this blog post, I discuss an SEI project that has shown promising results in assuring the safety of cyber-physical systems.

The consequences of failure in a cyber-physical system can be fatal, so certification authorities, such as the Federal Aviation Administration (FAA) and the Food and Drug Administration (FDA), require a high level of confidence from the methods used to assure safe functioning of these systems. These certification authorities have recognized the value provided by formal methods to provide a high level of assurance. However, formal methods not only don’t scale to large systems, they also tend to focus on only some, but by no means all, of the relevant aspects of a system.

For example, in the passenger airbag of a car, the software must inflate the airbag every time it detects a crash and there is an adult in the seat. A formal method that looks only at the value transformations in the software would verify that when the software receives as an input crash=true and adult=true, the output would always be airbag-inflation=true. Unfortunately, this analysis is insufficient to ensure safety: If the output occurs 10 minutes after the crash, it is obviously too late. It is also necessary to ensure that the software finishes by the required time—20 milliseconds after the crash, leaving enough time for the inflation process to complete, say, 15 milliseconds after the crash.

The SEI has been working on the Rapid Certifiable Trust (RCT) project to address the scalability of formal methods and the interactions among the different aspects of these methods. Core to the approach developed in the RCT project is the concept of a trusted enforcer. I will first explain the concept of enforcer and later discuss how to make it a trusted enforcer.

An enforcer is a piece of code that is added to a system to monitor the output and to ensure that it is safe. For example, to ensure that the flight of a drone does not go beyond a certain area, also known as a virtual fence, an enforcer can monitor the commands sent to the drone to validate that it will not leave its virtual fence. If the enforcer determines that the drone would leave the virtual fence, then it overwrites the commands to prevent it from doing so.

Scalability

One of the challenges formal methods of verification face today is scalability. Formal methods of verification cannot scale to verify the safety of a system of any practical size. Fortunately, with our approach, it is not necessary to verify an entire system—just the enforcer. An enforcer will not overwrite the commands of the system unless the system violates the safety condition that the enforcer is guarding. This limitation is the key to scaling formal methods to verify the safety of cyber-physical systems.

We are developing verification methods that allow us to verify only the enforcer code and only for the safety condition of interest. More specifically, we can verify that the enforcer can detect any potential system behavior and prevent the system from violating the safety condition, such as the drone leaving the virtual fence. If you can verify the enforcer, you do not need to verify the safe behavior of the rest of the system to ensure safe system behavior. By verifying only the enforcer, formal methods, which cannot evaluate an entire system, can keep an entire system safe.

Trust

Verifying the enforcer would make it trustworthy as long as the unverified part of the system does not interfere with the enforcer. Three types of interference can potentially modify the behavior of the enforcer:

Memory interference—Commonly considered in cybersecurity, memory interference occurs in cases where a misbehavior of unverified software can modify the code, data, or stack of a verified enforcer or the software it depends on, such as the operating system. Clearly, modifying the code of the enforcer invalidates its correctness.
Temporal interference—Unverified software can delay execution of verified software, and hence, the enforcer’s actions may occur too late to be effective. For example, if the unverified software delays the issuing of drone commands for too long, the enforcer would not be able to check if the commands would take the drone outside of the virtual fence—or at least not until it is too late, when the drone might already be outside of the fence. Temporal interference is even more dangerous in cyber-physical systems with shorter, more critical timeframes, such as airbag deployment systems.
Physical interference—Unverified software can change something in the physical environment that may compromise the capacity of the enforcer to correct the situation. For example, something in the physical environment could be able to flip the drone before the enforcer has a chance to correct it.

Real-Time Mixed-Trust (RTMT) Computation

To guarantee trust in an enforcer, the RCT team at the SEI has developed a framework that ensures that the misbehavior of the unverified parts cannot affect the enforcers. We call this framework the real-time mixed-trust (RTMT) computation.

RTMT allows verified and unverified software components to interact with one another and regulates this interaction to ensure that the verified component behavior is not compromised by any of the three interference types described above. This ability requires the verification techniques to also consider the potential misbehaviors of the unverified components.

Runtime Infrastructure

RTMT executes verified enforcers within a verified hypervisor (HV) known as uber eXtensible Micro-Hypervisor Framework (uberXMHF), and the unverified software in a virtual machine (VM) running Linux on top of the HV. The HV is verified to guarantee that software running in the VM cannot access the memory of the HV, ruling out any potential memory interference and providing a well-defined communication protocol between the VM and the HV.

To protect against temporal interference, we designed temporal enforcers that are triggered if the unverified software takes too long to complete. In such a case, the temporal enforcer sends a default safe action to guarantee safety (such as a hover command in the drone example). We designed the components in the VM as a set of threads that execute periodically to keep monitoring and reacting to changes in the physical environment (e.g., stability of the drone, distance from the virtual fence). These components are then given turns to execute in the processor based on their scheduling priority.

Each thread in the VM has an associated temporal enforcer in the HV that is waiting for the VM thread output. If the output is produced before a specific deadline, it is evaluated to see if its values are safe (e.g., the drone will remain within the fence). If no output is produced by the deadline, the enforcer is executed to generate the default action. The temporal enforcers also take turns executing in the same processor using an independent but coordinated scheduler within the HV. Each pair of VM thread and temporal enforcer forms an entity that we call mixed-trust task because it combines an untrusted part running in the VM thread and a trusted part running as a temporal enforcer in the HV.

To protect against physical interference, a model of differential equations evaluates the physical effects that a potential misbehavior of the unverified component can have on the system. This model then calculates limits on the actions that the software is allowed to send to the actuators (e.g., how much torque we are allowed to send to the propellers of the drone).

Verification

The protections offered by the uberXMHF HV with respect to memory interference are verified with assume/guarantee and weakest preconditions techniques offered by a tool called Frama-C. The HV is fully verified only assuming the basic behavior of the processor instructions (instruction set) and the secure bootstrapping mechanisms of the processor, such as the Trusted Platform Module (TPM), which verifies the integrity of all the code that executes when the machine is turned on. For more information about the verified HV, see “Have Your PI and Eat it Too: Practical Security on a Low-Cost Ubiquitous Computing Platform.”

From the timing perspective, we developed timing-verification algorithms to verify two properties:

The untrusted VM threads execute without exceeding their defined worst-case execution time (WCET), and they finish before their deadlines expire—in the example with the airbag, 15 milliseconds after the crash.
If the untrusted VM threads exceed their WCET, the corresponding enforcer can complete its execution before the required deadline—the same 15 milliseconds in the airbag example.

We developed these verification algorithms together with the runtime infrastructure and presented them along with a sample drone application at the 2019 IEEE 25th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA). For more detailed information, see our presentation on the verification of the coordination protocol between the VM threads and their temporal enforcers at the Real-Time Network and Systems Conference, 2021. We also developed an extension to the runtime and the analysis algorithms to take advantage of the occasions when the temporal enforcers do not need to run.

We used a technique known as Lyapunov functions to analyze the verification of the effects that the unverified components can have on the physical environment. Lyapunov functions allow us to define a set of possible physical configurations (e.g., positions, velocity, acceleration of a drone) that a system can reach from an initial configuration for which our trusted enforcer can recover and return to the initial configuration. These configurations allow us to figure out how frequently the enforcer needs to monitor the output of the system or, for example, to let it get closer to the border, such as to a virtual fence. We used these techniques in an Office of Naval Research (ONR) project where we investigated use of a software refresh, or fast reboot, to get rid of potential intruder code in a cyberattack while continuing to conduct the mission.

Results and Applications

The results from the RCT project, in particular the RTMT computation and verification framework, allow us to create systems that can be proven correct with respect to specific safety properties. This capability allows these systems to operate even when untrusted components misbehave, either because of bugs or as a result of cyberattacks. We have run successful experiments in the extreme case where the entire VM is compromised and stops operating completely—for example, the Linux kernel crashes—but the safe behavior of the system is preserved (the drone remains hovering).

In the future, we plan to develop techniques for distributed and networked systems and the special needs of AI systems.

Software Engineering Institute

SEI Blog