Ensuring Safety in Cyber-Physical Systems

In some key industries, such as defense, automobiles, medical devices, and the smart grid, the bulk of the innovations focus on cyber-physical systems. A key characteristic of cyber-physical systems is the close interaction of software components with physical processes, which impose stringent safety and time/space performance requirements on the systems. This blog post describes research and development we are conducting at the SEI to optimize the performance of cyber-physical systems without compromising their safety.

Cyber-physical systems are often safety-critical because violations of their requirements, such as missed deadlines or component failures, may have life-threatening consequences. For example, when a cyber-physical system in a car detects a crash, the airbag must inflate in less than 20 milliseconds to avoid severe injuries to the driver. Industry competitiveness, along with the urgency of fielding cyber-physical systems to meet rapidly evolving Department of Defense (DoD) mission needs, is increasingly pressuring manufacturers to implement cost and system performance optimizations without understanding their safety consequences. The impact of this lack of understanding on the commercial world can be seen in recent automotive recalls, delays in the delivery of new airplanes, and airplane accidents.

Although optimizing a cyber-physical system is hard, cost-reduction market pressures and small form factors (e.g., in small remotely piloted aircraft, or RPA) often demand optimizations. An additional challenge faced by DoD cyber-physical systems is scheduling real-time tasks whose amount of computation is not fixed but depends on the environment. For instance, the computation time of collision avoidance algorithms in RPA systems often varies with the number of objects the RPA finds in its path. This variation is hard to accommodate in traditional real-time scheduling theory, which assumes a fixed worst-case execution time for each task. Nonetheless, real-time scheduling is essential for RPAs and other autonomous systems that must function effectively in dynamic environments with limited human intervention.
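To illustrate why a fixed worst-case execution time is a problem, the sketch below applies classic fixed-priority response-time analysis, a standard textbook technique and not necessarily the analysis used in this project. The task set, the rate-monotonic priority order, and the cost model `collision_avoidance_wcet` (2 ms plus 1 ms per object) are all hypothetical. The analysis needs a single WCET per task, so the collision-avoidance task must be charged for the largest number of objects it might ever encounter.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    name: str
    wcet: int    # worst-case execution time C_i in ms (must be fixed up front)
    period: int  # period T_i in ms; the deadline is assumed equal to the period


def collision_avoidance_wcet(max_objects: int) -> int:
    # Hypothetical cost model: 2 ms fixed cost plus 1 ms per object in the path.
    return 2 + 1 * max_objects


def response_times(tasks: List[Task]) -> Dict[str, Optional[int]]:
    """Classic fixed-priority response-time analysis.

    `tasks` must be ordered from highest to lowest priority. Returns each
    task's worst-case response time, or None if it can exceed its deadline.
    """
    results: Dict[str, Optional[int]] = {}
    for i, task in enumerate(tasks):
        higher = tasks[:i]
        r = task.wcet
        while True:
            # Own WCET plus interference from every release of higher-priority tasks.
            r_next = task.wcet + sum(math.ceil(r / h.period) * h.wcet for h in higher)
            if r_next > task.period:
                results[task.name] = None   # deadline can be missed
                break
            if r_next == r:
                results[task.name] = r      # fixed point reached
                break
            r = r_next
    return results


def task_set(max_objects: int) -> List[Task]:
    # Hypothetical task set in rate-monotonic order (shorter period = higher priority).
    return [
        Task("flight_control", wcet=3, period=10),
        Task("collision_avoidance", wcet=collision_avoidance_wcet(max_objects), period=20),
        Task("telemetry", wcet=10, period=50),
    ]
```

With a bound of 6 objects, `response_times(task_set(6))` yields response times of 3, 14, and 38 ms, so every deadline is met. Raising the bound to 10 objects makes `telemetry` unschedulable (`None`), even if 10 objects almost never appear in practice.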

As part of our research, we are investigating a safe double-booking of processing time between safety-critical tasks and non-safety-critical tasks that can tolerate occasional timing failures (deadline misses). This double-booking approach reduces the over-allocation of processing resources needed to guarantee the timing behavior of safety-critical tasks. In conventional real-time systems, timing assurance comes from reserving enough processing time for each task to execute for its worst-case execution time. The typical execution time of these tasks, however, is often less than the worst-case execution time, which occurs very rarely in practice. The difference between the worst-case and typical execution times of these tasks is therefore an over-allocation.
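The over-allocation defined above can be computed directly. The following minimal sketch assumes periodic tasks and uses hypothetical WCET, typical-time, and period values. It reports how much of the processor is reserved for worst cases but typically sits idle, which is the time a double-booking scheme would try to reuse.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CriticalTask:
    name: str
    wcet: float     # worst-case execution time reserved each period (ms)
    typical: float  # typical execution time (ms), hypothetical measurement
    period: float   # period (ms)

    @property
    def over_allocation(self) -> float:
        # Time reserved every period but usually left unused.
        return self.wcet - self.typical


def reclaimable_utilization(tasks: List[CriticalTask]) -> float:
    """Fraction of the processor reserved for worst cases but typically idle."""
    return sum(t.over_allocation / t.period for t in tasks)


# Hypothetical safety-critical tasks.
tasks = [
    CriticalTask("sensor_check", wcet=8, typical=3, period=20),
    CriticalTask("actuator_update", wcet=4, typical=2, period=40),
]
```

For these values, `reclaimable_utilization(tasks)` is about 0.30. Roughly 30% of the processor is reserved for safety-critical worst cases that rarely materialize.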

Our approach takes advantage of this over-allocation by packing safety-critical and non-safety-critical tasks together, letting the latter use the processing time that was over-allocated to the former. In effect, the same processing time is "double-booked" to both the safety-critical and the non-safety-critical tasks. To assure the timing of the safety-critical tasks, however, we stop the non-critical tasks whenever the safety-critical tasks need to run for their worst-case execution time. We call this an asymmetric protection scheme because it protects critical tasks from non-critical ones but does not protect non-critical tasks from critical ones.
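The following toy simulation shows the asymmetry on a single processor over one deadline window in 1 ms steps. It is an illustrative policy under stated assumptions and not the SEI implementation. We assume that in normal mode the non-critical job runs first, so it visibly consumes double-booked time. We also assume that the scheduler stops it at the last instant at which the critical job could still receive its full remaining WCET before the deadline. The function `simulate_window` and its parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    demand: int                    # execution time the job actually needs (ms)
    done: int = 0                  # execution time received so far (ms)
    finish: Optional[int] = None   # completion time; None means deadline missed

    @property
    def finished(self) -> bool:
        return self.done >= self.demand


def simulate_window(deadline: int, crit_wcet: int, crit_actual: int,
                    noncrit_demand: int) -> dict:
    """Simulate one window [0, deadline) on one processor in 1 ms steps."""
    if not (1 <= crit_actual <= crit_wcet <= deadline and noncrit_demand >= 1):
        raise ValueError("need 1 <= crit_actual <= crit_wcet <= deadline and noncrit_demand >= 1")
    crit, nonc = Job(crit_actual), Job(noncrit_demand)
    switch_time: Optional[int] = None
    for t in range(deadline):
        # Enter critical mode at the last instant at which the critical job could
        # still receive its full remaining WCET budget before the deadline.
        if switch_time is None and not crit.finished and crit_wcet - crit.done >= deadline - t:
            switch_time = t
        if crit.finished:
            runner = None if nonc.finished else nonc   # leftover time goes to non-critical work
        elif switch_time is not None or nonc.finished:
            runner = crit                              # non-critical work is stopped
        else:
            runner = nonc                              # normal mode: use double-booked time
        if runner is not None:
            runner.done += 1
            if runner.finished:
                runner.finish = t + 1
    return {"switch_time": switch_time, "critical_finish": crit.finish,
            "noncritical_finish": nonc.finish}
```

With `simulate_window(20, 12, 5, 10)`, the critical job needs only its typical 5 ms, so both jobs finish by 15 ms. With `simulate_window(20, 12, 12, 10)`, the critical job needs its full WCET. It still finishes exactly at the 20 ms deadline, while the non-critical job is cut off after 8 ms and misses. The critical job is protected from the non-critical one, but not the reverse.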

An automotive system is one place where asymmetric protection can be applied. To continue our earlier airbag example, a car's airbag system has a task that continuously checks whether a crash has occurred. Of the 20 milliseconds allotted for airbag deployment, the check may take only 5 milliseconds. If a crash has occurred, the airbag inflates during the remaining 15 milliseconds. If no crash has occurred, however, the remaining 15 milliseconds of processor time reserved for this task become available to non-safety-critical tasks, such as those for fuel efficiency, acceleration, and active suspension.

The deliverables from our project will include a modified version of the Linux operating system that implements the asymmetric temporal protection scheme for mixed-criticality systems, along with analysis algorithms to verify the timing behavior of the system. We will also develop optimization algorithms to maximize the utility that users can obtain from the different applications running on the modified operating system. We are collaborating with Jeffrey Hansen of the Institute for Complex Engineered Systems (ICES), which is part of Carnegie Mellon University's Carnegie Institute of Technology (CIT); John Lehoczky of CMU's Statistics Department; and Ragunathan (Raj) Rajkumar of the Electrical and Computer Engineering Department at CMU.

Additional Resources:

For additional information, please visit
www.contrib.andrew.cmu.edu/~dionisio/

Software Engineering Institute

SEI Blog