Regression Verification for Real-time Embedded Software Systems

The DoD relies heavily on mission- and safety-critical real-time embedded software systems (RTESs), which play a crucial role in controlling systems ranging from airplanes and cars to infusion pumps and microwaves. Since RTESs are often safety-critical, they must undergo an extensive (and often expensive) certification process before deployment. This costly certification process must be repeated after any significant change to the RTES, such as migrating a single-core RTES to a multi-core platform, significant code refactoring, or performance optimizations, to name a few.

Our initial approach to reducing re-certification effort--described in a previous blog post--focused on the parts of a system whose behavior was affected by changes using a technique called regression verification, which involves deciding the behavioral equivalence of two, closely related programs. This blog posting describes our latest research in this area, specifically our approach to building regression verificationtools and techniques for static analysis of RTESs.

Although there are many types of RTESs, we concentrate on a class of periodic programs, which are concurrent programs that consist of tasks that execute periodically. The tasks are assigned priorities based on their frequency (higher frequency = higher priority). The RTES executes the tasks using a priority-based preemptive scheduler. Each execution of a task is called a job. Thus, from the perspective of the scheduler, a system's execution is a constant periodic stream of jobs of different priorities. In the rest of this post, we use RTES to mean periodic programs.

In the beginning of the project, we assumed that automated verification techniques (such as static analysis and model checking) for single-core RTESs could be adapted for regression verification since these techniques have been used for sequential single-core programs. After conducting an initial survey, however, we found that existing automated verification techniques that apply directly to program source (rather than to a manual abstract model) are not applicable to periodic programs. Our original approach to extend static analysis to regression verification in the setting of multi-core RTES was therefore changed in two ways. First, in phase 1 of our project we developed a new static analysis technique for reasoning about bounded executions of periodic programs. Second, in phase 2 we extended regression verification to multi-threaded programs, of which periodic programs are a restricted subset. The remainder of this blog posting describes these two phases. >

Phase 1: Time-Bounded Verification of Periodic Programs

In the first part of our work, we developed an approach for time-bounded verification of safety properties (user-specified assertions) of periodic programs written in the C programming language. Time-bounded verification is the problem of deciding whether a given program does not violate any user-specified assertions in a given time interval. Time-bounded verification makes sense for RTESs because of their intimate dependence on real time behavior. The inputs to our approach are (1) a periodic program C; (2) a safety property expressed via an assertion A embedded in C; (3) an initial condition Init of C, and (4) a time bound W. The output is either a counter-example trace showing how C violates an assertion A, or a message saying that the program is safe, in the sense that there is no execution that triggers any user-specified assertions.

Our solution to time-bounded verification is based on sequentialization, which involves reducing verification of a current program P to verification of a (non-deterministic) sequential program P'. A key feature of our approach is that P' is linear in the size of P, which means the translation step is not computationally intensive and adds little overhead to the verification effort. The scalability of our approach is therefore mostly driven by the scalability of the underlying analysis engine, and our approach automatically benefits from constant improvements in the verification area.

Our work builds upon previous sequentialization work for context-bounded analysis (CBA) and bounded model checking (BMC). Our approach differs from prior work, however, since it bounds the actual execution time of the program, which is more natural to the designer of an RTES than a bound on the number of context switches (as done in CBA) or a bound on the number of instructions executed (as in BMC). We bound the execution time by translating the input time bound W in our model to a bound on the number of jobs. This translation is a natural consequence of the fact that the tasks are periodic and are therefore activated a finite number of times within W.

We implemented our approach in a tool called REK. REK supports C programs with tasks, priorities, priority ceiling locks, and shared variables. It takes a concurrent periodic program that cannot be analyzed with standard tools for sequential verification and converts it to become analyzable with such tools. Although in principle REK is compatible with any analyzer for bounded (loop- and recursion-free) C programs, in practice we rely on the CBMC tool by Daniel Kroening, which is one of the first and most mature bounded model checkers for C. CBMC can automatically analyze substantial C programs by encoding assertion violation to Boolean satisfiability queries. CBMC is a mature and robust tool that has been extensively applied to many industrial problems.

How REK Works

The analysis problem that REK is designed to solve is to check that a given periodic program is safe under all legal scheduling of tasks. REK solves a time-bounded version of this problem, e.g., whether the program is safe in the first 100ms, 200ms, 300ms, etc., starting from some user-specified initial condition. A time-bounded verification makes sense in the context of periodic programs since their execution can be naturally partitioned by time-intervals. Of course, in practice, unbounded verification would be preferred, so we are working on extending REK in this direction.

We briefly summarize the sequentialization step done by REK. First, we divide a time-bounded execution into execution rounds (or, rounds for short). The execution starts in round 0, a new round starts (and the old one stops) whenever a job of some task finishes. An execution with X jobs therefore requires X execution rounds. The sequentialization step simulates execution of each round independently and then combines them (using non-deterministic choice) into a single legal execution. Further details of the construction are available in our FMCAD 2011 paper referenced below.

In addition to the basic sequentialization described above, REK is extended with the following features to achieve scalability to realistic programs:

Partial order reduction is a set of techniques used in model checking to reduce the number of interleavings that must to be explored in a concurrent system. For example, if there are two independent actions a and b, then only one of the two executions 'a followed by b' or 'b followed by a' must be explored since they both lead to the same destination state. Although there are many approaches for partial order reduction in explicit state model checking (as opposed to symbolic model checking used in this work), extending them to symbolic verification is an area of active research. In REK, we developed a new partial order reduction technique that restricts explored executions only to those in which a read statement is preempted by a write statement to the same variable, or a write is preempted by a read or a write. This reduction eliminates many unnecessary interleavings and cuts the search space significantly. Our experiments show that the reduction is quite effective in practice.
A limitation to our approach is that it does not keep track of the actual execution time of each instruction, each job, and each task. As such, it is an over-approximation since it explores more executions than actually possible and can produce a "false positive" by producing a counter-example trace that is not possible on a given hardware architecture due to timing restrictions. To reduce the number of false positives, we further constrain our sequentialization by the information that can be inferred from schedulability analysis. Thus, if a periodic program is schedulable, it satisfies the rate monotonic analysis (RMA) equations. Those equations can be used to compute an upper bound on the number of times any given low priority job can be preempted by any given high priority job. We call this the preemption bound, which REK uses to further reduce the number of interleavings by keeping track how many times one task preempts another, and ensuring that this value never exceeds the preemption bound for the jobs of that task.
To deal with practical periodic programs, REK provides support for two types of commonly used lock primitives. In particular, it supports preemption locks (preemptions are disabled when the lock is held) and priority ceiling locks (preemption by any task with lower priority than the lock is disabled when the lock is held). We are extending REK to support the third common type of locks, priority-inheritance locks (regular blocking locks, but the priority of a low-priority task that holds a lock l is increased if a high-priority task is waiting for l).

As part of our research, we created a model problem using the NXTway-GS, which is a two-wheeled, self-balancing robot that responds to Bluetooth commands. The robot uses a gyroscope to balance itself upright by applying power to left and right wheels. It also uses a sonar sensor so that when it comes to an obstacle, like a wall or ditch, it can back up. We have used REK to verify and fix several communication consistency properties between the tasks of the robot. More information on the use of REK for the NXTway-GS is available at http://www.andrew.cmu.edu/~arieg/Rek.

Phase 2: Regression Verification for Multi-threaded Programs

In the second phase of our work, we examined regression verification for multi-threaded programs. We believe that that once we have regression verification for multi-threaded programs, we can adapt it to periodic programs as well.

Every instance of regression verification is based on some underlying notion of equivalence. The equivalence notion for single-threaded software is called partial equivalence: two functions are partially equivalent if they produce the same output for the same input. A multi-threaded program, conversely, is not partially equivalent to itself by the above definition since the same input can lead to different outputs due to scheduling choices. Our first challenge therefore involved creating a notion of equivalence for multi-threaded software.

Our second challenge was to come up with the right notion of decomposition to establish equivalence of programs from equivalence of their functions. Equivalence of sequential programs is done using Intput/Output equivalence. Two sequential programs are equivalent if it is possible to show that their corresponding functions have the same Input/Output behavior (produce the same output given the same input). In the case of multi-threaded programs, however, functions from different threads of a single program affect one another, making simple decomposition at the level of functions much harder because it must take interference from other threads into account.

To check whether two multi-threaded programs are partially equivalent (P = P') we use a proof rule consisting of a set of premises and a conclusion. Each premise establishes the partial equivalence of a pair of functions f and f' from P and P', respectively. A premise is established by verifying a single-threaded program.

As part of this work, we developed two separate proof rules:

The first rule attempts to show equivalence of two programs by showing that their corresponding functions are Input/Output equivalent (produce the same output for a given input) under arbitrary interference, where "interference" means that the value of shared variables can change between execution of instructions of a thread. This rule is "strong" (not widely applicable on many equivalent programs) because in practice the functions must be equivalent only in the context of the given program and not under arbitrary interference.
The second rule improves on the first rule by attempting to show that two programs are equivalent by restricting interference to what is consistent with the other functions in the program. For example, if there is no other function in a program that can affect a global variable 'x', then no interference that modifies 'x' is considered. This rule is "weaker" (more widely applicable) than the first one, but is computationally harder to automate.

In Conclusion

The ability to statically reason about correctness of periodic programs and the ability to perform regression verification adds the following key capabilities to an RTES developer's toolbox:

ability to check prior to deployment that the program does not violate its assertions,
ability to check that top-level application programming interfaces (APIs) are not affected by low-level refactoring and/or performance optimizations,
ability to check that new APIs are backward compatible with old APIs, and
ability to perform impact analysis to determine which function may possibly be affected by a given source code change and which unit tests must be repeated.

We believe these capabilities can lower the cost of developing RTESs, while increasing their reliability and trustworthiness.

Software Engineering Institute

SEI Blog