Challenges to Assuring Large-Scale Systems

In response to global events, national defense efforts have shifted from defeating terrorism to accelerating innovation, with a priority of delivering capability at speed and at scale. Defense program offices are consequently facing increased pressure to innovate using commercial technologies to produce new prototypes on tighter timelines. To support these efforts, the SEI is conducting research into new paradigms for rapid and continuous assurance of evolving systems.

In this blog post, which is adapted from our recently published technical report, we outline a model problem for the assurance of large-scale systems and six challenges that must be addressed to assure systems at the speed the DoD now needs.

Verification and Validation in Large-Scale Assurance

SEI researchers are focusing on approaches to large-scale assurance with the goal of reducing the time and effort required to (re-)assure large systems. We consider an assured system to be one for which suitable evidence has been gathered from verification and validation activities, and for which sufficient arguments have been made to give confidence that the software system is ready for operational use and will work as intended. This notion of system assurance extends beyond security to encompass multiple architecturally significant concerns, including performance, modifiability, safety, and reliability.

The increasing scale of systems, and the complexity that results from it, make it difficult to combine capabilities from separately developed systems or subsystems, especially when innovations must be incorporated and the system subsequently re-assured with speed and confidence. Scale, in this context, is not just about the “size” of a system, by whatever measure, but also about the complexity of the system’s structure and interactions.

These interactions among system elements may not have been exposed or anticipated in the contexts where the subsystems were developed, or even where the full system has been executed. They may appear only in new contexts, such as new physical and computational environments, interactions with new subsystems, or changes to existing integrated subsystems.

A Model Problem for Large-Scale Assurance

In our research to address these challenges, we present a model problem and scenario that reflect the challenges that must be addressed in large-scale assurance. Discussing design issues, our SEI colleague Scott Hissam stated that “a model problem is a reduction of a design issue to its simplest form from which one or more model solutions can be investigated.” The model problem we present in our report can be used to drive research into solutions to assurance issues and to demonstrate those solutions.

Our model problem uses a scenario that describes an unmanned aerial vehicle (UAV) that must execute a humanitarian mission autonomously. In this mission, the UAV is to fly to a specific location and drop life-saving supplies to people who are stranded and unreachable by land, for example, after a natural disaster has altered the terrain and isolated the inhabitants.

The goal of the model problem is to give researchers context to develop methods and approaches to address different issues that are key to reducing the effort and cost of (re-)assuring large-scale systems.

In this scenario, the agency in charge of emergency response must provide scarce life-saving supplies and deliver them only if certain conditions are met; this approach ensures that the supplies are delivered only when they are truly needed.

More specifically, these supplies must be delivered at specific locations within specified time windows. The emergency response agency has acquired new UAVs that can deliver the needed supplies autonomously. These UAVs can be invaluable since they can take off, fly to a programmed destination, and drop supplies before returning to the initial launch location.

The UAV vendor affirms that its UAVs can execute these types of missions while meeting the associated stringent requirements. However, unforeseen interactions may occur among the subcontracted parts integrated into the UAV, and the vendor may not have discovered them during testing. For this reason, the emergency response agency should require additional assurance from the vendor that the UAVs can execute this mission and meet its requirements.

Assurance Challenges that Need to Be Addressed

The challenge of assuring systems in these circumstances stems from the inability to automatically integrate the complex, interacting assurance techniques applied to a system’s multiple interacting subsystems. In the context of our model problem, interactions that can be challenging to model include those related to control stability, timing, security, and logical correctness. Moreover, the lack of awareness of assurance interdependencies and the lack of effective reuse of prior assurance results lead to considerable re-assurance costs. These costs arise from the need for extensive simulations and tests to discover the interactions among multiple subsystems, especially in cyber-physical systems, and even then some of those interactions may go undiscovered.

It is important to reiterate that, while these assurance challenges stem from the model problem, they are not specific to it. Assurance of safety-critical systems is important, but these issues apply to any large-scale system.

We have identified six key assurance issues:

Multiple assurance types: Different kinds of assurance analyses and results (e.g., response time analysis, temporal logic verification, test results) are needed and must be combined into a single assurance argument.
Inconsistent analysis assumptions: Each analysis makes different assumptions, which must be consistently satisfied across analyses.
Subsystem assurance variation: Different subsystems may be developed by different organizations, each providing its own assurance results for its subsystem, and these results must be reconciled.
Varying analytical strength: The different assurance analyses and results used in the assurance argument may offer differing levels of confidence in their conclusions—from the simple testing of a few cases to exhaustive model checking. Therefore, conclusions about claims supported by the assurance argument must consider these different confidence levels.
Incremental arguments: It may not be feasible or desirable to build a complete assurance argument before some system assurance results can be provided. Therefore, it should be possible to build the assurance argument incrementally, especially when it is developed in coordination with system design and implementation.
Assurance results reuse: The system is likely to evolve due to changes or upgrades in individual subsystems. It should be possible to retain and reuse assurance models and results when only part of the system changes—recognizing that interactions may require revising some of the analyses.
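Several of these issues can be made concrete with a small data model. The following Python sketch is a hypothetical illustration, not the SEI's approach. It assumes an ordinal `Strength` scale for analytical strength, assumptions recorded as simple key/value pairs, and a conservative weakest-link rule for combining confidence; all class names, subsystem names, and values are invented. It shows how multiple assurance types from different subsystems might be gathered under one claim, how inconsistent analysis assumptions could be flagged, and how varying analytical strength can limit the confidence in the combined claim.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Strength(IntEnum):
    """Hypothetical ordinal scale of analytical strength, weakest first."""
    SAMPLE_TESTS = 1   # a few hand-picked test cases
    SIMULATION = 2     # broader simulation campaigns
    RESPONSE_TIME = 3  # analytical response-time bounds
    MODEL_CHECKED = 4  # exhaustive model checking


@dataclass
class Evidence:
    """One assurance result, tagged with its origin and the assumptions it relies on."""
    name: str
    kind: str       # e.g. "response-time analysis", "test results"
    subsystem: str  # subsystem (and hence supplier) that produced the result
    strength: Strength
    assumptions: dict[str, str] = field(default_factory=dict)


@dataclass
class Claim:
    """A claim supported by heterogeneous evidence from several subsystems."""
    statement: str
    evidence: list[Evidence]

    def conflicting_assumptions(self) -> dict[str, set[str]]:
        """Assumption keys that different analyses bind to different values."""
        seen: dict[str, set[str]] = {}
        for ev in self.evidence:
            for key, value in ev.assumptions.items():
                seen.setdefault(key, set()).add(value)
        return {key: values for key, values in seen.items() if len(values) > 1}

    def confidence(self) -> Optional[Strength]:
        """Weakest-link rule: a claim is only as strong as its weakest evidence.

        Returns None when there is no evidence yet or the assumptions conflict.
        """
        if not self.evidence or self.conflicting_assumptions():
            return None
        return min(ev.strength for ev in self.evidence)


# Illustrative usage with made-up UAV subsystems and assumption values.
timing = Evidence("drop-timing", "response-time analysis", "flight-controller",
                  Strength.RESPONSE_TIME, {"cpu_clock_mhz": "400"})
logic = Evidence("mission-logic", "temporal logic verification", "mission-planner",
                 Strength.MODEL_CHECKED, {"cpu_clock_mhz": "400"})
tests = Evidence("release-tests", "test results", "payload-release",
                 Strength.SAMPLE_TESTS)
claim = Claim("Supplies are dropped only inside the delivery window",
              [timing, logic, tests])
print(claim.confidence().name)  # SAMPLE_TESTS

# A supplier that analysed a different CPU clock makes the argument inconsistent.
logic_600 = Evidence("mission-logic", "temporal logic verification",
                     "mission-planner", Strength.MODEL_CHECKED,
                     {"cpu_clock_mhz": "600"})
print(Claim("Same claim, mixed assumptions", [timing, logic_600]).confidence())  # None
```

The weakest-link rule is deliberately simple and conservative; a practical assurance argument would likely need richer rules for combining confidence across analyses of differing strength.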

Future Work in Assuring Large-Scale Systems

We are currently developing the theoretical and technical foundations to address these challenges. Our approach includes an artifact called an argument architecture, in which the results of the different analyses are captured in a way that allows them to be composed and supports reasoning about how their composition satisfies required system properties.
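One way to picture the composition and reuse that such an artifact aims to support is sketched below in Python. This is an assumption-laden illustration, not the argument architecture under development: each hypothetical `AnalysisResult` records the subsystem versions it analysed, claims reference results by name (so an argument can be built incrementally before every result exists), and upgrading one subsystem invalidates only the results that depended on it, leaving the rest available for reuse. The class names, subsystem names, and the `geofence-check` analysis are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisResult:
    """One analysis result plus the subsystem versions it was computed against."""
    name: str
    covers: dict[str, str]  # subsystem name -> version analysed
    valid: bool = True


@dataclass
class ArgumentArchitecture:
    """Hypothetical sketch: claims composed from reusable analysis results."""
    results: dict[str, AnalysisResult] = field(default_factory=dict)
    claims: dict[str, list[str]] = field(default_factory=dict)  # claim -> result names

    def add_result(self, result: AnalysisResult) -> None:
        self.results[result.name] = result

    def add_claim(self, claim: str, result_names: list[str]) -> None:
        # Claims may reference results that do not exist yet (incremental argument).
        self.claims[claim] = list(result_names)

    def upgrade(self, subsystem: str, new_version: str) -> list[str]:
        """Invalidate only results that analysed a different version of `subsystem`."""
        stale = []
        for result in self.results.values():
            old = result.covers.get(subsystem)
            if old is not None and old != new_version:
                result.valid = False
                stale.append(result.name)
        return stale

    def open_claims(self) -> list[str]:
        """Claims with at least one supporting result that is missing or invalid."""
        return [
            claim for claim, names in self.claims.items()
            if any(n not in self.results or not self.results[n].valid for n in names)
        ]


# Illustrative usage with made-up subsystems and versions.
arch = ArgumentArchitecture()
arch.add_result(AnalysisResult("control-stability", {"autopilot": "2.1", "airframe": "A"}))
arch.add_result(AnalysisResult("drop-timing", {"autopilot": "2.1", "payload-release": "1.0"}))
arch.add_result(AnalysisResult("link-security", {"radio": "5.3"}))
arch.add_claim("UAV flies stably to the drop zone", ["control-stability"])
arch.add_claim("Supplies drop inside the time window", ["drop-timing", "geofence-check"])
arch.add_claim("Command link resists spoofing", ["link-security"])

print(arch.open_claims())  # ['Supplies drop inside the time window']
print(arch.upgrade("autopilot", "2.2"))  # ['control-stability', 'drop-timing']
print(arch.open_claims())  # ['UAV flies stably to the drop zone', 'Supplies drop inside the time window']
```

Note the limitation: this sketch invalidates only results that explicitly recorded the changed subsystem. It cannot detect new interactions that a change introduces among subsystems that were analysed separately, which is precisely the kind of gap the challenges above describe.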

Software Engineering Institute

SEI Blog