Toward Technical Reference Frameworks to Support Large-Scale Systems of Systems

Contemporary software-reliant systems of systems are increasingly composed of integrated capabilities, deployed across an array of interoperable systems. Integration enables the U.S. Department of Defense (DoD) to benefit from significant advances in computing platforms, networking, and infrastructure and application software developed by the worldwide commercial marketplace for information technologies and methods. In this blog post, we present strategies for creating architectures for large-scale, complex, and interoperable systems of systems that are composed of functions covering a broad range of requirements from deeply embedded to cloud-enabled.

An Overview of Technical Reference Frameworks

We ground our discussion with examples of software-reliant system-of-system capabilities in the context of modern aircraft carriers. We will juxtapose system types that require closed-loop control, where the right answer delivered too late becomes the wrong answer, against more highly integrated systems that are not as safety-critical, but that contain mission-essential functions. On the one hand, some real-time capabilities (e.g., a close-in weapon system) require the highest levels of predictability, security, reliability, and safety. On the other hand, some analytic capabilities need to manage large volumes of data but can afford to respond between sips of coffee (e.g., management of the anti-submarine warfare tactical picture).

In this context of mixed-criticality systems of systems, it is necessary to address diverse safety needs across use cases (e.g., personnel safety vs. weapon safety) that carry different requirements, stakeholder demands, and processing needs. These heterogeneous environments have functions that coexist with different levels of criticality, different levels of control, and different size scales. They can be addressed most effectively and holistically by employing a limited set of technical reference frameworks (TRFs), which consist of computing and infrastructure environments that support modular components fitting a pattern of temporal needs aligned to scale with the processing needs for reusable domain architectures.

A TRF should address the timing and scale conditions of the mission capabilities it supports. For example, computer networks must be architected to deal with varying levels of timing, flow, security, and trust as they interact with more nodes, gather information from network-based sensors, and employ computer-based decision-making. System stakeholders still want to realize the benefits of demand-side economies of scale, such as integration, virtual connections, and interoperability, without accepting undue risk. Other parts of a system of systems require direct human interaction to make decisions. A TRF appropriate for the safety-critical portions of an aircraft carrier (e.g., weapons elevators or the aforementioned defensive anti-missile close-in weapon system) therefore may not be appropriate for functions that address widely different levels of interoperability and human interaction.

Applying TRFs to Meet the Requirements of Mixed-Criticality Systems of Systems

Figure 1: Mixed-Criticality Systems of Systems Through a Range of Technology Domains

Figure 1 shows that in complex systems of systems, functionality coexists with a range of technology domains, with different levels of criticality, different levels of latency, and different scale factors with respect to the number of hardware and software components. As shown on the left side of Figure 1, time criticality is important for some systems (i.e., the right answer late is the wrong answer, and being early sometimes isn’t good either), but scale is small (dozens of nodes as opposed to thousands). In contrast, as shown on the right side of Figure 1, the relative scale of deployment complexity (e.g., number of connected nodes, number of software components, and volume of data processed) is more central to that part of the architecture than criticality. TRFs should be tuned and optimized for the different technology domains in which these systems operate.

For example, closed-loop control functions often have safety-critical QoS requirements, with latency bounds of less than 1 millisecond, but relatively small scale, i.e., relatively few components. Conversely, command and decision functions often have mission-critical QoS requirements, with latency bounds between 1 and 500 milliseconds, but with a larger number of components. Moreover, data analysis and infotainment systems often have best-effort QoS requirements, with acceptable latency bounds greater than 500 milliseconds, but a much larger number of components involved. Each technology domain can therefore be represented most effectively by different, but interrelated, TRFs.
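The three latency tiers described above can be sketched as a simple classification rule. The following Python snippet is only an illustration under stated assumptions: the names `QoS`, `Domain`, and `classify` are hypothetical, the treatment of the exact 1 ms and 500 ms boundaries as belonging to the middle tier is our reading of the ranges, and a real TRF assignment would weigh scale, safety, and security alongside latency.

```python
from dataclasses import dataclass
from enum import Enum


class QoS(Enum):
    """QoS classes named in the text."""
    SAFETY_CRITICAL = "safety-critical"
    MISSION_CRITICAL = "mission-critical"
    BEST_EFFORT = "best-effort"


@dataclass(frozen=True)
class Domain:
    """A technology domain paired with its typical QoS class (hypothetical type)."""
    name: str
    qos: QoS


CLOSED_LOOP = Domain("closed-loop control", QoS.SAFETY_CRITICAL)
COMMAND_DECISION = Domain("command and decision", QoS.MISSION_CRITICAL)
ANALYSIS_INFOTAINMENT = Domain("data analysis and infotainment", QoS.BEST_EFFORT)


def classify(latency_bound_ms: float) -> Domain:
    """Map a latency bound in milliseconds to one of the three domains.

    Assumption: bounds of exactly 1 ms and exactly 500 ms fall in the
    command-and-decision tier.
    """
    if latency_bound_ms <= 0:
        raise ValueError("latency bound must be positive")
    if latency_bound_ms < 1:
        return CLOSED_LOOP
    if latency_bound_ms <= 500:
        return COMMAND_DECISION
    return ANALYSIS_INFOTAINMENT


if __name__ == "__main__":
    # Illustrative capabilities and latency bounds; the numbers are made up.
    examples = [
        ("fire-control loop", 0.5),
        ("threat assessment display", 200),
        ("movie streaming", 2000),
    ]
    for capability, bound_ms in examples:
        domain = classify(bound_ms)
        print(f"{capability}: {domain.name} ({domain.qos.value})")
```

Latency alone is a coarse discriminator. The sketch is meant only to make the tier boundaries concrete, not to suggest that a single number determines which TRF a capability belongs to.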

Figure 2 illustrates this principle of interrelated TRFs in the context of our earlier example of a modern aircraft carrier. Some parts of the aircraft carrier shown in Figure 2 are embedded and require closed-loop control (TRF1) operating deterministically in microsecond or millisecond time frames. Other parts require command and decision, or command and launch (TRF2). Command-and-decision actions are mission-critical operations that occur in human time (minutes or tens of minutes), as opposed to the tighter time frame of closed-loop control. The broader spectrum view (TRF3) represents “best-effort” capabilities that occur external to the aircraft carrier (such as data analysis performed in cloud-based data centers that afford a broader view of situational analysis) or that aren’t essential to the aircraft carrier’s primary combat role (such as crew infotainment via streaming media).

Figure 2: An Aircraft Carrier as a Mixed-Criticality System of Systems

Portions of the aircraft carrier system of systems shown in Figure 2 highlight cyber-physical systems like a close-in weapon system. Such systems need to track targets and align weapons in a tight fire-control loop and are highly safety-critical. The command and decision parts of the carrier are mission-critical systems that provide information and decision-support capabilities, where people evaluate data from sensors, satellites, over-the-horizon aircraft, other ships in the task force, etc., to assess threats and prosecute the mission. These products are vital to the ship’s mission and process much more information, but don’t have the same dependency on precisely timed actions found in the ship defense systems.

Finally, there are parts of an aircraft carrier that control such non-critical functions as administrative support or crew entertainment (infotainment). Although these systems may process large amounts of data (e.g., bandwidth needs such as streaming movies), they are “best-effort” capabilities that are neither safety- nor mission-critical. Notably, all three of these areas require relentless rigor in cybersecurity, though each has very different attack surfaces, related vulnerabilities, and processes for delivering protections.

An Architecture of Architectures to Coordinate Interrelated TRFs

Supporting the types of systems of systems shown in Figures 1 and 2 requires an architecture of architectures that enables all these TRF capabilities to coexist, evolve, and thrive. Each TRF must in turn take advantage of advances in technology, methods, and tools. Such an architecture of architectures must apply the right technology at the right place and at the right time in accordance with the requirements for the different parts of the system.

For example, if a development team were to build an entire system of systems based only on the safety-critical parts, it would take too long to build, would be prohibitively expensive, would not reliably provide all the functionality needed, and would make adopting the latest technology a high-friction process. Likewise, if such a team were to build an aircraft carrier’s close-in weapon system using the same technology that is used to build a typical commercial website, the system would be less expensive to build, but would be unlikely to meet its safety and timing requirements and would probably fail to achieve its mission.

Figure 3 shows that in a complex system of systems, one size does not fit all; instead, different types of requirements must be addressed by different TRFs. For example, different microprocessors (Intel vs. ARM), programming languages (Python vs. Java vs. C/C++), operating systems (Windows vs. Linux vs. VxWorks), middleware (DDS vs. Spring), databases (NoSQL vs. SQL), and computer networks (VME vs. TCP/IP) are appropriate for different TRFs. Moreover, not only does the scale differ across these technical domains, but so does the way these components are developed, fielded, and nurtured. TRFs must therefore be acquired and managed differently over time, yet must often work together seamlessly both within and across TRF boundaries.

Figure 3: Coexisting Systems at Different Time Scales in a System of Systems

System-of-Systems Interoperability

The DoD delivers highly interoperable capabilities. Having the right technologies, methods, and tools for a given environment is critical to ensure that a system of systems meets its functional and non-functional requirements. As the DoD builds, secures, and operates a range of complex system-of-systems deployments, collaborating on a combination of TRFs will help the contractors and the government fluidly deliver and manage military advantage to fit the situation. Our next blog post will explore how to map TRFs to the different pathways that comprise the DoD’s Adaptive Acquisition Framework.

Software Engineering Institute

SEI Blog