Process Concerns When Navigating the Transition from Sustainment to Engineering Software-Reliant Systems

The Defense Innovation Board (DIB), a group of leading thinkers in the area of national security that advises the Department of Defense (DoD), recently gathered to discuss the issue of U.S. dominance in software. The resulting report noted that in this age of artificial intelligence and machine learning, with military systems that are increasingly networked and automated, the DoD’s ability to maintain superiority will be “directly linked to our ability to field and maintain software that is better, smarter, and more capable than our adversaries’ software.”

In our work with federal programs, we have observed that organic software sustainment organizations are increasingly tasked with engineering and developing the software capabilities of acquisition systems. In other words, organic software sustainment is expanding beyond its traditional purview of software maintenance and repair into software engineering and development. Our initial post gave an overview of the issues that the DoD should address to make this transition successful. This post explores process concerns and provides recommendations for supporting teams transitioning from software sustainment to engineering.

The cost of externally contracted software acquisition, as well as costs associated with the proprietary nature of end-state software without clear government-use rights, is driving the DoD to develop organic software engineering capabilities to help achieve the objectives outlined in the DIB report cited above. These objectives, and the software engineering expertise required to meet them, have been outside the scope of the DoD’s traditional software sustainment organizations. To avoid the concerns of externally contracted software, the DoD increasingly supports the creation of government-managed labs and centers, as well as the transformation of organic sustainment organizations into engineering organizations that expand traditional government software capabilities.

The issues explored in this series of blog posts stem from the SEI’s experience as a federally funded research and development center (FFRDC). Our mission focuses on creating and transitioning software engineering knowledge to our stakeholders so that they are better equipped to acquire, develop, operate, and sustain software for strategic national advantage. We are keen to help DoD software sustainment organizations successfully navigate the transition to their new roles.

The Different Concerns of Software Engineering, Architecture, and Design

Software sustainment organizations that take on greater engineering responsibilities must address a range of new concerns. In particular, software engineering teams own the architecture and design for the capabilities they develop, which in turn drives their implementation, modification, and refactoring considerations. When sustainment groups take on software engineering tasks, they must shift to a creation-process mindset, especially with respect to (1) new modifications and capabilities and (2) major changes via refactoring.

For example, the recognition of tradeoffs between various software patterns and design decisions can become drivers for implementation, modification, and refactoring. Successfully navigating these tradeoffs requires developers to make well-informed choices from the beginning. Failing to navigate these tradeoffs increases the likelihood of costly retrofits required to enhance or optimize suboptimal architectures and designs later in the lifecycle. Creating effective solutions and prioritizing architecture and design considerations makes it easier to implement enhancements.

Organic sustainment teams often lack access to the architectural decisions and documentation created by external contractors during system inception. This gap is usually beyond the control of a sustainment team, often because such documentation was never created or was not contractually required as part of the handoff from external contractors. The time and effort required to reverse engineer this information from the source code underscores the importance of software engineers creating and updating such documents throughout the lifecycle of a system. Workarounds to these challenges are outlined below and will be covered in future posts.

Without the benefit of insight into the early decisions and documentation that shaped a system’s architecture and design, sustainment organizations are often limited to making modifications (e.g., bug fixes) within a small scope of the overall system. In particular, software sustainment teams often lack institutional knowledge on how their localized changes may affect the broader qualities of a system. For example, without a firm understanding of the broader context in which they are made, small corrective changes during software sustainment can yield unanticipated impacts on performance and may introduce new defects.

In contrast, software engineering teams developing new software in greenfield settings face fewer constraints stemming from decisions made by previous teams. This freedom enables systematic reuse techniques from the outset. In this setting, modern commercial off-the-shelf (COTS) frameworks and platforms help reduce time to deployment by avoiding the reproduction of capabilities available via existing and trusted components. While sustainment teams can hypothetically refactor their legacy software to apply systematic reuse and leverage COTS, in practice it may be prohibitively expensive and inconsistent with the “just works” mindset, as discussed in the next section.

Process Concerns in Software Engineering

Context. An organization uses lightweight development processes, for example Agile and other iterative methods, for software development.

Problem. Misinterpretations of these processes can create ineffective software development efforts and negatively affect a team’s overall morale.

A common misinterpretation of Agile’s minimum viable product (MVP) is that MVP aligns with a “just works” implementation. Our experience indicates this misinterpretation is amplified in sustainment teams that are transitioning to engineering. In sustainment organizations, many developers often focus on the singular task of bug fixing, which is a classic example of a targeted improvement with outsized benefit.

Aligning the Agile MVP concept with such examples leads to less rigorous implementations that lack optimization, simplification, and elegance. For example, a system that is stitched together through opportunistic reuse of code from Stack Overflow can result in a complicated implementation that was never analyzed holistically and therefore cannot be optimized, simplified, or ultimately reviewed or refined. In this case, Agile’s MVP is often used to justify keeping suboptimal “just works” designs and implementations.

Though design and architecture issues are paramount to engineering groups, they are not priorities in software maintenance settings, so sustainment groups may view these concerns with suspicion. A sustainment group transitioning to engineering will struggle in these areas. These struggles, in turn, often manifest in designs and implementations that lack a solid foundation and may fail to meet initial requirements. Over the product lifetime, these issues can compound, thereby restricting future growth and enhancement due to vexing, poorly understood issues.

One additional process concern is that in both software sustainment and software engineering organizations, those who are not well versed in Agile can use the methodology to justify a culture that views architecture and design analysis not only as suspicious but as not even real work. Such a culture views the programmers who write the code and fix bugs as the “doers,” whereas the architects are the “talkers.”

In the absence of good architecture and design, Agile velocity suffers. A feedback loop that includes architecture and design analysis makes system design more efficient through adherence to quality attributes by prompting the following questions:

How do I make this easier to use?
How do I make this more maintainable?
How do I make this more comprehensible, more reusable?

Without this type of feedback loop, any development effort risks taking on a multitude of maintenance tails and, ultimately, costing taxpayers more money and slowing the pace of sustainment and evolution.

Agile can also be used to justify the idea that architecture and design are not important, a misconception that can be (mis)applied to every Agile methodology. Unfortunately, this idea does not account for multi-billion-dollar capabilities with hundreds of contributors to sustainment efforts that stretch over a period of decades.

Solutions. Introducing design and architecture issues as a priority in the technical debt backlog is a must for any successful engineering effort. Encouraging sustainment personnel to think about architecture and design takes time. Mentoring, along with architecture and design reviews, should include developers at all experience levels.

Actively encouraging the simplification or reuse of existing, working code (i.e., refactoring) is key to incorporating architecture and design work into the iterative Agile workflow. Prioritizing refactoring and refinement creates a team focused on more than functionality and introduces new engineers to maintainability and simplification concerns. For example, on a recent project, a component’s external interface exposed the implementation of a hash map via its data members. The only operation that any user of this interface needed was a lookup. An architectural analysis would find this concern and propose a simplified abstraction. This work would then be added to the backlog and prioritized.
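The hash map example can be illustrated with a minimal sketch. The class and method names here (RouteTableLeaky, RouteTable, lookup) are hypothetical and are not taken from the project described above; the sketch only shows the general shape of hiding the storage behind the single operation that callers need.

```python
from typing import Dict, Optional


class RouteTableLeaky:
    """Before: the hash map itself is a public data member (hypothetical name)."""

    def __init__(self) -> None:
        # Every caller that touches `entries` now depends on it being a dict.
        self.entries: Dict[str, str] = {}


class RouteTable:
    """After: storage is private and lookup is the only exposed operation."""

    def __init__(self, initial: Optional[Dict[str, str]] = None) -> None:
        # Copy the input so callers cannot mutate the internal map afterwards.
        self._entries: Dict[str, str] = dict(initial or {})

    def lookup(self, key: str) -> Optional[str]:
        """Return the value stored for key, or None when the key is absent."""
        return self._entries.get(key)


if __name__ == "__main__":
    table = RouteTable({"alpha": "10.0.0.1"})
    print(table.lookup("alpha"))
    print(table.lookup("missing"))
```

With the narrower interface, the internal map could later be replaced by a different structure without changing any caller, which is the maintainability benefit the refactoring is meant to buy.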

Inexperienced developers often think that since the current solution “just works” it is more expedient to copy and paste the code that uses the internal data members of the component. To them the refactoring and clean-up effort seems unnecessary because existing functionality is emphasized over long-term maintainability. In this case, because “it just works,” it never receives attention, and the hacky interface makes long-term maintenance worse.

Targeted, architecture-based refactorings and simplifications are akin to the bug-fixing efforts that sustainment teams already perform. A sustainment team’s core abilities therefore align strongly with architectural refactoring and simplification.

Expanding Testing and Verification

Context. It is important to understand what is being tested. Software sustainment largely exists to fix known defects, and testing focuses on the correctness of a solution. Evaluating other quality attributes is often left to larger integration testing.

Problem. A narrow focus on defect correction means that software-sustainment teams transitioning to software engineering may need broader awareness of the quality attributes and system requirements. Without historic exposure to broader, system-wide concerns, misunderstandings often arise about what is really being tested or should be tested.

We have worked with teams that test network performance and protocol latency using a single system (i.e., a virtual machine), an approach that can be valid for checking correctness but is less representative of deployed performance.

Solution. Multiple measurement approaches exist, so it is important to understand the goal of test scenarios. Piecing together accurate and approximately representative deployments requires an understanding of the software and hardware architectures, and this knowledge is built over time.

For example, to make accurate statements about network performance or latency, engineers should create representative test deployment scenarios with network hardware (switches, NICs, cables, etc.) that approximates the deployed configuration.

The ability to decompose a system for testing is a necessity. Using the decomposed subsystems in unit, functional, and simulated system testing scenarios creates layers of trust in the correctness of an overall system, though any simulated test scenario creates important tradeoffs.

In Large-Scale C++ Volume I: Process and Architecture, John Lakos wrote the following:

Cyclic physical dependency among components, or units of release (e.g., packages and package groups) is not allowed, as no member of such a cycle could ever be tested independently of the rest. This rule, however, encompasses far subtler forms of intractability.

In other words, if the individual components of a system cannot be verified independently, there is no way to verify the combination of components either. If components in a system cannot be tested independently, this indicates a lack of decomposition and the big ball of mud anti-pattern, which is virtually untestable and very hard to maintain.
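One way a team might check for the cyclic dependencies that Lakos describes is to record which components depend on which and search that graph for a cycle. The sketch below is an illustration only: the function name find_dependency_cycle and the component names are hypothetical, and it assumes the dependency map has already been extracted by some other means (for example, from build files).

```python
from typing import Dict, List, Optional


def find_dependency_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if acyclic.

    `deps` maps each component to the components it depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    color = {name: WHITE for name in deps}
    for targets in deps.values():
        for target in targets:
            color.setdefault(target, WHITE)  # components that only appear as targets
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in deps.get(node, []):
            if color[nxt] == GRAY:
                # nxt is already on the current path, so the path closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for name in list(color):
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    layered = {"app": ["net", "log"], "net": ["log"], "log": []}
    tangled = {"a": ["b"], "b": ["c"], "c": ["a"]}
    print(find_dependency_cycle(layered))
    print(find_dependency_cycle(tangled))
```

A check like this could run as part of a build so that a newly introduced cycle is reported before it becomes entrenched; the recursive search is adequate for small graphs, and a very deep dependency chain would call for an iterative version.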

Strategically Focus New Development

Context. Software development is a rapidly moving area, but many of the basic building blocks have long existed in the form of COTS or open-source solutions.

Common, basic operations and functionality that all software efforts need are almost certainly already available for the hardware and software platforms used. Any deployed software likely needs logging, data element storage and manipulation, atomicity, and parallelism.

Problem. New software engineers often reinvent the wheel. Many software efforts reinvent these primitives either because the engineers are unfamiliar with them or because they have a “not invented here” mindset.

Some unique system or processing domains (e.g., classification) can present challenges to using existing COTS software components for these common situations. In our work, we encounter custom implementations of common data structures and algorithms (e.g., list and hash implementations, logging, network I/O, mutexes) in almost every software effort. Custom implementations increase the effort and cost to maintain a product and introduce non-standard paradigms that raise the barrier to entry for new developers.

Solutions. While implementing basic data structures and algorithms can be fun, useful, and educational, deploying such reimplementations into production is often detrimental to the overall system quality. It is unlikely that any custom implementation will be better than existing, long-lived, and well-maintained COTS or open-source options. All modern languages and toolchains have well-optimized and tested implementations of common algorithms, data structures, and system primitives for parallelism, file system access, locking, and so on.
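As a small illustration of this point, the sketch below covers logging, a hash-based counter, locking, and parallelism using only the Python standard library. The function name count_words and the word-counting task are hypothetical stand-ins for whatever a real system needs; the sketch is meant to show that none of these primitives has to be hand-written.

```python
import logging
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("wordcount")  # standard logging, no custom logger


def count_words(lines: Iterable[str], workers: int = 4) -> Counter:
    """Count words across lines using a thread pool (hypothetical example task)."""
    totals: Counter = Counter()   # standard hash-based counter
    lock = threading.Lock()       # standard mutex guarding the shared totals

    def count_line(line: str) -> None:
        local = Counter(line.split())
        with lock:
            totals.update(local)

    # Standard thread pool; list() drains the results so worker errors surface.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(count_line, lines))

    log.info("counted %d distinct words", len(totals))
    return totals


if __name__ == "__main__":
    print(count_words(["alpha beta alpha", "beta gamma"]))
```

Each piece here is documented and already familiar to most Python developers, which lowers the barrier for the next person who has to maintain the code.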

When it comes to work, humans are motivated to make things simpler. It follows that engineering teams are motivated to focus on reusable software development because it

reduces total ownership costs
increases return on investment
eliminates repetition, i.e., the don’t repeat yourself (DRY) principle

Software engineers should be able to recognize the violations outlined above, such as opportunistic reuse (e.g., copying and pasting). This type of awareness is not always prevalent among software-sustainment groups that are suddenly thrust into an engineering role.
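A minimal sketch of what removing a copy-and-paste violation can look like follows. The checksum rule and the two framing functions are invented purely for illustration; the point is that the shared logic is defined once and reused, rather than pasted into each caller.

```python
def checksum(payload: bytes) -> int:
    """Single shared definition: sum of the bytes modulo 256 (illustrative only)."""
    return sum(payload) % 256


def frame_telemetry(payload: bytes) -> bytes:
    """Append the checksum byte to a telemetry payload (hypothetical format)."""
    return payload + bytes([checksum(payload)])


def frame_command(payload: bytes) -> bytes:
    """Prefix a command marker and append the same checksum (hypothetical format)."""
    return b"\x01" + payload + bytes([checksum(payload)])


if __name__ == "__main__":
    print(frame_telemetry(b"\x01\x02"))
    print(frame_command(b"\x01\x02"))
```

If the checksum rule ever changes, it changes in one place; with pasted copies, each copy would have to be found and updated separately.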

Wrapping Up and Looking Ahead

Our intent with these posts is not to place blame on any one group, but to urge DoD leaders to address the underlying incentives and long-held foundations that create disadvantages for sustainment groups when transitioning to engineering, which is our future reality. In future posts, we will focus on technical issues and offer recommendations for transitioning teams.

SEI Blog