
10 Recommended Practices for Achieving Agile at Scale

Ipek Ozkaya and Robert Nord

This post is the first in a two-part series highlighting 10 recommended practices for achieving agile at scale.

Software and acquisition professionals often have questions about recommended practices related to modern software development methods, techniques, and tools, such as how to apply agile methods in government acquisition frameworks, systematic verification and validation of safety-critical systems, and operational risk management. In the Department of Defense (DoD), these techniques are just a few of the options available to face the myriad challenges in producing large, secure software-reliant systems on schedule and within budget.

In an effort to offer our assessment of recommended techniques in these areas, the SEI built upon an existing collaborative online environment known as SPRUCE (Systems and Software Producibility Collaboration Environment), hosted on the Cyber Security & Information Systems Information Analysis Center (CSIAC) website. From June 2013 to June 2014, the SEI assembled guidance on a variety of topics, selected for their relevance, the maturity of the practices described, and their timeliness with respect to current events. For example, shortly after the Target security breach of late 2013, we selected Managing Operational Resilience as a topic.

Ultimately, the SEI curated recommended practices on five software topics: Agile at Scale, Safety-Critical Systems, Monitoring Software-Intensive System Acquisition Programs, Managing Intellectual Property in the Acquisition of Software-Intensive Systems, and Managing Operational Resilience. In addition to a recently published paper on SEI efforts and individual posts on the SPRUCE site, these recommended practices will be published in a series of posts on the SEI blog. This post, the first in a two-part series by Ipek Ozkaya and Robert Nord, presents the challenges of achieving Agile at Scale, as well as the first five of the 10 technical best practices detailed in the SPRUCE post. The second post in this series will present the remaining five best practices, as well as three recommendations for making the best use of the practices to achieve Agile at Scale.

Why is Agile at Scale Challenging?

Agile practices, derived from a set of foundational principles, have been applied successfully for well over a decade and have enjoyed broad adoption in the commercial sector, with the net result that development teams have gotten better at building software. Reasons for these improvements include increased visibility into a project and the emerging product, increased responsibility of development teams, the ability for customers and end users to interact early with executable code, and the direct engagement of the customer or product owner in the project to provide a greater sense of shared responsibility.

Business and mission goals, however, are larger than a single development team. Applying Agile at Scale, particularly in DoD-scale environments, therefore requires answering questions along three dimensions:

1. Team size. What happens when Agile practices are used in a development team of 100 or more people? What happens when the development team needs to interact with the rest of the business, such as quality assurance, system integration, project management, and marketing, to get input into product development and collaborate on end-to-end delivery of the product? Scrum and other Agile methods, such as extreme programming (XP), are typically used by small teams of at most 7 to 10 people. Larger teams require orchestration of both multiple (sub)teams and cross-functional roles beyond development. Organizations have recently been investigating approaches, such as the Scaled Agile Framework (SAFe), to better manage the additional coordination issues that come with increased team size.

2. Complexity. Large-scale systems are often large in scope, as measured by the number of features, the amount of new technology being introduced, the number of independent systems being integrated, the number and types of users to accommodate, and the number of external systems with which the system communicates. Does the system have stringent quality attribute needs, such as real-time, high-reliability, and security requirements? Are there multiple external stakeholders and interfaces? Such systems typically must go through rigorous verification and validation (V&V), which complicates the frequent deployment practices used in Agile development.

3. Duration. How long will the system be in development? How long in operations and sustainment? Larger systems typically remain in development and operation far longer than the products to which Agile development is usually applied, which requires attention to future changes and possible redesigns, as well as the maintenance of several delivered versions. The answers to these questions affect the choice of quality attributes that support the system maintenance and evolution goals key to long-term system success.
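To make the team-size concern concrete, a standard back-of-the-envelope count (often associated with Fred Brooks's The Mythical Man-Month) is the number of pairwise communication channels among n people, which grows quadratically with n. The figures below are simple arithmetic, not data from the SPRUCE work; the last line assumes, for illustration only, 10 teams of 10 people whose inter-team communication is routed through a Scrum of Scrums with one representative per team.

```latex
\begin{aligned}
C(n) &= \frac{n(n-1)}{2} \\
C(7) &= 21, \quad C(10) = 45, \quad C(100) = 4950 \\
\underbrace{10 \cdot C(10)}_{\text{within teams}} + \underbrace{C(10)}_{\text{Scrum of Scrums}} &= 450 + 45 = 495
\end{aligned}
```

Structured coordination does not eliminate the need to communicate across teams; under this assumption it routes that communication through a small set of representatives, which is also why the size of the coordination team itself matters.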

Best Practices for Achieving Agile at Scale

Every organization is different; judgment is required to implement these practices in a way that benefits your organization. In particular, be mindful of your mission, goals, existing processes, and culture. All practices have limitations--there is no "one size fits all." To gain the most benefit, you need to evaluate each practice for its appropriateness and decide how to adapt it, striving for an implementation in which the practices reinforce each other. Also, consider additional best practice collections (such as the one from the Government Accountability Office (GAO) referenced at the end of this webpage). Monitor your adoption and use of these practices and adjust as appropriate.

1. Make team coordination top priority.

Scrum is the most common Agile project management method in use today and primarily involves team management practices. In its simplest instantiation, a Scrum development environment consists of a single Scrum team with the skills, authority, and knowledge required to specify requirements, architect, design, code, and test the system. As systems grow in size and complexity, the single-team model may no longer meet development demands. If a project has already decided to use a Scrum-like project-management technique, the Scrum approach can be extended to managing multiple teams with a "Scrum of Scrums," a special coordination team whose role is to (1) define what information will flow between and among development teams (addressing inter-team dependencies and communication) and (2) identify, analyze, and resolve coordination issues and risks that have potentially broader consequences (e.g., for the project as a whole).

A Scrum of Scrums typically consists of members from each team, chosen to address end-to-end functionality or cross-cutting concerns such as user interface design, architecture, integration testing, and deployment. Creating a special team responsible for inter-team coordination helps ensure that the right information, including measurements, issues, and risks, is communicated between and among teams. Care is needed, however, to keep the Scrum of Scrums team itself from growing so large that it becomes overwhelmed. This scaling can be managed by organizing teams (and the Scrum of Scrums team itself) along feature and service affinities, an approach we discuss further in the feature-based development and system decomposition practice. Such orchestration is essential to leading larger teams, including Agile teams, to success.
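One of the Scrum of Scrums' two roles is to surface inter-team dependencies. The Python sketch below illustrates that bookkeeping under stated assumptions: the `BacklogItem` record, its fields, and the `inter_team_dependencies` helper are hypothetical, not part of Scrum or of any tool. Dependencies within a team stay off the agenda, cross-team ones are grouped by team pair, and a dependency on an item no team owns is flagged as a risk in its own right.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogItem:
    """Hypothetical backlog entry: the owning team and the items it needs."""
    key: str
    team: str
    depends_on: tuple[str, ...] = ()


def inter_team_dependencies(items):
    """Group cross-team dependencies by (needing team, providing team).

    Dependencies inside a single team stay with that team; only edges that
    cross a team boundary go on the Scrum of Scrums agenda.
    """
    owner = {item.key: item.team for item in items}
    agenda = defaultdict(list)
    for item in items:
        for dep in item.depends_on:
            # A dependency nobody owns is itself a coordination risk.
            provider = owner.get(dep, "UNASSIGNED")
            if provider != item.team:
                agenda[(item.team, provider)].append((item.key, dep))
    return dict(agenda)


backlog = [
    BacklogItem("UI-12", "ui", ("API-3",)),
    BacklogItem("API-3", "services", ("DB-7",)),
    BacklogItem("DB-7", "services"),
    BacklogItem("RPT-1", "reporting", ("API-3", "SEC-2")),
]

for (needs, provides), edges in sorted(inter_team_dependencies(backlog).items()):
    print(f"{needs} waits on {provides}: {edges}")
# reporting waits on UNASSIGNED: [('RPT-1', 'SEC-2')]
# reporting waits on services: [('RPT-1', 'API-3')]
# ui waits on services: [('UI-12', 'API-3')]
```

In practice this information would come from whatever backlog tool the teams already use; the point of the sketch is only that the coordination team looks at the edges between teams, not at every item.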

2. Use an architectural runway to manage technical complexity.

Stringent safety or mission-critical requirements increase technical complexity and risk. Technical complexity arises when the work takes longer than a single iteration or release cycle and cannot be easily partitioned and allocated to different technical competencies (or teams) to independently and concurrently develop their part of a solution. Successful approaches to managing technical complexity include having the most-urgent system or software architecture features well defined early (or even pre-defined at the organizational level, e.g., as infrastructure platforms or software product lines).

The Agile term for such pre-staging of architectural features that can be leveraged by development teams is "architectural runway." The architectural runway has the goal of providing the degree of stability required to support future iterations of development. This stability is particularly important to the successful operation of multiple teams. A system or software architect decides which architectural features must be developed first by identifying the quality attribute requirements that are architecturally significant for the system. By initially defining (and continuously extending) the architectural runway, development teams are able to iteratively develop customer-desired features that use that runway and benefit from the quality attributes they confer (e.g., security and dependability).

Having a defined architectural runway helps uncover technical risks earlier in the lifecycle, thereby helping to manage system complexity and avoid surprises during the integration phase. Discovering quality attribute concerns with the underlying architecture, such as security, performance, or availability problems, late in the lifecycle (that is, after several iterations have passed) often yields significant rework and schedule delay. Delivering functionality is more predictable when the infrastructure for new features is already in place, so it is important to maintain a continual focus on the architecturally significant requirements and to estimate when the development teams will depend on having code that implements an architectural solution.
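The estimation step above, knowing when teams will depend on runway code, can be sketched as a simple schedule check. This is a minimal Python illustration, not an SEI tool: `Feature`, `runway_need_dates`, and `runway_gaps` are hypothetical names, iterations are plain integers, and the check assumes runway work finished in iteration r is usable from iteration r + 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical customer-facing feature and the runway elements it uses."""
    name: str
    planned_iteration: int
    needs: frozenset[str]


def runway_need_dates(features):
    """Earliest iteration in which each runway element is first needed."""
    need = {}
    for f in features:
        for element in f.needs:
            need[element] = min(need.get(element, f.planned_iteration),
                                f.planned_iteration)
    return need


def runway_gaps(features, runway_ready):
    """List (feature, element, needed_in, ready_in) where the runway is late.

    `runway_ready` maps a runway element to the iteration in which it is
    expected to be finished; a missing entry means it is not yet planned.
    Assumption: work finished in iteration r is usable from iteration r + 1.
    """
    gaps = []
    for f in sorted(features, key=lambda feat: feat.planned_iteration):
        for element in sorted(f.needs):
            ready = runway_ready.get(element)
            if ready is None or ready >= f.planned_iteration:
                gaps.append((f.name, element, f.planned_iteration, ready))
    return gaps


features = [
    Feature("secure login", 2, frozenset({"authn service"})),
    Feature("audit report", 3, frozenset({"authn service", "event log"})),
    Feature("offline sync", 5, frozenset({"message queue"})),
]
ready = {"authn service": 1, "event log": 3}

print(runway_need_dates(features))
for gap in runway_gaps(features, ready):
    print("runway late:", gap)
# runway late: ('audit report', 'event log', 3, 3)
# runway late: ('offline sync', 'message queue', 5, None)
```

Each reported gap is a candidate for either pulling runway work forward or deferring the feature, which is the kind of trade-off the architect and teams would discuss.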

3. Align feature-based development and system decomposition.

A common approach in Agile teams is to implement a feature (or user story) across all the components of the system it touches. This approach lets the team focus on something that has stakeholder value. The team controls every piece of the implementation for that feature and therefore need not wait for someone outside the team to finish some required work. We call this approach "vertical alignment" because every component of the system required for realizing the feature is implemented by the team, and only to the degree the team needs for that feature.

System decomposition could also be horizontal, however, based on the architectural needs of the system. This approach focuses on common services and variability mechanisms that promote reuse.

The goal of creating a feature-based development and system decomposition approach is to provide flexibility in aligning teams horizontally, vertically, or in combination, while minimizing coupling to ensure progress. Although organizations create products in very different domains (ranging from embedded systems to enterprise systems), similar architecture patterns and strategies emerge when they need to balance rapid progress with stability. Teams create a platform of commonly used services and development environments, either as frameworks or as platform plug-ins, to enable fast feature-based development.
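To make the trade-off among vertical, horizontal, and combined alignment concrete, the following hypothetical Python sketch scores three ways of assigning (feature, component) work to teams. The system, the team names, and both measures are illustrative assumptions: `handoffs` counts the extra teams each feature must wait on, and `shared_edits` counts the extra teams changing each component, a rough proxy for coupling in shared code.

```python
from collections import defaultdict

# Hypothetical system: the components each feature must change.
FEATURES = {
    "search": ["ui", "search-api", "platform-auth"],
    "checkout": ["ui", "payments-api", "platform-auth"],
    "reports": ["ui", "reports-api", "platform-logging"],
}
PLATFORM = {"platform-auth", "platform-logging"}


def vertical(feature, component):
    """One team per feature, working end to end across components."""
    return f"team-{feature}"


def horizontal(feature, component):
    """One team per component (layer or service)."""
    return f"team-{component}"


def hybrid(feature, component):
    """A platform team owns shared services; feature teams own the rest."""
    return "team-platform" if component in PLATFORM else f"team-{feature}"


def coupling(assign, features=FEATURES):
    """Return (handoffs, shared_edits) for a team-assignment strategy.

    handoffs: extra teams each feature must wait on, summed over features.
    shared_edits: extra teams changing each component, summed over components.
    """
    teams_per_feature = defaultdict(set)
    teams_per_component = defaultdict(set)
    for feature, components in features.items():
        for component in components:
            team = assign(feature, component)
            teams_per_feature[feature].add(team)
            teams_per_component[component].add(team)
    handoffs = sum(len(t) - 1 for t in teams_per_feature.values())
    shared_edits = sum(len(t) - 1 for t in teams_per_component.values())
    return handoffs, shared_edits


for strategy in (vertical, horizontal, hybrid):
    print(strategy.__name__, coupling(strategy))
# vertical (0, 3)
# horizontal (6, 0)
# hybrid (3, 2)
```

On this toy system, the platform hybrid accepts some handoffs in exchange for shared services that only one team changes, while the user interface remains a shared coordination point in both options that use feature teams.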

4. Use quality-attribute scenarios to clarify architecturally significant requirements.

Scrum emphasizes customer-facing requirements--features that end users dwell on--and indeed these are important to success. But when the focus on end-user functionality becomes exclusive, the underlying architecturally significant requirements can go unnoticed.

A better practice is to elicit, document, communicate, and validate the underlying quality attribute scenarios during development of the architectural runway. This approach becomes even more important at scale, where projects often have significant longevity and sustainability needs. Early in the project, evaluate the quality attribute scenarios to determine which architecturally significant requirements should be addressed in early development increments (see the architectural runway practice above) or whether strategic shortcuts can be taken to deliver end-user capability more quickly.

For example, will the system really have to scale up to a million users immediately, or is this actually a trial product? The considerations also differ by domain. IT systems, for example, often build on existing frameworks, so understanding the quality attribute scenarios can help developers see which architecturally significant requirements might already be addressed adequately by existing frameworks (including open-source systems) or by legacy systems that can be leveraged during software development. Such systems must also address changing security requirements and deployment environments, which makes it necessary to give architecturally significant requirements top priority when dealing with scale.
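Quality attribute scenarios are commonly written in the six-part form used in the SEI's software architecture literature (source, stimulus, artifact, environment, response, response measure) and prioritized with high/medium/low ratings, as in an ATAM utility tree. The Python sketch below is a minimal illustration of the evaluation step described above. The class, the example scenarios and their numbers, and the (H, H)/(H, M) cut-off for early runway work are all hypothetical. It echoes the million-user question by rating that scenario as low business value for a trial product, so it is deferred rather than built into the first increments.

```python
from dataclasses import dataclass

RANK = {"H": 3, "M": 2, "L": 1}


@dataclass(frozen=True)
class QualityAttributeScenario:
    """Six-part scenario plus utility-tree style H/M/L ratings."""
    attribute: str
    source: str
    stimulus: str
    artifact: str
    environment: str
    response: str
    response_measure: str
    business_value: str  # importance to the mission: "H", "M", or "L"
    technical_risk: str  # difficulty / architectural impact: "H", "M", or "L"


def early_runway_candidates(scenarios):
    """Scenarios rated (H, H) or (H, M), highest technical risk first.

    The cut-off is an illustrative assumption; a real program would agree
    on it with its stakeholders.
    """
    picked = [s for s in scenarios
              if s.business_value == "H" and s.technical_risk in ("H", "M")]
    return sorted(picked, key=lambda s: (-RANK[s.technical_risk], s.attribute))


scenarios = [
    # Trial product: a million users is unlikely soon, so low business value.
    QualityAttributeScenario(
        "scalability", "end users", "1,000,000 concurrent sessions",
        "web tier", "production launch", "requests are served",
        "95th-percentile latency under 2 s", "L", "H"),
    QualityAttributeScenario(
        "security", "external attacker", "credential-stuffing attempt",
        "login service", "normal operation",
        "account is locked and an alert is raised",
        "within 5 failed attempts", "H", "M"),
    QualityAttributeScenario(
        "availability", "hardware fault", "primary database node fails",
        "data store", "peak load", "service fails over to a replica",
        "under 30 s of downtime", "H", "H"),
]

for s in early_runway_candidates(scenarios):
    print(s.attribute, (s.business_value, s.technical_risk))
# availability ('H', 'H')
# security ('H', 'M')
```

Writing the response measure explicitly is what makes a scenario testable; the ratings only decide when it is addressed.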

5. Use test-driven development for early and continuous focus on verification.

This practice can be summarized as "write your test before you write the system." When there is an exclusive focus on "sunny-day" scenarios (a typical developer's mindset), the project becomes overly reliant on extensive testing at the end of the project to identify overlooked scenarios and interactions. Therefore, be sure to focus on rainy-day scenarios (e.g., different system failure modes) as well as sunny-day scenarios. Writing tests first, especially at the business or system level (known as acceptance test-driven development), reinforces the other practices that identify the more challenging aspects and properties of the system, especially quality attributes and architectural concerns (see the architectural runway and quality-attribute scenarios practices above).
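The following Python sketch shows the test-first habit with both sunny-day and rainy-day cases, using the standard `unittest` module. The function under test, `fetch_with_retry`, and the `FlakyService` test double are hypothetical. The point is that the failure-mode tests (transient faults, a persistent outage, bad configuration) sit alongside the single happy-path test and drive the implementation's behavior.

```python
import unittest


def fetch_with_retry(fetch, attempts=3):
    """Call fetch() until it succeeds or `attempts` calls have failed.

    Hypothetical function under test; in TDD it is written after the tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except ConnectionError as err:  # the failure mode we planned for
            last_error = err
    raise last_error


class FlakyService:
    """Test double that fails a fixed number of times before answering."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("service unavailable")
        return "payload"


class FetchWithRetryTest(unittest.TestCase):
    # Sunny day: the service answers on the first call.
    def test_returns_payload_when_service_is_healthy(self):
        service = FlakyService(failures_before_success=0)
        self.assertEqual(fetch_with_retry(service), "payload")
        self.assertEqual(service.calls, 1)

    # Rainy day: transient failures are absorbed by retries.
    def test_recovers_from_transient_failures(self):
        service = FlakyService(failures_before_success=2)
        self.assertEqual(fetch_with_retry(service, attempts=3), "payload")
        self.assertEqual(service.calls, 3)

    # Rainy day: a persistent outage is reported, not hidden.
    def test_raises_after_exhausting_attempts(self):
        service = FlakyService(failures_before_success=5)
        with self.assertRaises(ConnectionError):
            fetch_with_retry(service, attempts=3)
        self.assertEqual(service.calls, 3)

    # Rainy day: invalid configuration is rejected up front.
    def test_rejects_non_positive_attempts(self):
        with self.assertRaises(ValueError):
            fetch_with_retry(FlakyService(0), attempts=0)
```

Saved as a module (for example, a hypothetical `test_retry.py`), the tests can be run with `python -m unittest test_retry`. Acceptance tests at the business or system level follow the same discipline, though they would typically use a different harness.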

Looking Ahead

Technology transition is a key part of the SEI's mission and a guiding principle in our role as a federally funded research and development center. We welcome your comments and suggestions on further refining these recommended practices.

The next post in this series will present the five remaining technical practices, as well as strategies for how an organization can prepare for and achieve effective results using these best practices.

Below is a listing of selected resources to help you learn more. We have also added links to various sources to help amplify a point. Please be mindful that such sources may occasionally include material that might differ from some of the recommendations in the article above and the references below.

For more information about Agile at Scale, please see:

Bellomo, Stephany, Philippe Kruchten, Robert L. Nord, and Ipek Ozkaya. How to Agilely Architect an Agile Architecture. Cutter IT Journal, February 2014.

Bellomo, Stephany, Robert L. Nord, and Ipek Ozkaya. A Study of Enabling Factors for Rapid Fielding: Combined Practices to Balance Speed and Stability. ICSE 2013, pp. 982-991.

Ozkaya, Ipek, Michael Gagliardi, and Robert L. Nord. Architecting for Large Scale Agile Software Development: A Risk-Driven Approach. CrossTalk, May/June 2013.

Bachmann, Felix, et al. Integrate End to End Early and Often. IEEE Software, July/August 2013.

Government Accountability Office. Software Development: Effective Practices and Federal Challenges in Applying Agile Methods. Report GAO-12-681. July 2012.

Leffingwell, Dean. Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise. Addison-Wesley, 2011.

Leffingwell, Dean. Scaling Software Agility: Best Practices for Large Enterprises. Addison-Wesley, 2007.

Larman, Craig, and Bas Vodde. Practices for Scaling Lean & Agile Development: Large, Multisite, and Offshore Product Development with Large-Scale Scrum.

For more information about quality attribute scenarios, please see:

Leffingwell, Dean. Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise. Addison-Wesley, 2011.

Ozkaya, Ipek, Len Bass, Raghvinder Sangwan, and Robert Nord. Making Practical Use of Quality Attribute Information. IEEE Software, Vol. 25, No. 2, March-April 2008, pp. 25-33.

To learn more about test-driven development, see:

Whittaker, James A., Jason Arbon, and Jeff Carollo. How Google Tests Software. April 2, 2012.

Beck, Kent. Test Driven Development: By Example.
