Testing Concurrent Systems: Concurrency Defects, Testing Techniques, and Recommendations

Concurrency, which exists whenever multiple entities execute simultaneously, is a ubiquitous and an unavoidable fact of life in systems and software engineering. It greatly increases system and software complexity, which directly impacts testing. Concurrency leads to nondeterministic behavior and numerous types of concurrency defects that require specialized approaches to uncover.

At the SEI, we are often called upon to review development planning documents including Test and Evaluation Master Plans (TEMPs) and Software Test Plans (STPs). We are also frequently tasked to evaluate developmental testing including software and system integration laboratories (SILs) and other test environments. One common observation is that test planning and implementation does not explicitly address the development of test cases designed to uncover concurrency defects.

In this blog post, I explore the different types of concurrency, the resulting types of concurrency defects, and the most important testing techniques for uncovering these defects. This blog post ends with a set of recommendations for improving the effectiveness and efficiency of concurrency testing.

Concurrency exists in three major forms, each of which has its own impact on testing:

External concurrency (a.k.a., environmental concurrency) exists when software applications and systems interact with actors in their external environment. The actors have their own simultaneous behavior, both with the software/system under test (SUT) and with each other. This SUT-external concurrency leads to a large number of ways that inputs from the actors and outputs to the actors can be interleaved with the behavior of the SUT.
Actual concurrency (a.k.a., distributed concurrency) exists when multiple components of the SUT truly execute simultaneously. For example, multiple systems within a system of systems (SoS) operate concurrently and relatively independently. Similarly, multiple subsystems within the same system typically operate concurrently. Individual subsystems might contain multiple computers running concurrently, and individual computers may have multiple processors or use multicore processors, adding even more levels of actual concurrency.
Virtual concurrency (a.k.a., local concurrency) exists when multiple software components of the SUT share the processing power and resources of a single processor or core. Virtual concurrency is common due to the use of programming language-level concurrency (i.e., tasking), operating-system threads, virtual machines, and containers.

This distinction between actual and virtual concurrency has testing ramifications, especially with regard to the actual concurrency in distributed systems and SoSs. Actual concurrency is far more complex than virtual concurrency, decreasing both controllability and visibility, which are two major components of testability. Concurrency defects that cause order violations and fail to properly handle faults, crashes, and reboots are also more common in distributed systems than in single-node systems.

Concurrency has four significant consequences that greatly increase the complexity and difficulty of both system and software engineering and testing:

Communication. Because concurrently executing components are part of the same software application, system, or system of systems, there is communication between these components. This communication may either be direct and intentional (e.g., via message passing) or indirect and unintentional (e.g., via interference paths connecting the components to shared resources). Unfortunately, the existence of such interference paths can result in the violation of the intended spatial and temporal isolation of these components, even when technologies such as virtualization and containerization are used to enforce that isolation.
Synchronization. Communication between concurrently executing components typically requires the synchronization of their behavior. However, the number of test cases increases rapidly with the number of concurrent components because the timing and order of the resulting interleaving of their execution can occur in many ways.
Nondeterminism. Concurrency can almost always cause nondeterministic behavior in which the execution with the same inputs may lead to different outcomes, so that behavior becomes probabilistic, and the same test may produce different results. This lack of determinism can produce Heisenbugs that are hard to uncover and reproduce. It also makes it hard to determine the underlying concurrency defect.
Concurrency Defects. Systems and software typically have concurrency-related architecture, design, and implementation defects that result in concurrency-related faults and failures.

Many developers mistakenly assume that the system/software architecture adequately addresses concurrency or that the operating system, hypervisor, or container engine automatically take care of concurrency. Many developers and testers often have only a minimal understanding and appreciation for the following types of concurrency defects, as well as the associated failure conditions and events:

Deadlock is the failure condition that exists when a concurrent component cannot proceed because it needs to obtain a resource (e.g., computer, processor, and shared memory) that is held by a second one, while the first one holds a resource that the second one needs. Thus, each deadlocked component is forced to wait for the other one to release the resource it needs. Occasionally, more than two concurrent components can form a deadlocked ring.
Livelock is the failure condition that exists when a concurrent component repeatedly obtains a needed resource but never completes its execution before losing that resource (e.g., to a higher-priority component).
Starvation is the failure condition that exists when a concurrent component is ready to execute, but is indefinitely delayed because other components have the necessary resources.
Suspension is the failure condition that exists when a concurrent component is forced to wait too long before it can access a shared resource (i.e., it eventually obtains the resource but too late for proper execution).
Race Condition (a.k.a., data race) is the failure condition that exists when the nondeterministic ordering of multiple operations can result in different, unexpected and potentially incorrect behaviors. A race condition can occur if the correct ordering of the operations is not enforced or if shared resources are not protected from simultaneous access. For example, one thread or process writes to an unprotected memory location, while another simultaneously accesses it, thereby corrupting the stored data.
Priority Inversion is the failure condition that exists when a higher priority operation of one concurrent component is forced to wait on the execution of a lower priority operation of another concurrent operation.
Atomicity Violation is the failure event that occurs when the execution of one concurrent component interrupts another concurrent component's execution that must run to completion without disruption. The interruption could be caused by a higher-priority task or an unexpected message, exception, timeout, hardware interrupt, crash, or reboot.

The following three additional types of concurrency-related defects may overlap with the previous ones or each other:

Isolation Failure is any failure event during which two concurrently executing components unexpectedly access a shared resource simultaneously via an interference path, thereby violating the components' spatial or temporal isolation. These shared resources can be processor-internal (e.g., L3 cache, system bus, memory controller, I/O controllers, and interconnects) or processor-external (e.g., main memory, I/O devices, networks, and subsystems). Note that a temporal isolation failure can lead to the failure of hard-real-time requirements (e.g., unacceptable response time and unacceptable jitter).
Synchronization Failure is any failure event involving the synchronization of two or more concurrently executing components. Examples include (1) unexpected message, (2) expected message that fails to arrive, arrives too early, arrives too late, or (3) multiple messages that arrive in an incorrect order.
Lack of Tolerance is any concurrency-related fault or failure condition that is not mitigated by an appropriate fault or failure tolerance mechanism such as exception handling or redundancy with voting.

In practice, concurrency defects typically share the following properties:

Concurrency defects tend to cause intermittent faults or failures that only occur rarely, often after prolonged execution.
Concurrency defects cause faults and failures that are hard to both reproduce and analyze to identify their root causes. The same test inputs may produce different test results due to the nondeterministic nature of concurrent systems and software. The fault or failure may not reoccur when the software under test executes in debug mode.
Concurrency defects can cause violations of both functional requirements and quality (e.g., performance, reliability, and safety) requirements. For example, accidental violations of temporal isolation in real-time systems may cause a violation of response time, jitter, and latency requirements.
Concurrency defects are often not explicitly addressed during test planning (e.g., when selecting test methods).
The different types of concurrency defects are only rarely used to drive the selection of test cases specifically designed to uncover them.
Correcting concurrency defects often require specialized rearchitecting or redesign rather than simple, software code changes.

Academic researchers have proposed a vast number of different techniques and algorithms for the testing of software containing concurrently-executing components, including:

Simone Souza, Maria Brito, Rodolfo Silva, Paulo Souza, and Ed Zaluska, "Research in Concurrent Software Testing: A Systematic Review," PADTAD '11, July 17, 2011.
Vinay Arora, Rajesh Bhatia, and Maninder Singh, "A systematic review of approaches for testing concurrent programs," Concurrency and Computation: Practice and Experience, Vol. 28: pp. 1572-1611, 2016.
Francesco Bianchi, Allessandro Margara, and Mauro PezzÃ¨, "A Survey of Recent Trends in Testing Concurrent Software Systems," IEEE Transactions on Software Engineering, Vol. 44, No. 8, pp. 747-783. August 2018.
Slvana Melo, Jeffrey Carver, Paulo Souza, and Simone Souza, "Empirical research on concurrent software testing: A systematic mapping study," Information and Software Technology, 105, pp. 226-25, 2019.

In theory, many of these testing approaches designed to uncover software concurrency defects can also be used in the testing of systems and systems of systems. However, very few of these approaches are observed in use on real programs, and most developers and testers appear unaware of their existence. This lack of awareness and use has many reasons:

Many concurrency testing techniques do not scale well to match the size and complexity of current systems. Because complexity increases exponentially as the number of concurrently-executing components increases, it is not practical to exhaustively test all possible interleavings and synchronization orders. This complexity could lead to enormous test suites, and some of the proposed approaches concentrate on minimizing test suite size without unduly decreasing test effectiveness. Still, most of the proposed approaches do not necessarily scale well to address today's large and complex software applications, systems, and systems of systems.
Most developers and testers are relatively unfamiliar with explicitly dealing with concurrency. They lack adequate training in concurrency, concurrency-related defects, and concurrency-related testing.
Developers and testers tend to be comfortable using the testing techniques they are familiar with, even though these techniques were not designed for testing concurrent systems and software. For example, Junit is a popular Java testing unit testing framework and tool that does not support concurrency testing out-of-the-box.
Testing in practice is often performed to demonstrate that the software/system works properly under normal circumstances rather than to explicitly uncover defects (i.e., demonstrate that it does not work under all circumstances).
Most of the concurrency testing techniques have been developed for testing software. There has been relatively little research into addressing concurrency at the subsystem, system, and system of systems levels.
According to literature surveys, the majority of concurrency testing techniques are academic proposals rather than mature methods that have been demonstrated to be effective under real-world circumstances.
Many concurrency testing techniques require the use of tools that are currently immature and not widely known.

In practice, most developers and testers seem to use the same methods they have traditionally used to test sequential software, assuming that either any concurrency defects will be uncovered along the way or else that these defects will so rarely cause faults or failures that they can be safely ignored. Research and experience have shown that this "carry on and hope for the best" approach is ineffective at uncovering concurrency defects.

In terms of practicality and actual use, the most important concurrency testing approaches appear to be:

Structural Whitebox Unit and Integration Testing Methods. These approaches concentrate on using architectural and design information to identify points of communication and synchronization, followed by creating sufficient test cases to provide adequate coverage.
Defect-Driven Testing. Also known as heuristic and search-based testing, these approaches are based on analyzing the architecture, design, and implementation for potential locations where the different types of concurrency defects might reside, followed by the manual generation of test cases specifically designed to uncover them.
Long-Duration/Random Testing. These approaches (e.g., fuzz testing, reliability testing, soak testing, and stress testing) are based on the fact that nondeterminism implies that the execution of concurrency defects only occasionally results in concurrency faults and failures. Thus, large test suites are required to provide reasonable assurance that they uncover most concurrency defects. The test cases can either be generated randomly or stochastically based on the application or system's operational profile.
Model-Based Testing (MBT). Model-based testing tools automatically generate large test suites from test models that abstractly define expected behavior. Many MBT tools support modeling concurrent behavior so that thousands of tests that exercise concurrent behavior may be generated in the time it takes to hand-code a few tests. This has proven to be highly effective for large-scale and critical systems.

Other types of concurrency testing (reachability testing, mutation testing, slicing-based testing, and formal-methods-based testing) have also been proposed.

To do a better job of uncovering concurrency defects, developers and testers should consider the following recommendations:

Study. Develop expertise in both actual and virtual concurrency, the associated types of concurrency defects, and the different types of concurrency testing methods.
Plan. Explicitly address testing to uncover concurrency defects in the project test plan. Include appropriate types of concurrency testing and any associated test coverage metrics.
Ensure Temporal Requirements Exist. The need for an associated test oracle requires the unlikely existence of highly-complete timing requirements that specify--or enable the derivation of--the proper behavior of all of these unlikely interleavings (for example, "message A must arrive before message B arrives or is sent").
Review. Before testing, review the concurrency architecture and design for possible types and locations of concurrency defects. Raise the alarm if the architecture and design do not adequately address concurrency.
Adequately Cover Interleavings. Because concurrency defects are rarely uncovered by nominal "sunny-day" test cases, also develop test cases that cover the different possible interleavings of messages, exceptions (e.g., via fault injection), timeouts, network failures, node failures, and reboots. Because of the exponential explosion of potential test orderings as system size and complexity increase, using combinatorial (e.g., pair-wise) testing is typically needed to keep test suite size practical.
Focus. Pay special attention to concurrency testing when testing safety-critical, mission-critical, and real-time systems, subsystems, and software.
Test for Rare Bugs. Because concurrency faults and failures often occur both intermittently and relatively rarely, perform sufficient reliability and soak testing to uncover rare concurrency bugs.
Evaluate Logs. Use instrumenting and evaluate the message and test logs for signs of concurrency problems.
Augment Testing. In addition to testing, use architecture and design peer reviews, concurrency-related coding practices, and static and dynamic analysis tools.

Although the ubiquitous nature of concurrency in today's applications makes concurrency defects inevitable, few developers and testers explicitly develop adequate test suites to uncover them. While researchers have proposed numerous approaches to testing concurrent applications and systems, the majority of these approaches are not being used on real projects. However, a solid understanding and appreciation of concurrency as well as the different types of concurrency defects and concurrency testing can help developers and testers augment their traditional testing approaches with test cases that will more effectively uncover concurrency-related defects.

Additional Resources

Tanakorn Leesatapornwongsa, Jeffrey F. Lukman, Shan Lu, and Haryadi S. Gunawi, "TaxDC: A Taxonomy of Non-Deterministic Concurrency Bugs in Datacenter Distributed Systems," Proceedings of the 21th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2016.
Simone Souza, Maria Brito, Rodolfo Silva, Paulo Souza, and Ed Zaluska, "Research in Concurrent Software Testing: A Systematic Review," PADTAD '11, July 17, 2011.
Vinay Arora, Rajesh Bhatia, and Maninder Singh, "A systematic review of approaches for testing concurrent programs," Concurrency and Computation: Practice and Experience, Vol. 28: pp. 1572-1611, 2016.
Francesco Bianchi, Allessandro Margara, and Mauro PezzÃ¨, "A Survey of Recent Trends in Testing Concurrent Software Systems," IEEE Transactions on Software Engineering, Vol. 44, No. 8, pp. 747-783. August 2018.
Slvana Melo, Jeffrey Carver, Paulo Souza, and Simone Souza, "Empirical research on concurrent software testing: A systematic mapping study," Information and Software Technology, 105, pp. 226-25, 2019.

Software Engineering Institute

SEI Blog

Testing Concurrent Systems: Concurrency Defects, Testing Techniques, and Recommendations

Donald Firesmith

June 24, 2019

PUBLISHED IN

CITE

TAGS

SHARE

Concurrency

Types of Concurrency Defects

Testing Techniques

Recommendations

Conclusion

Written By

Donald Firesmith

Digital Library Publications

Send a Message

More By The Author

System Resilience Part 7: 16 Guiding Principles for System Resilience

April 27, 2020 • By Donald Firesmith

System Resilience Part 6: Verification and Validation

April 20, 2020 • By Donald Firesmith

System Resilience Part 5: Commonly-Used System Resilience Techniques

February 17, 2020 • By Donald Firesmith

System Resilience Part 4: Classifying System Resilience Techniques

February 3, 2020 • By Donald Firesmith

System Resilience Part 3: Engineering System Resilience Requirements

January 13, 2020 • By Donald Firesmith

More In Cybersecurity Engineering

Cyber-Informed Machine Learning

February 10, 2025 • By Jeffrey Mellon, Clarence Worrell

13 Cybersecurity Predictions for 2025

January 21, 2025 • By Greg Touhill

An Introduction to Hardening Docker Images

November 11, 2024 • By Maxwell Trdina, Sasank Vishnubhatla

A Framework for Detection in an Era of Rising Deepfakes

October 28, 2024 • By Matthew Walsh

Evaluating Static Analysis Alerts with LLMs

October 7, 2024 • By William Klieber, Lori Flynn