
Common Testing Problems: Pitfalls to Prevent and Mitigate

Donald Firesmith

In the first post of this two-part series on common testing problems, I addressed the fact that testing is less effective, less efficient, and more expensive than it should be. This second post highlights the results of an analysis that documents problems that commonly occur during testing. Specifically, this series identifies and describes 77 testing problems organized into 14 categories; lists the potential symptoms by which each can be recognized, its potential negative consequences, and its potential causes; and makes recommendations for preventing these problems or mitigating their effects.

Why Testing is a Problem

A widely cited study for the National Institute of Standards & Technology (NIST) reports that inadequate testing methods and tools annually cost the U.S. economy between $22.2 billion and $59.5 billion, with roughly half of these costs borne by software developers in the form of extra testing and half by software users in the form of failure avoidance and mitigation efforts. The same study notes that testing often consumes between 25 percent and 90 percent of software development budgets.

Despite the huge investment in testing mentioned above, recent data from Capers Jones shows that the different types of testing are relatively ineffective. In particular, testing typically identifies only one-fourth to one-half of defects, while other verification methods, such as inspections, are typically more effective. Inadequate testing is one of the main reasons that software is typically delivered with approximately 2 to 7 defects per thousand lines of code (KLOC). While this may seem like a negligible number, the result is that major software-reliant systems are being delivered and placed into operation with hundreds or even thousands of residual defects. If software vulnerabilities (such as the CWE/SANS Top 25 Most Dangerous Software Errors) are counted as security defects, the rates are even more troubling.

Overview of Different Types of Testing Problems

The first blog entry in this series covered the following general types of problems that are not restricted to a single kind of testing:

  • test planning and scheduling problems
  • stakeholder involvement and commitment problems
  • management-related testing problems
  • test organization and professionalism problems
  • test process problems
  • test tools and environments problems
  • test communication problems
  • requirements-related testing problems

The remainder of this second post focuses on six categories of problems, each restricted to one of the following types of testing:

  • unit testing
  • integration testing
  • specialty engineering testing
  • system testing
  • system-of-systems testing
  • regression testing

Unit testing problems primarily occur during the testing of individual software modules, typically by the same developers who wrote them in the first place. Design volatility can cause excessive rework of the unit test cases, drivers, and stubs. Unit testing can also suffer from a conflict of interest: developers naturally want to demonstrate that their software works correctly, whereas testers should seek to demonstrate that it fails. Finally, unit testing may be poorly and incompletely performed because the developers consider it relatively unimportant.
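
To make the roles of unit test drivers and stubs concrete, here is a minimal sketch in Python (the module, the names, and the unittest framing are mine, not from the catalog): the test class acts as the driver, and a Mock object stands in as a stub for the unit's collaborator. Note that the second test deliberately tries to make the unit fail, which is the tester's mindset described above.

```python
# Minimal unit-test sketch: a hypothetical tax calculator is tested in
# isolation by stubbing out its rate-lookup dependency, so the test
# exercises only the unit and not its collaborators.
import unittest
from unittest.mock import Mock

def compute_tax(amount, rate_service):
    """Unit under test: applies a rate obtained from a collaborator."""
    rate = rate_service.get_rate()
    if rate < 0:
        raise ValueError("negative tax rate")
    return round(amount * rate, 2)

class ComputeTaxTest(unittest.TestCase):
    def test_applies_stubbed_rate(self):
        stub = Mock()
        stub.get_rate.return_value = 0.07  # stub replaces the real service
        self.assertEqual(compute_tax(100.00, stub), 7.00)

    def test_rejects_negative_rate(self):
        # The tester's goal: try to make the unit fail.
        stub = Mock()
        stub.get_rate.return_value = -0.01
        with self.assertRaises(ValueError):
            compute_tax(100.00, stub)

if __name__ == "__main__":
    unittest.main()
```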

Integration testing problems occur during the testing of a set of units integrated into a component, a set of components into a subsystem, a set of subsystems into a system, or a set of systems into a system of systems. Integration testing concentrates on verifying the interactions between the parts of the whole. One potential problem is the difficulty of localizing defects to the correct part once the parts have been integrated. A second potential problem is inadequate built-in test software that could help locate the cause of a failed test. Finally, a third problem is the potential lack of availability of the correct (versions of the) parts to integrate.
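
As a hedged illustration of this interaction focus, and of the defect-localization problem, consider the following sketch (both components are invented): the test wires two real parts together instead of stubbing one out, so a failure could implicate either part.

```python
# Integration-test sketch (hypothetical components): instead of stubbing,
# the test connects real Parser and Validator objects and verifies their
# interaction across the interface between them.
import unittest

class Parser:
    def parse(self, text):
        key, _, value = text.partition("=")
        return {key.strip(): value.strip()}

class Validator:
    def validate(self, record):
        return all(k and v for k, v in record.items())

class ParserValidatorIntegrationTest(unittest.TestCase):
    def test_parsed_record_passes_validation(self):
        record = Parser().parse("host = example.org")
        self.assertTrue(Validator().validate(record))

    def test_missing_value_is_rejected(self):
        # If this fails, the defect could be in either component --
        # the localization problem described above.
        record = Parser().parse("host =")
        self.assertFalse(Validator().validate(record))

if __name__ == "__main__":
    unittest.main()
```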

Specialty engineering testing problems occur when an inadequate amount of specialized testing of various quality characteristics and attributes takes place. More specifically, these problems involve inadequate capacity, concurrency, performance, reliability, robustness (e.g., error and fault tolerance), safety, security, and usability testing. While these are the most commonly occurring types of specialty engineering testing problems, other types may also exist depending on which quality characteristics and attributes are important (and thus on the types of quality requirements that have been specified).
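
To make two of these specialty slices concrete, here is a hypothetical sketch of a performance check and a robustness check; the function, the latency budget, and the fault-handling behavior are all invented for illustration.

```python
# Sketch of two narrow slices of specialty testing: a performance check
# that fails if an operation exceeds a latency budget, and a robustness
# check that feeds the operation malformed input.
import time
import unittest

def lookup(table, key):
    if not isinstance(key, str):
        raise TypeError("key must be a string")
    return table.get(key)

class SpecialtyChecks(unittest.TestCase):
    def test_lookup_meets_latency_budget(self):
        table = {str(i): i for i in range(100_000)}
        start = time.perf_counter()
        for i in range(10_000):
            lookup(table, str(i))
        elapsed = time.perf_counter() - start
        self.assertLess(elapsed, 1.0)  # invented budget: 10k lookups < 1 s

    def test_lookup_tolerates_bad_key_type(self):
        # Robustness: malformed input must fail loudly, not corrupt state.
        with self.assertRaises(TypeError):
            lookup({}, 42)

if __name__ == "__main__":
    unittest.main()
```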

System testing problems occur during system-level testing and often cannot be eliminated because of the very nature of system testing; at best, recommended solutions can only mitigate them. It is hard to test an integrated system's robustness (its support for error, fault, and failure tolerance) because of the challenges of triggering system-internal exceptions and tracing their handling. System-level testing can also be hard because temporary test hooks have typically been removed, so that one is testing the actual system to be delivered. As with integration testing problems, demonstrating that system tests provide adequate test coverage is hard because reaching specific code (e.g., fault-tolerance paths) using only inputs to the black-box system is difficult. Finally, there is often inadequate mission-thread-based testing of end-to-end capabilities because system testing is often performed using use-case-based testing, which is typically restricted to interactions with only a single, primary, system-external actor.
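
The contrast between use-case-based and mission-thread-based testing can be sketched as follows (the order-handling system and its actors are hypothetical): a single end-to-end test exercises a thread that crosses three external actors, rather than the single primary actor a use-case test would drive.

```python
# Mission-thread sketch: one end-to-end test drives a hypothetical order
# system through several external actors (customer, warehouse, carrier).
import unittest

class OrderSystem:
    """Stand-in for the black-box system under test."""
    def __init__(self):
        self.orders = {}
    def place(self, order_id):            # customer actor
        self.orders[order_id] = "placed"
    def pick(self, order_id):             # warehouse actor
        if self.orders.get(order_id) != "placed":
            raise RuntimeError("cannot pick unplaced order")
        self.orders[order_id] = "picked"
    def ship(self, order_id):             # carrier actor
        if self.orders.get(order_id) != "picked":
            raise RuntimeError("cannot ship unpicked order")
        self.orders[order_id] = "shipped"

class OrderMissionThreadTest(unittest.TestCase):
    def test_place_pick_ship_thread(self):
        # The whole thread, not one actor's use case, is the test subject.
        system = OrderSystem()
        system.place("A-1")
        system.pick("A-1")
        system.ship("A-1")
        self.assertEqual(system.orders["A-1"], "shipped")

if __name__ == "__main__":
    unittest.main()
```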

System-of-Systems (SoS) testing problems are often the result of SoS governance problems (i.e., everything typically occurs at the system level rather than the SoS level). For example, SoS planning may not adequately cover SoS testing. Often, no organization is made explicitly responsible for SoS testing. Funding is often focused at the system level, leaving little or no funding for SoS testing. Scheduling is typically performed only at the individual system level, and system-level schedule slippages make it hard to schedule SoS testing.

SoS requirements are also often lacking or of especially poor quality, making it hard to test the SoS against its requirements. The individual system-level projects rarely allocate sufficient resources to support SoS testing. Defects are typically tracked only at the system level, making it difficult to address SoS-level defects. Finally, there tends to be a lot of finger-pointing and shifting of blame when SoS testing problems arise and SoS testing uncovers SoS-level defects.

Note that a SoS almost always consists of independently governed systems that are developed, funded, and scheduled separately. SoS testing problems therefore do not refer to systems that are developed by a prime contractor or integrated by a system integrator, nor do they refer to subsystems developed by subcontractors or vendors.

Regression testing problems occur during the performance of regression testing, both during development and maintenance. Often, there is insufficient automation of regression testing, which makes it too labor-intensive to perform repeatedly, especially when using an iterative and incremental development cycle. This overhead is one of the reasons that regression testing may not be performed as often as it should be.

When regression testing is performed, its scope is often too localized because software developers assume that changes in one part of the system will not propagate to other parts and cause faults and failures there. Low-level regression testing is commonly easier to perform than higher-level regression testing, which results in an over-reliance on low-level regression tests. Finally, the test resources created during development may not be delivered and thus may not be available to support regression testing during maintenance.
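
As a sketch of the kind of automation these paragraphs call for, the following minimal Python runner discovers and executes a regression suite and returns a nonzero exit status on failure, making it cheap to rerun on every change (the tests/ directory layout is an assumption, not something from the original post).

```python
# Sketch of low-cost regression automation: discover every test module
# under tests/ and exit nonzero on failure, suitable as a CI hook so the
# suite runs on each change rather than only when someone finds time.
import sys
import unittest

def run_regression_suite():
    suite = unittest.defaultTestLoader.discover("tests", pattern="test_*.py")
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    return 0 if result.wasSuccessful() else 1

if __name__ == "__main__":
    sys.exit(run_regression_suite())
```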

Addressing Test-type Specific Problems

For each testing problem described above, I have documented several types of information useful for understanding the problem and implementing a solution. This information will appear in an upcoming SEI technical report. As an example of what will appear in this report, the testing problem "Inadequate Regression Test Automation" has been documented with the information described below.

Description. Too few of the regression tests are automated.

Potential symptoms. Many or even most of the tests are being performed manually.

Potential consequences.

  • Manual regression testing takes so much time and effort that it is not done.
  • If performed, regression testing is rushed, incomplete, and inadequate to uncover a sufficient number of defects.
  • Testers are making an excessive number of mistakes while manually performing the tests.
  • Defects introduced into previously tested subsystems/software while making changes may remain in the operational system.


Potential causes. Testing stakeholders (e.g., managers and the developers of unit tests) may mistakenly believe that performing regression testing is neither necessary nor cost effective because

  • most changes have only a minor scope
  • system testing will catch any inadvertently introduced integration defects
  • they are overconfident that changes have not introduced any new defects


Testing stakeholders may also not be aware of the

  • importance of regression testing
  • value of automating regression testing


Other potential causes may include

  • Automated regression testing may not be an explicit part of the testing process.
  • Automated regression testing may not be incorporated into the Test and Evaluation Master Plan (TEMP) or System/Software Test Plan (STP).
  • The schedule may contain little or no time for the development and maintenance of automated tests.
  • Tool support for automated regression testing may be lacking (e.g., due to insufficient test budget) or impractical to use.
  • The initially developed automated tests may not be maintained.
  • The initially developed automated tests may not be delivered with the system/software.


Recommendations. Prepare by explicitly addressing automated regression testing in the project's

  • TEMP or STP
  • test process documentation (e.g., procedures and guidelines)
  • master schedule
  • work breakdown structure (WBS)


Enable the solution of the problem by

  • providing training/mentoring to the testing stakeholders in the importance and value of automated regression testing
  • providing sufficient time in the schedule for automating and maintaining the tests
  • providing sufficient funding to pay for automated test tools
  • ensuring that adequate resources (staffing, budget, and schedule) are planned and available for automating and maintaining the tests


Perform the following tasks:

  • Automate as many of the regression tests as is practical.
  • Where appropriate, use commercially available test tools to automate testing.
  • Ensure that both automated and manual test results are integrated into the same overall test results database so that test reporting and monitoring are seamless (see the sketch after this list).
  • Maintain the automated tests as the system/software changes.
  • Deliver the automated tests with the system/software.
  • When relevant, identify this problem as a risk in the project risk repository.
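
As a sketch of the single-results-database task above (the schema and values are invented for illustration), automated and manual outcomes can land in one table so that a single query reports across both kinds of testing.

```python
# Sketch of the "one results database" recommendation: automated and
# manual outcomes are stored in the same SQLite table so reporting and
# monitoring are uniform. Schema and field names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE test_results (
        test_id  TEXT,
        source   TEXT CHECK (source IN ('automated', 'manual')),
        outcome  TEXT CHECK (outcome IN ('pass', 'fail')),
        run_date TEXT
    )
""")
conn.executemany(
    "INSERT INTO test_results VALUES (?, ?, ?, ?)",
    [("REG-001", "automated", "pass", "2013-03-04"),
     ("REG-002", "manual",    "fail", "2013-03-04")],
)
# One query now covers both kinds of testing.
for row in conn.execute(
        "SELECT source, COUNT(*) FROM test_results GROUP BY source"):
    print(row)
```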


Verify that

  • The test process documentation addresses automated regression testing.
  • The TEMP / STP and WBS address automated regression testing.
  • The schedule provides sufficient time to automate and maintain tests.
  • A sufficient number of the tests have been automated.
  • The automated tests function properly.
  • The automated tests are properly maintained.
  • The automated tests are delivered with the system/software.


Related problems. no separate test plan, incomplete test planning, inadequate test schedule, unrealistic testing expectations / false sense of security, inadequate test resources, inadequate test maintenance, over-reliance on manual testing, tests not delivered, inadequate test configuration management (CM)

Benefits of Using the Catalog of Common Testing Problems

This analysis of commonly occurring testing problems--and recommended solutions--can be used as training material to help testers and testing stakeholders learn to avoid, identify, understand, and mitigate testing problems. Like anti-patterns, these problem categories can be used to improve communication between testers and testing stakeholders. The list can also be used to categorize problem types for metrics collection. Finally, the problems can be used as a checklist when

  • producing test plans and related documentation
  • evaluating contractor proposals
  • evaluating test plans and related documentation (quality control)
  • evaluating as-performed test process (quality assurance)
  • identifying test-related risks and their mitigation approaches

Future Work

The framework of testing problems outlined in this series is the result of more than three decades of experience performing assessments, my involvement in numerous projects, and discussions with testing subject matter experts. Even after all this time, however, several unanswered questions remain that I intend to make the subject of future study:

  • Probabilities. Which of these problems occur most often? What is the probability distribution of these problems? Which problems tend to cluster together? Do different problems tend to occur with different probabilities in different application domains (such as commercial versus governmental versus military, and web versus information technology versus embedded systems)?
  • Severities. Which problems have the largest negative consequences? What are the probability distributions of harm caused by each problem?
  • Risk. Based on the above probabilities and severities, which of these problems cause the greatest risks? Given these risks, how should one prioritize the identification and resolution of these problems?

I am interested in turning my work on this topic thus far into an industry survey and performing a formal study to answer these questions. I welcome your feedback on my work to date in the comments section below.

Additional Resources

To learn more about this work, please view the presentation Common Testing Problems: Pitfalls to Prevent and Mitigate and the associated Checklist Including Symptoms and Recommendations, which were presented at the FAA Verification and Validation Summit 8 (2012) in Atlantic City, New Jersey, on 10 October 2012.

