
Six Best Practices for Developer Testing

Bob Binder

Code coverage represents the percent of certain elements of a software item that have been exercised during its testing. As I explained in my first post in this series on developer testing, there are many ideas about which code elements are important to test and therefore many kinds of code coverage. In this post, the second post in the series, I explain how you can use coverage analysis to routinely achieve consistently effective testing.

How Should Code Coverage Be Used?

To reiterate, even meeting 100 percent coverage for a stringent criterion, such as modified condition/decision coverage (MC/DC), is never a guarantee of bug-free software. It simply means your tests have looked for the usual suspects. Despite this technical limitation, however, code coverage is effective for revealing missing or superficial testing that results from cognitive limitations.
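
To make this limitation concrete, here is a minimal Python sketch. The function `average_rate` and its two tests are hypothetical, invented only for illustration. The tests take both outcomes of the single decision in the code, so a coverage analyzer would report 100 percent statement and decision coverage, yet the input that triggers the bug is never tried.

```python
def average_rate(total, count):
    """Return total / count, or 0.0 for a negative count (hypothetical rule)."""
    if count < 0:
        return 0.0
    return total / count


def run_tests():
    """Two tests that together take both outcomes of the only decision."""
    assert average_rate(10, -1) == 0.0  # decision evaluates to True
    assert average_rate(10, 4) == 2.5   # decision evaluates to False
    return True


if __name__ == "__main__":
    run_tests()
    # Every statement and branch has now been executed, but a count of
    # zero was never tried. This call raises ZeroDivisionError.
    try:
        average_rate(10, 0)
    except ZeroDivisionError as exc:
        print("Missed by a fully covering test suite:", exc)
```

The missing test here is a boundary value, which coverage alone cannot suggest because no code exists for that case.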

Test design can be based on systematic and explicit analysis. More often, however, it is simply improvised using the developer's or tester's perception of the software item under test (SIUT). This perception is subject to confirmation bias and to unawareness of unobvious interactions, side effects, and external dependencies. Confirmation bias (the human tendency to see what you want to see) is common in developer testing and directly related to high defect levels. As Glenford J. Myers noted more than 40 years ago in The Art of Software Testing, "If our goal is to demonstrate that a program has no errors, then we shall be subconsciously steered toward this goal."

A controlled study of developer testing found that, for any given feature, tests to demonstrate the feature were four times more likely to be coded than tests that attempted to provoke failures. Developers were twice as likely to choose tests that exercised "happy paths" as those that covered error and exception handling. Even mature, carefully maintained test suites show such gaps. For example, the open source test suites for Unix utilities tex and awk, developed and refined by renowned computer scientist Donald Knuth and his students for more than a decade, were independently analyzed. As Joseph Horgan and Aditya Mathur wrote in Handbook of Software Reliability Engineering, these test suites achieved 70 percent and 85 percent statement coverage--leaving significant parts of the code completely untested despite decades of refinement and use in Unix-based systems.

These results show why, even though it should never be used as the primary criterion of test completeness, coverage should always be checked to avoid blind spots, misunderstandings, and omissions in test design.

A Definition of Done for Developer Testing

What follows is a definition of done for developer testing. It is intended to be comprehensive and will entail more work than playing code coverage roulette. Why should you do this? As Boris Beizer wrote in Software Testing Techniques, "One can argue that 100 percent statement coverage is a lot of work, to which I reply that debugging is a lot more work."

For each SIUT, there is an automated test suite under configuration management control used in continuous integration that:

  • Evaluates at least one instance of compliance with every antecedent requirement, specification element, or user story.
  • Sets every input variable to an invalid value at least once and evaluates the response. For character types, generic invalid classes include null, undefined, missing, wrong type, and excessive length. For numeric types, test the boundaries: the minimal value, the minimal value decremented by the smallest feasible unit, the maximal value, and the maximal value incremented by the smallest feasible unit. Use a fuzzer if you can (for example, the CERT Division's Basic Fuzzing Framework generates fuzzed files).
  • Submits input that is expected to cause at least one instance of every output effect. (An effect is a distinct class of outputs. For example, there are four distinct effects in the classic Triangle classification problem: equilateral, scalene, isosceles, and error.)
  • Uses every pair-wise combination of selected input variable values at least once. Selected values include invalid items as suggested in the second bullet point above and values to produce each output effect.
  • Achieves N+ coverage, including all sneak paths, when the item under test implements sequential behavior (i.e., it can be characterized as a state machine).
  • Exercises each mode of concurrent execution at least once (including the failure of the SIUT and its dependencies) when the item under test shares resources with other threads or processes that the SIUT does not control.
  • Achieves 100 percent decision coverage: each possible code segment that a conditional expression may activate is entered at least once.
  • Executes at least once every case block in case/switch expressions, including the default or "otherwise" case.
  • Triggers every exception at least once.
  • Enters every loop and takes at least a second iteration, with each loop termination condition achieved at least once. (For nested loops, see pages 380-384 of my book Testing Object-Oriented Systems: Models, Patterns, and Tools.)
  • Exercises every known binding at least once for overloaded and/or dynamic interfaces in object-oriented programming languages. For example, if an abstract base class has three derived classes, you would repeat the same suite of unit tests on an instantiation of each derived class, in addition to testing the methods specific to each derived class.
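
As a rough illustration of two items in this list (one test per output effect, and invalid or boundary values for every input), here is a minimal Python sketch of the Triangle classification problem. The function `classify_triangle`, its validity rules (numeric, positive sides that satisfy the triangle inequality), and the case tables are assumptions made for this example, not a reference implementation.

```python
def classify_triangle(a, b, c):
    """Classify three side lengths; return "error" for invalid input."""
    sides = (a, b, c)
    # Wrong type: accept only int or float (bool is excluded on purpose).
    if any(isinstance(s, bool) or not isinstance(s, (int, float)) for s in sides):
        return "error"
    # Out of range: every side must be strictly positive.
    if any(s <= 0 for s in sides):
        return "error"
    x, y, z = sorted(sides)
    # Triangle inequality: the two shorter sides must exceed the longest.
    if x + y <= z:
        return "error"
    if x == z:
        return "equilateral"
    if x == y or y == z:
        return "isosceles"
    return "scalene"


# At least one input for each of the four output effects.
EFFECT_CASES = [
    ((3, 3, 3), "equilateral"),
    ((3, 3, 5), "isosceles"),
    ((3, 4, 5), "scalene"),
    ((1, 2, 3), "error"),
]

# Each input position set to an invalid or boundary value at least once.
INVALID_CASES = [
    ((0, 4, 5), "error"), ((3, 0, 5), "error"), ((3, 4, 0), "error"),
    ((-1, 4, 5), "error"), ((3, -1, 5), "error"), ((3, 4, -1), "error"),
    ((None, 4, 5), "error"), ((3, "4", 5), "error"), ((3, 4, True), "error"),
]


def run_cases():
    """Return the list of failing cases (empty when all pass)."""
    failures = []
    for args, expected in EFFECT_CASES + INVALID_CASES:
        actual = classify_triangle(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


if __name__ == "__main__":
    print("failures:", run_cases())
```

Keeping the cases in tables makes it easy to see which effects and which invalid classes are still missing, and to extend the suite toward pair-wise combinations.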

These are of course generic test requirements--you should adapt and expand them according to your application and situation. A test suite that achieves all of these conditions produces credible evidence that your testing is not subject to cognitive limitations, has exercised the full range of behavior, and has not omitted any part of the code.
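
The last item in the list above, repeating one suite of tests on every known binding, can be sketched in Python as follows. The `Shape` hierarchy and the contract checked by `check_shape_contract` are hypothetical and exist only to show the pattern.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """Hypothetical abstract base class with three derived classes."""

    @abstractmethod
    def area(self):
        """Return the area as a non-negative float."""

    def describe(self):
        return f"{type(self).__name__} with area {self.area():.2f}"


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return float(self.side * self.side)


class RightTriangle(Shape):
    def __init__(self, base, height):
        self.base = base
        self.height = height

    def area(self):
        return 0.5 * self.base * self.height


def check_shape_contract(shape):
    """Shared checks that every binding of Shape is expected to pass."""
    area = shape.area()
    assert isinstance(area, float)
    assert area >= 0
    assert type(shape).__name__ in shape.describe()


# One instance of each derived class, so every binding is exercised.
BINDINGS = [Circle(1.0), Square(2.0), RightTriangle(3.0, 4.0)]


def run_contract_suite():
    """Run the shared suite on each binding; return the class names tested."""
    for shape in BINDINGS:
        check_shape_contract(shape)
    return [type(shape).__name__ for shape in BINDINGS]


if __name__ == "__main__":
    print(run_contract_suite())
    # Tests specific to one derived class still belong alongside the shared suite.
    assert math.isclose(Circle(1.0).area(), math.pi)
```

With a framework such as pytest or JUnit, the same idea is usually expressed with parameterized tests, so adding a fourth derived class only requires adding it to the list of bindings.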

Testing without Coverage Roulette: Six Best Practices

Composite white- and black-box test criteria that use coverage for insight can produce strong objective evidence of adequate software item testing (See Figures 9.1, p. 319 and 10.1, p. 352 of my book). It isn't technically hard or excessively time consuming, especially when you build it step-by-step with test-driven development (TDD). Holistic thinking and coverage analysis will help you avoid the TDD tunnel vision that leads to superficial testing.

  • Use an automated unit testing framework like JUnit, GoogleTest, or VectorCAST; there are hundreds to choose from (See this list of unit testing frameworks).
  • Write your tests and code as you go. The simple tests sufficient for TDD purposes are a good start, but don't stop at vanity demos and happy path testing. Add tests that systematically and fully exercise your code as a whole, as well as evaluating its robustness. If you need test design ideas, my book presents 35 test design patterns. Continually ask yourself what unusual behaviors or weird execution paths you might be missing.
  • When you've exhausted your test design ideas and fixed all your bugs, then use a coverage analyzer, rerun your tests, and get the coverage report. Check for decision coverage: it achieves statement coverage as a bonus, exercises paths that can be easily missed with statement coverage, and typically requires about the same amount of work.
  • Consider uncovered elements. They often provide clues to subtle problems, omissions, or unanticipated code behavior.
  • For elements that are truly blocked, rigorously inspect them. Document why they cannot be reached and why you and your inspectors have concluded the blocked code is excused from testing. Code comments are a great place for this kind of explanation, as it is often useful information for future maintainers.
  • Use mocks or more advanced simulation to relieve blockages. Revise your test suite to reach these elements, re-run, and check coverage again. Again, think carefully about what the blockages are telling you.
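
As an illustration of the last practice, the following Python sketch uses the standard library's `unittest.mock` to reach an exception handler that is hard to trigger with real files. The function `read_config`, its `opener` parameter, and the file name are hypothetical; injecting the opener is one design choice among several for making the blocked branch reachable.

```python
from unittest import mock


def read_config(path, opener=open):
    """Return the file's contents, or "default" when it cannot be read."""
    try:
        with opener(path) as handle:
            return handle.read()
    except OSError:
        # Blocked branch: a real read failure is hard to provoke on demand.
        return "default"


def check_normal_read():
    # mock_open simulates a readable file without touching the disk.
    fake_open = mock.mock_open(read_data="mode=fast")
    assert read_config("settings.ini", opener=fake_open) == "mode=fast"
    fake_open.assert_called_once_with("settings.ini")
    return True


def check_fallback_when_read_fails():
    # The mock raises OSError when called, forcing the except branch to run.
    failing_open = mock.Mock(side_effect=OSError("disk unavailable"))
    assert read_config("settings.ini", opener=failing_open) == "default"
    failing_open.assert_called_once_with("settings.ini")
    return True


if __name__ == "__main__":
    print(check_normal_read(), check_fallback_when_read_fails())
```

After adding a test like `check_fallback_when_read_fails`, rerunning the coverage analyzer should show the handler as covered. If it does not, that is itself a clue worth investigating.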

Looking Ahead

Following these steps means you'll never have to play code coverage roulette again. Developer testing based on systematic test design validated with the highest feasible coverage produces credible evidence that an adequate search for code-specific bugs has been conducted. Subsequent testing at integration and system scope can then focus on end-to-end behavior without running the risk of missing bugs lurking in components.

Additional Resources

Read more posts by Bob Binder.
