Posted on by
Some of the principal challenges faced by developers, managers, and researchers in software engineering and cybersecurity involve measurement and evaluation. In two previous blog posts, I summarized some features of the overall SEI Technology Strategy. This post focuses on how the SEI measures and evaluates its research program to help ensure these activities address the most significant and pervasive problems for the Department of Defense (DoD). Our goal is to conduct projects that are technically challenging and whose solution will make a significant difference in the development and operation of software-reliant systems. In this post we'll describe the process used to measure and evaluate the progress of initiated projects at the SEI to help maximum their potential for success.
The Importance of Measurement and Evaluation in Software and Cybersecurity
Certain characteristics of software and cybersecurity are easy to measure--lines of code produced, errors found, errors corrected, port scan events, and time consumed. But much of what really matters in practice offers an even greater challenge to our measurement and evaluation capabilities. For example
Over the years, much of the research activity at the SEI has been focused on enhancing our ability to measure and evaluate. Indeed, the genesis of the Capability Maturity Model (CMM) and its successors in the late 1980's is based on a need--unmet at the time--to evaluate the capabilities of software development organizations to deliver on their promises. Models such as the Resilience Maturity Model (RMM) and architecture evaluation methodologies serve similar purposes.
Much of the body of research work at the SEI is focused on these challenges. And so there is a natural question:
How can we evaluation the potential significance of the research, development, testing, and evaluation (RDT&E) work we (and others) undertake to improve technology and practice in software engineering and cybersecurity?
This is a significant issue in all research, development, test, & evaluation (RDT&E) programs. We need to take risks, but we want these to be appropriate risks that will yield major rewards, and with a maximum likelihood.
An overly conservative approach to RDT&E management can yield small increments of improvement to existing practices, when what is really needed, in many cases, are game-changing advances. Indeed, in software and cybersecurity, despite the admonition of Lord Kelvin ("If you cannot measure it, you cannot improve it"), purely "numbers-based" approaches can be dangerous (viz. the famous "light under the lamppost" story). Naïve application of management-by-the-numbers is especially damaging in our disciplines, where so many important measures are lacking.
On the other hand, bold "open loop" approaches can yield risks with little likelihood of benefits. We must therefore employ a combination of measures, expert judgment, and careful identification of evaluation criteria--and we must structure the process of research to obtain early indicators and develop them when necessary (i.e., build new lampposts). This multidimensional evaluation process is essential because, in research, it is not just a matter of risk versus reward--thoughtful management in the definitional phases of a project can enable us to "push the curve out," simultaneously mitigating risk, identifying or developing meaningful measures in support of evaluation, and increasing potential for success and impact. A more general discussion on research management practice is the subject of chapter 4 of the 2002 National Academy report on Information Technology Research, Innovation, and E-Government.
In this vein, more than two decades ago, George Heilmeier, former director of the Defense Advanced Research Projects Agency (DARPA), developed a set of questions that he posed to prospective leaders of research projects. If these questions can be well addressed, then the proposed projects were more likely to be both successful and significant. These questions have been so widely adopted in research management that they have become known as the Heilmeier Catechism.
Experience at DARPA and other innovative research organizations has shown that compelling answers to the following questions are likely to predict successful outcomes:
Our work at the SEI spans a spectrum ranging from deep, technically-focused research on the hard problems of software engineering and cybersecurity--the technology, tools, practices, and data analyses--to more operationally-focused efforts supporting the application of diverse technical concepts, methods, and tools to the challenges of practice, including development and operations for complex software-reliant systems. To ensure we help the DoD and other government and industry sponsors identify and solve key technical and development challenges facing their current and future software-reliant systems, we have adapted Heilmeier's Catechism into an evaluation process that is more adapted to the particular measurement challenges in software and cybersecurity.
This evaluation process--which we summarize below--involves a combination of four factors:
The wide spectrum of activities in the SEI body of work motivates us to contemplate four different dimensions of validity in assessing proposals for research projects from SEI technical staff:
Finally, we consider the use and development of metrics, which contribute generally both to our ability to evaluate research and to the advancement of practice generally in software and cybersecurity. Much of the research we do is focused on developing and validating new metrics. But, additionally, when we develop metrics useful in research, we find they can also have broad value in practice. In the management of research projects, there are four dimensions we consider:
Advances in Metrics
As noted at the outset, measurement is, in some ways, the deepest challenge in both software engineering and cybersecurity. For example, how can we measure the security of a system or the flexibility of a software framework? Likewise, how can we develop useful subsidiary measures that can help us understand the flow of data and information within a large system, or the extent of coupling among a set of federated software components?
This challenge is not only technically deep but profoundly significant to the advancement of practice. Particularly when earned value must be evaluated, the Architecture-Led Incremental Iterative Development (ALIID) approach depends on advances in metrics. Described in the first post in this series, ALLID (or, "agile at scale") enables iterative and incremental development of highly capable and innovative systems with acceptable levels of programmatic risk. Earned value based purely on accumulation of lines of code is unlikely to lead to project success, when much of the risk mitigation (accomplished through prototyping and modeling-and-simulation, for example) leads to reduced variances in estimates, though not necessarily reduced mean values. That is, the estimates become more predictive and reliable as a result of these kinds of activities, and thus greater levels of innovation and engineering-risk taking can be supported without a necessary threat to the potential for success of the overall effort. This is the essence of iteration and incrementality.
Effective metrics can have a profound impact on the industry--witness the CMM and subsequent CMMI families of measures, just successfully transitioned out of the SEI into a separate activity at Carnegie Mellon. This is why we take as a principle that the development of new measures must be a first-class activity in the research portfolio. Unlike the work done decades ago, our disciplines are now data rich, and we can build on data collection, models and simulation, field tests, and other analytic approaches that, just a few years ago, were relatively less feasible. We see this reflected in the research portfolio at the SEI, which is moving more aggressively than ever to address these challenges and create rich possibilities for the development and operation of software-reliant systems for the DoD, other mission agencies, and their supply chains.
In this series on the SEI Technical Strategic Plan, we have outlined how we are meeting the challenges of designing, producing, assuring, and evolving software-reliant systems in an affordable and dependable manner. The first post in this series outlined our strategy for ALLID or agile-at-scale, combining work on process, measurement, and architecture. The second post in the series detailed our strategies for to achieve game-changing approaches to designed-in security and quality (evidence-based software assurance). It also identified a set of DoD-critical component capabilities relating to cyber-physical systems (CPS), autonomous systems, and big data analytics. Finally, it outlined our approach to cybersecurity tradecraft and analytics.
I am also very excited to take this opportunity to introduce you to Kevin Fall, who joined the SEI late last month as its new chief technology officer. In this role, Dr. Fall will take over responsibility from me for the evolution and execution of the SEI Technical Strategic Plan, as well as direction of the research and development portfolio of the SEI's technical programs in cybersecurity, software architecture, process improvement, measurement and estimating, and unique technical support to sponsors. I will continue to take an active role in supporting the SEI and its essential mission--even including occasional future blog posts! Kevin will join the blogger team with a blog post in the near future summarizing his vision for research and transition activities at the SEI.
We expect that the SEI Strategic Research Plan will be soon be available for download.