The SEI Technical Strategic Plan
By Kevin Fall
Deputy Director, Research, and CTO
This is the second installment in a series on the SEI's technical strategic plan.
Department of Defense (DoD) systems are becoming increasingly software reliant, at a time when concerns about cybersecurity are at an all-time high. Consequently, the DoD, and the government more broadly, is expending significantly more time, effort, and money in creating, securing, and maintaining software-reliant systems and networks. Our first post in this series provided an overview of the SEI's five-year technical strategic plan, which aims to equip the government with the best combination of thinking, technology, and methods to address its software and cybersecurity challenges. This blog post, the second in the series, looks at ongoing and new research we are undertaking to address key cybersecurity, software engineering, and related acquisition issues faced by the government and the DoD.
Cyberspace: a domain built of software
The DoD considers cyberspace to be a "domain"--a place where operations originate, terminate, or pass through. Cyberspace is a domain built of software, a form of building material with incredible flexibility and power, but with considerable complexity and potential risk. Providing for national security (the primary DoD mission) is strongly linked to providing security and safety in cyberspace and consequently requires a mastery of software.
The DoD depends on software in many ways--from IT systems to military platforms. Moreover, the dependence grows larger over time, as suggested by the reported exponential growth in the number of lines of software code produced every decade. There are good reasons for this growth: software is uniquely unbounded and flexible; it can be delivered and upgraded electronically and remotely; and it can be rapidly adapted to changing threats. Likewise, there is a large and well-funded commercial industry devoted to the promulgation of software. Assuring its operation for government purposes requires the ability to analyze and craft software to have certain required properties (such as security, safety, and maintainability) irrespective of its origin, as well as the ability to observe its behavior when executed. In a global technology marketplace, the ability to comprehend software and its relationship to national security, weapon systems, and cyberspace has a broad and deep significance for the DoD and its supply chain.
Along with the rapid growth in the role of software in DoD systems, there has been a rapid and continuing evolution of the technology and practices associated with software, networks, and cyber operations. It is mistaken--and even dangerous--to think that software technology is somehow approaching a plateau in role, capability, and safety. In fact, recent developments in technology and practice show a pattern of continued growth, including
- big data frameworks, analytics, and machine learning
- robotics and autonomy
- anomaly and insider threat detection
- software-defined networks and radio
- safe, parallel, and domain-specific programming languages
- malware and binary analysis
- model-based engineering and scalable formal methods
- quantitative cost estimation
- DevOps and agile practices
- gamification and serious games
Carnegie Mellon University's Software Engineering Institute (SEI) brings together cybersecurity, software, and program management expertise, coupled with its university affiliation and industry access, to address the challenges facing the DoD, broader government, and industry. The SEI is the only one of 43 federally funded research and development centers (FFRDCs) that specifically focuses on software engineering and cybersecurity. Our technical strategic plan focuses on performing applied research and transitioning our own work and that of others to tackle high-priority, challenging problems.
The SEI ensures that our research aligns with stated DoD priorities, such as those described in recent DoD initiatives:
- Better Buying Power (3.0), a major initiative sponsored by DoD's acquisition leadership, which focuses on achieving capabilities through technical excellence and innovation, and emphasizes prototyping and outreach to global markets
- Reliance 21, the overarching framework of the DoD's Science and Technology (S&T) joint planning and coordination process. The goal of Reliance 21 is to ensure that the DoD S&T community provides solutions and advice to the Department's senior-level decision makers, warfighters, Congress, and other stakeholders in the most effective and efficient manner possible.
- DoD Cyber Strategy, the DoD's recent articulation of the role of cyberspace in national security. This strategy focuses on building cyber capabilities and organizations for DoD's primary cyber missions, including defending DoD networks, systems, and information; defending the U.S. homeland and U.S. national interests against cyberattacks of significant consequence; and providing cyber support to military operational and contingency plans.
The remainder of this post highlights several of SEI's current research projects in various areas that are central to our strategic plan.
Effective Reduction of Avoidable Complexity in Embedded Software (ERACES)
Principal Investigators: Peter Feiler and Julien Delange
The DoD often finds that new capabilities take too long to reach the warfighter. Delays in verification/certification and recertification constitute one of the biggest roadblocks. Source code analysis and refactoring have proved to be of limited value due to lack of abstraction and time-sensitivity of embedded software.
In response to this challenge, researchers in the SEI's Software Solutions Division (SSD) are identifying design abstractions that reduce testing complexity. This research effort, Effective Reduction of Avoidable Complexity in Embedded Software (ERACES), will deliver a tool that applies data and architectural abstractions to existing models and helps engineers identify and reduce software complexity with
- fewer user-maintained variables--reducing state space explosion
- defects detected by compile-time type checking instead of tests--reducing avoidable rework cost
- fewer test cases and higher effectiveness of code coverage required by safety-critical standards (e.g., DO-178C, ISO 26262)--reducing the verification effort
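To make the first bullet concrete, the following minimal Python sketch compares the state space of a design that tracks a mode with independent, user-maintained boolean flags against one that uses a single enumerated type. The `Mode` enum, the flag names, and the four modes are hypothetical, invented for illustration. This is not the ERACES tool or its method, only an example of the kind of data abstraction the list describes.

```python
from enum import Enum
from itertools import product


class Mode(Enum):
    """Hypothetical operating modes of an embedded controller."""
    IDLE = "idle"
    ARMED = "armed"
    ACTIVE = "active"
    FAULT = "fault"


# Alternative design: four independent, user-maintained boolean flags.
FLAG_NAMES = ("is_idle", "is_armed", "is_active", "is_fault")


def flag_state_space():
    """Enumerate every possible assignment of the four boolean flags."""
    return [dict(zip(FLAG_NAMES, values))
            for values in product((False, True), repeat=len(FLAG_NAMES))]


def is_meaningful(flags):
    """Only assignments with exactly one flag set correspond to a real mode."""
    return sum(flags.values()) == 1


all_flag_states = flag_state_space()
meaningful = [state for state in all_flag_states if is_meaningful(state)]

print(len(all_flag_states))  # 16 combinations the flag design can reach
print(len(meaningful))       # 4 of them correspond to a real mode
print(len(Mode))             # 4: the enum admits only the valid states
```

With four flags, twelve of the sixteen combinations are meaningless yet still representable, so they must be tested or argued away. The enumerated type removes them by construction.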
Human-Computer Decision Systems for Cybersecurity
Principal Investigator: Brian Lindauer
The DoD faces the challenges of securing deployed systems against malware and responding quickly enough when a security intrusion has been detected. Substantial, expensive effort from malware reverse engineers is required to understand and classify new malware samples, but the volume of unlabeled samples is growing exponentially. Researchers in the SEI's CERT Division are developing a security decision solution that combines machine learning with human expertise. This solution will have impact in the malware domain, as well as other security areas such as incident response.
A security decision system using only human experts cannot scale and is vulnerable to human errors and inattention. Pure machine learning systems cannot perform well in changing adversarial environments without continuing supervision. Bi-directional communication between the two components is therefore critical. Just as the algorithm learns from the human, the human will learn from the algorithm.
We hypothesize that increasingly efficient integration of defensive resources will make operational decision systems more efficient and more resistant to adversarial manipulation. This research proceeds on two fronts:
- Interaction of the computer and human analysts. Our work extends the theory of active learning, which studies the optimal labeling of data for machine learning. Given that experts can look at limited data, we ask not just how the selection of data impacts the algorithm, but how it impacts the human analyst.
- System evaluation and the components of operational success. We are testing the performance of human-computer systems in operation through human subjects experiments. What models best interact with real human patterns of behavior? How does the complete system respond to simple threat models such as adversarial manipulation of incoming data?
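As a rough illustration of the active learning idea in the first bullet, the sketch below uses uncertainty sampling, one common query strategy in which the samples the model is least sure about are routed to the human expert first. The function name, the sample ids, and the scores are hypothetical, and the source does not say which selection strategy the SEI researchers use.

```python
def select_for_analyst(scores, budget):
    """Pick the `budget` sample ids whose score is closest to 0.5.

    `scores` maps a sample id to the model's estimated probability that the
    sample is malicious. Scores near 0.5 are the ones the model is least
    certain about, so a human label there is assumed to be most informative.
    Ties are broken by sample id to keep the result deterministic.
    """
    ranked = sorted(
        scores,
        key=lambda sample_id: (abs(scores[sample_id] - 0.5), sample_id),
    )
    return ranked[:budget]


# Hypothetical model outputs for five unlabeled samples.
scores = {"s1": 0.97, "s2": 0.52, "s3": 0.08, "s4": 0.41, "s5": 0.66}

queue = select_for_analyst(scores, budget=2)
print(queue)  # ['s2', 's4']
```

A strategy that also accounts for the analyst, as the research proposes, would need additional terms in the ranking key (a hypothetical example is the expected time to analyze a sample), which this sketch does not attempt to model.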
Our work in this area can spur security professionals and system designers to think about, evaluate, trust, and learn from the human components of their systems. This work will continue to advance clearer, more objective evaluation of both the division of labor and the collaboration channels between automation and human components.
Graph Algorithms on Future Architectures
Principal Investigator: Scott McMillan
The SEI is helping the DoD take full advantage of new hardware and software technologies, such as high-performance systems with graphics processing units (GPUs). These heterogeneous high-performance computing (HHPC) systems, which often contain thousands of GPUs, support the high-level computations needed by the DoD, such as
- three-dimensional physics simulations
- analysis of network traffic data (NetFlow)
- analysis of the spread of malware
- logistics planning
At issue are the graph algorithms used to analyze extremely large data sets, which do not yet fully exploit the processing capabilities available on current and future architectures. Graph applications need to partition the graph more effectively among the processing units, in a way that limits duplicated effort and communication bottlenecks for the algorithms run on the graph.
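One simple way to quantify the communication cost of a partition is the edge cut: the number of edges whose endpoints land on different processing units. The sketch below, with a hypothetical eight-vertex graph and two hypothetical vertex assignments, is only meant to illustrate why the choice of partition matters. It is not taken from the SEI project.

```python
def edge_cut(edges, owner):
    """Count edges whose two endpoints are assigned to different units."""
    return sum(1 for u, v in edges if owner[u] != owner[v])


# Hypothetical graph: two 4-cycles (0-1-2-3 and 4-5-6-7) joined by edge (3, 4).
edges = [(0, 1), (1, 2), (2, 3), (3, 0),
         (4, 5), (5, 6), (6, 7), (7, 4),
         (3, 4)]

# Assignment A: keep each cycle together (vertices 0-3 on unit 0, 4-7 on unit 1).
block = {v: v // 4 for v in range(8)}

# Assignment B: alternate vertices between the two units.
round_robin = {v: v % 2 for v in range(8)}

print(edge_cut(edges, block))        # 1
print(edge_cut(edges, round_robin))  # 9
```

Both assignments give each unit four vertices, but the second forces every edge to cross between units, so any algorithm that follows edges would have to communicate on each step.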
Researchers in the SEI's Emerging Technology Center (ETC) are working with Andrew Lumsdaine, who serves on the Graph 500 Executive Committee and is regarded as a world leader in graph analytics. For example, researchers in Lumsdaine's lab extended the Boost Graph Library (BGL) for distributed-memory CPU systems with the Parallel Boost Graph Library (PBGL). A team of ETC researchers is working towards
- development of a graph algorithm primitives specification for a standardization effort, which has led to involvement in the GraphBLAS API standardization committee (including leading researchers in academia, industry, and government)
- implementation of the GraphBLAS API for HHPC architectures with GPUs
- development, performance analysis, and release of an open-source library with a "more complete" set of graph algorithms built on top of this API for HHPC (CPU + GPU) systems
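The linear-algebraic view behind GraphBLAS expresses graph traversal as repeated products of a frontier vector with the adjacency matrix. The sketch below imitates that formulation for breadth-first search in plain Python, using sets in place of sparse Boolean vectors. It does not call the GraphBLAS API or any GPU library, and the function name and example graph are assumptions made for illustration.

```python
def bfs_levels(adjacency, source):
    """Return {vertex: BFS level} by repeated frontier expansion.

    `adjacency[u]` is the set of vertices reachable from u in one step, i.e.
    the nonzero columns of row u in the adjacency matrix. Each iteration
    plays the role of a vector-matrix product over the Boolean (OR, AND)
    semiring, masked by the set of vertices that have no level yet.
    """
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # "Product": OR together the rows selected by the current frontier.
        reached = set()
        for u in frontier:
            reached |= adjacency.get(u, set())
        # "Mask": keep only vertices that have not been assigned a level.
        frontier = reached.difference(levels)
        for v in frontier:
            levels[v] = depth
    return levels


# Hypothetical directed graph given as adjacency sets.
graph = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}, 4: set()}

levels = bfs_levels(graph, 0)
print(sorted(levels.items()))  # [(0, 0), (1, 1), (2, 1), (3, 2), (4, 3)]
```

Writing the traversal in terms of a small set of such primitives is what allows one algorithm description to be retargeted to different hardware, since only the primitives need a tuned implementation.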
This effort will, most importantly, help HHPC systems perform high-level computations needed by the DoD on larger graphs. A software library will help developers who previously did not have access to libraries, frameworks, and patterns developed for large-memory, many-core, heterogeneous computing environments take full advantage of the capabilities offered by GPUs.
Design Pattern Recovery from Malware Binaries
Principal Investigator: Cory Cohen
Efficient determination of whether a new malware sample resembles an already-known one can help in responding to cyberattacks. Previous work on malware similarity has focused on low-level syntactic features (such as individual assembly language instructions or op-codes) or semantic concepts (such as code-level functional equivalence). In this work, we examine higher-level abstractions, specifically the design patterns used by malware authors. Malware authors are now developing reusable software components to help address common software engineering problems. Analysts in the SEI's CERT Division have noted malware families with similar designs but obviously different implementations. We propose to expand CERT's existing automated malware analysis infrastructure, built on the Lawrence Livermore National Laboratory's ROSE compiler infrastructure, to find such similar abstractions using ideas inspired by existing research on design pattern recovery from source code.
Our goal is to provide human analysts the information required to make design-level similarity decisions using an automated tool. Such a tool will dramatically reduce the number of hours of manual reverse engineering required to gather data for such comparisons. Using ROSE, we are developing a type recovery system that automatically recovers the prototype declarations of malware functions and a design pattern matching system that looks for patterns in malware. We will use these capabilities to support ongoing work in the similarity and evolution of malware families.
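To show what a comparison based on low-level syntactic features can look like, and why it can miss samples that share a design but not an implementation, here is a minimal sketch that scores two opcode listings by the Jaccard similarity of their opcode bigrams. The opcode listings and function names are hypothetical and invented for illustration. This is a generic baseline, not the CERT infrastructure and not the design pattern recovery approach proposed here.

```python
def opcode_ngrams(opcodes, n=2):
    """Return the set of length-n opcode sequences found in a listing."""
    return {tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items over all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical opcode listings for three samples.
sample_a = ["push", "mov", "xor", "call", "pop", "ret"]
sample_b = ["push", "mov", "xor", "call", "pop", "pop", "ret"]  # minor variant of A
sample_c = ["sub", "lea", "add", "jmp", "ret"]  # assumed: same design, rewritten code

grams_a = opcode_ngrams(sample_a)
grams_b = opcode_ngrams(sample_b)
grams_c = opcode_ngrams(sample_c)

print(jaccard(grams_a, grams_b))  # 5 shared bigrams out of 6 in total
print(jaccard(grams_a, grams_c))  # 0.0
```

Under the stated assumption that the third sample reuses the design of the first, a purely syntactic score reports no similarity at all, which is the gap that a design-level comparison is intended to close.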
Wrapping Up and Looking Ahead
The SEI brings the best combination of thinking, technology, and methods to the most deserving government software-related problem sets, free from conflicts of interest. As part of CMU, the SEI has access to facilities and research talent including professors, students, and staff members. Our FFRDC status and DoD affiliation grant our technologists access to government data and knowledge of national challenges unusual for most university R&D labs.
Future posts in this series will highlight other current and forthcoming activities in each of our technical areas that are helping to support the DoD and other federal agencies. We welcome your feedback on the technical strategic plan and vision for the SEI.
Please leave feedback in the comments section below.
Download the latest technical notes, papers, publications, and presentations from SEI researchers at our digital library http://resources.sei.cmu.edu/library/.