Data Science, Blacklists, and Mixed-Critical Software: The Latest Research from the SEI

As part of an ongoing effort to keep you informed about our latest work, this blog post summarizes some recently published SEI technical reports, white papers, and webinars in early lifecycle cost estimation, data science, host protection strategies, blacklists, the Architecture Analysis and Design Language (AADL), architecture fault modeling and analysis, and programming and verifying distributed mixed-synchrony and mixed-critical software. These publications highlight the latest work of SEI technologists in these areas. This post includes a listing of each publication, its author(s), and links where it can be accessed on the SEI website.

Segment-Fixed Priority Scheduling for Self-Suspending Real-Time Tasks
By Junsung Kim, Björn Andersson (Carnegie Mellon University), Dionisio de Niz, Ragunathan (Raj) Rajkumar, Jian-Jia Chen, Wen-Hung Huang, Geoffrey Nelissen

Recent trends in system-on-a-chip show that an increasing number of special-purpose processors are being added to improve the efficiency of common operations. Unfortunately, the use of these processors may introduce suspension delays incurred by communication, synchronization, and external I/O operations. When these processors are used in real-time systems, conventional schedulability analyses incorporate these delays in the worst-case execution/response time, thereby significantly reducing the schedulable utilization.
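
To make the cost of that conventional approach concrete, the following is a minimal sketch of a suspension-oblivious response-time test under fixed priorities. It is not the analysis from the report: it simply folds each task's suspension time into its execution time, as described above, and runs the classic response-time iteration. The `Task` fields, the example task set, and the assumption that deadlines equal periods are all illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Task:
    """Hypothetical task model; all times share one unit (e.g. ms)."""
    name: str
    wcet: int        # worst-case execution time C
    suspension: int  # worst-case total self-suspension time S
    period: int      # period T (deadline assumed equal to the period)


def oblivious_response_time(task: Task, higher: List[Task]) -> Optional[int]:
    """Iterate R = (C + S) + sum(ceil(R / Tj) * (Cj + Sj)) to a fixed point.

    Suspension is counted as if it were execution, which is safe but
    pessimistic. Returns None when R would exceed the period.
    """
    demand = task.wcet + task.suspension
    r = demand
    while True:
        # Integer ceiling division avoids floating-point rounding.
        interference = sum(
            ((r + h.period - 1) // h.period) * (h.wcet + h.suspension)
            for h in higher
        )
        nxt = demand + interference
        if nxt > task.period:
            return None  # deadline miss under this assumed model
        if nxt == r:
            return r     # fixed point reached
        r = nxt


def analyse(tasks: List[Task]) -> Dict[str, Optional[int]]:
    """Rate-monotonic order: a shorter period means a higher priority."""
    ordered = sorted(tasks, key=lambda t: t.period)
    return {
        t.name: oblivious_response_time(t, ordered[:i])
        for i, t in enumerate(ordered)
    }


if __name__ == "__main__":
    example = [
        Task("sensor", wcet=1, suspension=1, period=5),
        Task("control", wcet=2, suspension=2, period=10),
        Task("logger", wcet=2, suspension=0, period=20),
    ]
    print(analyse(example))
```

Every suspension term in this sketch is charged as processor demand, both for the task under analysis and for the tasks that interfere with it. That inflation is the loss of schedulable utilization that motivates a more precise analysis.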

This report describes schedulability analyses and proposes segment-fixed priority scheduling for self-suspending tasks. We model the tasks as segments of execution separated by suspensions. We start by providing response-time analyses for self-suspending tasks under rate monotonic scheduling (RMS). While RMS is not optimal, it can be used effectively in some special cases that we have identified. We then derive a utilization bound for these cases as a function of the ratio of the suspension duration to the period of the tasks. For general cases, we develop a segment-fixed priority scheduling scheme. Our scheme assigns individual segments different priorities and phase offsets that are used for phase enforcement to control the tasks' unpredictable self-suspending behavior.
Download the report.

Creating Centralized Reporting for Microsoft Host Protection Technologies: The Enhanced Mitigation Experience Toolkit (EMET)
By Craig Lewis, Joseph Tammariello

Host protection strategies, such as enabling anti-exploitation features, can be effective in protecting Windows endpoints from compromise. Microsoft offers a no-cost tool to assist in this area. The Enhanced Mitigation Experience Toolkit (EMET) is a utility that helps prevent the exploitation of software vulnerabilities.

EMET can be effective in safeguarding organizations from compromise by malicious actors. The configuration of EMET can be controlled centrally by enterprise system administrators using group policy. While centralized management capability is built into the tool, centralized reporting capabilities are not, creating a challenge when it comes to real-time situational awareness, metrics gathering, troubleshooting, and reporting. This report presents methods by which systems administrators and/or information security personnel can create a centralized reporting console using native Windows capabilities and the Splunk machine data analysis engine.
Download the report.

The QUELCE Method: Using Change Drivers to Estimate Program Costs
By Sarah Sheard

Problems with cost estimation, ranging from estimator overconfidence to unintegrated tools, result in potentially billions of dollars of unanticipated expenses for Department of Defense programs. Quantifying Uncertainty in Early Lifecycle Cost Estimation (QUELCE), developed by the Carnegie Mellon University Software Engineering Institute, is a method for estimating potential program costs in a way that acknowledges and uses uncertainty that occurs early in the development lifecycle. This report first summarizes the QUELCE method. QUELCE computes a distribution of program costs based on Monte Carlo analysis of program cost drivers--assessed via analyses of dependency structure matrices and Bayesian belief networks--and a standard project cost estimation tool. The analyses are based on change drivers, or changes that might occur that would substantially change the cost outcome of a program. The report then provides the current organization scheme of change drivers and describes how each one is used to determine any additional impacts that should be folded into the cost estimate. Finally, it introduces elaborations to the change drivers for application to sustainment-phase programs.
Download the report.
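
As a rough illustration of the Monte Carlo step described above, the sketch below samples a set of change drivers and accumulates their effect on a baseline estimate to produce a cost distribution. It is a toy under stated assumptions, not the QUELCE procedure. The driver names, probabilities, multipliers, and baseline cost are invented for illustration. The drivers are sampled independently, whereas QUELCE assesses them via dependency structure matrices and Bayesian belief networks. A simple multiplication stands in for the standard cost estimation tool.

```python
import random
import statistics
from typing import Dict, List, Tuple

# Hypothetical change drivers: name -> (probability of occurring,
# (low, high) range of the cost multiplier applied if it occurs).
DRIVERS: Dict[str, Tuple[float, Tuple[float, float]]] = {
    "requirements_change": (0.40, (1.05, 1.30)),
    "funding_delay": (0.25, (1.02, 1.15)),
    "supplier_change": (0.10, (1.10, 1.50)),
}


def simulate_costs(baseline: float, trials: int, seed: int = 0) -> List[float]:
    """Draw `trials` cost outcomes; each driver fires independently."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    outcomes = []
    for _ in range(trials):
        cost = baseline
        for prob, (low, high) in DRIVERS.values():
            if rng.random() < prob:
                cost *= rng.uniform(low, high)
        outcomes.append(cost)
    return outcomes


def summarize(outcomes: List[float]) -> Dict[str, float]:
    """Report the median and the 10th and 90th percentiles."""
    deciles = statistics.quantiles(outcomes, n=10)  # nine cut points
    return {
        "p10": deciles[0],
        "median": statistics.median(outcomes),
        "p90": deciles[-1],
    }


if __name__ == "__main__":
    costs = simulate_costs(baseline=100.0, trials=10_000)
    print(summarize(costs))
```

Reporting percentiles rather than a single number reflects the point made above: the output is a distribution of possible program costs, so the spread between the low and high percentiles is itself part of the estimate.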

Data Science: What It Is and How It Can Help Your Company
By Brian Lindauer, Eliezer Kanal

Over the past few years, there has been a veritable explosion of hiring in the field of data science. Just 10 years ago, the phrase data scientist was almost unheard of; now, data scientist positions are advertised across numerous industries, with a particular focus on high tech. What is this position and why is it relevant? In this webinar, we discuss this position from a number of angles--what the term "data science" means, what skills a data scientist brings to the table, what competitive edge data science can bring to your team, and the differences between data science and business analysis. We also discuss a number of case studies that describe how data science can be integrated into existing businesses as well as how to best make use of data scientists' skills.
View the webinar.

A Requirement Specification Language for AADL
By Peter H. Feiler, Julien Delange, Lutz Wrage

This report describes a textual requirement specification language, called ReqSpec, for the Architecture Analysis & Design Language (AADL). It is based on the draft Requirements Definition and Analysis Language Annex, which defines a meta-model for requirement specification as annotations to AADL models. A set of plug-ins to the Open Source AADL Tool Environment (OSATE) toolset supports the ReqSpec language. Users can follow an architecture-led requirement specification process that uses AADL models to represent the system in its operational context as well as the architecture of the system of interest. ReqSpec can also be used to represent existing stakeholder and system requirement documents. Requirement documents represented in the Requirements Interchange Format can be imported into OSATE to migrate such documents into an architecture-centric virtual integration process. Finally, ReqSpec is an element of an architecture-led, incremental approach to system assurance. In this approach, requirements specifications are complemented with verification plans. When executed, these plans produce evidence that a system implementation satisfies the requirements. This report introduces the ReqSpec notation and illustrates its use on an example.
Download the report.

Architecture Fault Modeling and Analysis with the Error Model Annex, Version 2
By Peter H. Feiler, John J. Hudak, Julien Delange, David P. Gluch

Safety-critical software-reliant systems must manage component failures and conditions of anomalous interaction among components as hazards that affect a system's safety, reliability, and security so the potential effects of hazards on system operation are reduced to an acceptable risk. Standards and recommended practices for safety-critical systems outline methods for analysis, but security-related practices are typically addressed through separate guidance. This report provides guidance on using the Error Model Annex, Version 2 (EMV2), notation for architecture fault modeling and analysis, which supports automated safety, reliability, and security analyses from the same annotated architecture model to ensure consistency across analysis results. EMV2 augments architecture models expressed in the Architecture Analysis & Design Language with fault information to characterize anomalous conditions. The report introduces concepts for architecture fault modeling of systems in an operational environment at three levels of abstraction. In addition, EMV2 introduces the concept of error types to characterize exceptional conditions and their propagation. Finally, EMV2 allows users to specify which system components are expected to detect, report, and manage anomalous conditions and their propagation and to reflect the effects of recovery and repair actions as error behavior states. The report includes several example models.
Download the report.

Blacklist Ecosystem Analysis: 2016 Update
By Software Engineering Institute

This update, the latest in a series of regular updates, builds upon the analysis of blacklists presented in our 2013 and 2014 reports. In those reports, we established that the contents of blacklists generally fail to overlap substantially with each other. This report further corroborates that overarching result. Our results suggest that available blacklists present an incomplete and fragmented picture of the malicious infrastructure on the Internet, and practitioners should be aware of that insight. This result also provides a starting point for further investigation to understand the dynamics of the blacklist ecosystem.

We have included 123 lists in our latest analysis, including 88 IP-address-based lists and 35 domain-name-based lists. The number of indicators included on any individual list varies from under 1,000 to over 50 million. Our analysis covers the 18-month period from July 1, 2014 to December 31, 2015.

In this report, we revisit three of the metrics considered in the 2014 report to characterize overlaps: reverse counts, list counts, and pairwise intersection counts. We have omitted the "following" metric so that the issue of following can receive a more complete treatment in a future report. We have added two new metrics: a reverse lookup metric that captures counts of domains seen being resolved in passive DNS, and a persistence-in-blacklists metric that captures the persistence of IPs on blacklists over long spans of time.

Most indicators appear on a single list. Our analysis revealed that 86.6 percent of IP address indicators appear on exactly one of the lists included in the study. For domain name indicators, 93.7 percent appear on a single list. Moreover, the domain-name-based lists fall into two distinct "clusters": 13 of the 35 lists are populated in such a way that fewer than half of the domain names listed are active, while 18 of the 35 are populated such that 80 percent or more of their entries do resolve.
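
The overlap figures above can be illustrated with a small sketch. For a handful of made-up lists, the code below computes how many lists each indicator appears on, the share of indicators found on exactly one list, and the pairwise intersection counts. The list names and IP addresses (drawn from documentation ranges) are hypothetical, and the report's exact metric definitions may differ from these simple set operations.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical blacklists keyed by name; addresses use documentation ranges.
BLACKLISTS: Dict[str, Set[str]] = {
    "list_a": {"192.0.2.1", "192.0.2.2", "198.51.100.7"},
    "list_b": {"192.0.2.2", "203.0.113.5"},
    "list_c": {"203.0.113.9"},
}


def list_counts(lists: Dict[str, Set[str]]) -> Counter:
    """Number of lists on which each indicator appears."""
    return Counter(ind for members in lists.values() for ind in members)


def single_list_share(lists: Dict[str, Set[str]]) -> float:
    """Fraction of distinct indicators that appear on exactly one list."""
    counts = list_counts(lists)
    if not counts:
        return 0.0
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / len(counts)


def pairwise_intersections(
    lists: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], int]:
    """Size of the overlap for every unordered pair of lists."""
    return {
        (a, b): len(lists[a] & lists[b])
        for a, b in combinations(sorted(lists), 2)
    }


if __name__ == "__main__":
    print(single_list_share(BLACKLISTS))
    print(pairwise_intersections(BLACKLISTS))
```

In this toy data set, four of the five distinct indicators appear on a single list, and only one pair of lists shares any entry. It is a small-scale analogue of the fragmentation the report describes.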

Download the latest report.
Download the 2015 report.
Download the 2014 report.
Download the 2013 report.

Additional Resources

For the latest publications on SEI research, please visit https://resources.sei.cmu.edu/library/.
