The 2014 Year in Review: Top 10 Blog Posts

In 2014, the SEI blog experienced unprecedented growth, with record numbers of visitors learning more about our work in big data, secure coding for Android, malware analysis, Heartbleed, and V Models for Testing. In 2014 (through December 21), the SEI blog logged 129,000 visits, nearly double the 2013 total of 66,757 visits.

As we look back on the last 12 months, this blog posting highlights our 10 most popular blog posts (based on the number of visits). As we did with our mid-year review, we will include links to additional related resources that readers might find of interest. When possible, we grouped posts by research area to make it easier for readers to learn about related areas of work. This blog post first presents the top 10 posts and then provides a deeper dive into each area of research.

1. Using V Models for Testing
2. Two Secure Coding Tools for Analyzing Android Apps (secure coding)
3. Common Testing Problems: Pitfalls to Prevent and Navigate
4. Four Principles of Engineering Scalable, Big Data Systems (big data)
5. A New Approach to Prioritizing Malware Analysis
6. Secure Coding for the Android Platform (secure coding)
7. A Generalized Model for Automated DevOps (DevOps)
8. Writing Effective Yara Signatures to Identify Malware
9. An Introduction to DevOps (DevOps)
10. The Importance of Software Architecture in Big Data Systems (big data)

1. Using V Models for Testing

Don Firesmith's post, Using V Models for Testing, which was published in November 2013, remains the most popular post. In the post, Firesmith introduces three variants on the traditional V model of system or software development that make it more useful to testers, quality engineers, and other stakeholders interested in the use of testing as a verification and validation method.

The V model builds on the traditional waterfall model of system or software development by emphasizing verification and validation. The V model takes the bottom half of the waterfall model and bends it upward into the form of a V, so that the activities on the right verify or validate the work products of the activity on the left.

More specifically, the left side of the V represents the analysis activities that decompose users' needs into small, manageable pieces, while the right side of the V shows the corresponding synthesis activities that aggregate (and test) these pieces into a system that meets users' needs.

The single V model modifies the nodes of the traditional V model to represent the executable work products to be tested rather than the activities used to produce them.
The double V model adds a second V to show the type of tests corresponding to each of these executable work products.
The triple V model adds a third V to illustrate the importance of verifying the tests to determine whether they contain defects that could stop or delay testing or lead to false positive or false negative test results.

In the triple-V model, it is not required or even advisable to wait until the right side of the V to perform testing. Unlike the traditional model, where tests may be developed but not executed until the code exists (i.e., the right side of the V), with executable requirements and architecture models, tests can now be executed on the left side of the V.

Readers interested in finding out more about Firesmith's work in this field can view the following resources:

Book: Common System and Software Testing Pitfalls
Podcast: Three Variations on the V Model for System and Software Testing

2. Two Secure Coding Tools for Analyzing Android Apps (secure coding)
6. Secure Coding for the Android Platform (secure coding)

One of the most popular areas of research among SEI blog readers this year has been the series of posts highlighting our work on secure coding for the Android platform. Android is an important area to focus on, given its mobile device market dominance (82 percent of worldwide market share in the third quarter of 2013), the adoption of Android by the Department of Defense, and the emergence of popular massive open online courses on Android programming and security.

Since its publication in late April, the post Two Secure Coding Tools for Analyzing Android Apps, by Will Klieber and Lori Flynn, has been the second most popular post on our site. The post highlights a tool they developed, DidFail, that addresses a problem often seen in information flow analysis: the leakage of sensitive information from a sensitive source to a restricted sink (taint flow). Previous static analyzers for Android taint flow did not combine precise analysis within components with analysis of communication between Android components (intent flows). CERT's new tool analyzes taint flow for sets of Android apps, not only single apps.

DidFail is available to the public as a free download. Also available is a small test suite of apps that demonstrates the functionality that DidFail provides.
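To make the idea of taint flow across a set of apps concrete, here is a minimal sketch in Python. It is not DidFail's algorithm or output format. The app names, the `src:`, `intent:`, and `sink:` labels, and the `find_tainted_paths` function are all hypothetical, chosen only to show why flows found within individual apps need to be joined across intents.

```python
from collections import defaultdict

# Hypothetical per-app analysis results: each tuple is (app, start, end).
# Labels prefixed with "intent:" stand for messages passed between apps.
PER_APP_FLOWS = [
    ("AppA", "src:device_id", "intent:SHARE"),
    ("AppB", "intent:SHARE", "sink:network"),
    ("AppC", "src:contacts", "sink:log"),
]


def find_tainted_paths(flows):
    """Return every chain of flows leading from a src: label to a sink: label."""
    edges = defaultdict(list)
    for app, start, end in flows:
        edges[start].append((app, end))

    paths = []

    def walk(node, path, seen):
        for app, nxt in edges[node]:
            step = path + [(app, node, nxt)]
            if nxt.startswith("sink:"):
                paths.append(step)          # reached a sink: record the chain
            elif nxt not in seen:
                walk(nxt, step, seen | {nxt})  # follow the intent into the next app

    # Copy the keys first, because walk() may add empty entries to edges.
    for start in list(edges):
        if start.startswith("src:"):
            walk(start, [], {start})
    return paths


for path in find_tainted_paths(PER_APP_FLOWS):
    apps = " + ".join(app for app, _, _ in path)
    print(path[0][1], "->", path[-1][2], "via", apps)
```

In this toy data, neither AppA nor AppB leaks the device ID on its own. The leak only appears when the two apps are analyzed together.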

The second tool, which was developed for a limited audience and is not yet publicly available, addresses activity hijacking attacks, which occur when a malicious app receives a message (an intent) that was intended for another app, but not explicitly designated for it.
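The difference between an explicitly designated intent and one that is not can be illustrated with a toy model in Python. This is a deliberate simplification, not Android's actual intent resolution and not the tool described above. The `Intent` class, the `FILTERS` registry, the package names, and `candidate_receivers` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intent:
    action: str
    target: Optional[str] = None  # explicit recipient, if the sender names one


# Hypothetical registry: app name -> actions the app declares it can handle.
FILTERS = {
    "com.example.bank": {"com.example.VIEW_STATEMENT"},
    "com.example.malicious": {"com.example.VIEW_STATEMENT"},
    "com.example.notes": {"com.example.EDIT_NOTE"},
}


def candidate_receivers(intent, filters):
    """Return the apps that could receive the intent in this toy model."""
    if intent.target is not None:
        # Explicit intent: only the named app can receive it.
        return [intent.target] if intent.target in filters else []
    # Implicit intent: any app that declares the action is a candidate.
    return sorted(app for app, actions in filters.items() if intent.action in actions)


implicit = Intent(action="com.example.VIEW_STATEMENT")
explicit = Intent(action="com.example.VIEW_STATEMENT", target="com.example.bank")

print(candidate_receivers(implicit, FILTERS))
print(candidate_receivers(explicit, FILTERS))
```

In this sketch, the implicit intent can reach the hypothetical malicious app because that app declares the same action, while the explicit intent cannot.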

The post by Klieber and Flynn is the latest in a series detailing the CERT Secure Coding team's work on techniques and tools for analyzing code for mobile computing platforms.

In April, Flynn also wrote a post, Secure Coding for the Android Platform, the sixth most popular post in 2014. In that post, Flynn highlights secure coding rules and guidelines specific to the use of Java in the Android platform. Although the CERT Secure Coding Team has developed secure coding rules and guidelines for Java, prior to 2013 the team had not developed a set of secure coding rules that were specific to Java's application in the Android platform. Flynn's post discusses our initial set of Android rules and guidelines, which include mapping our existing Java secure coding rules and guidelines to Android and creating new Android-specific rules for Java secure coding.

Readers interested in finding out more about the CERT Secure Coding Team's work in secure coding for the Android platform can view the following additional resources:

Paper: Android Taint Flow Analysis for App Sets (SOAP 2014 workshop)
Presentation: Android Taint Flow Analysis for App Sets
Thesis: Precise Static Analysis of Taint Flow for Android Application Sets
CERT Secure Coding Rules and Guidelines: CERT Secure Coding Rules and Guidelines for Android wiki

3. Common Testing Problems: Pitfalls to Prevent and Navigate

A widely cited study for the National Institute of Standards & Technology (NIST) reports that inadequate testing methods and tools annually cost the U.S. economy between $22.2 billion and $59.5 billion, with roughly half of these costs borne by software developers in the form of extra testing and half by software users in the form of failure avoidance and mitigation efforts. The same study notes that between 25 percent and 90 percent of software development budgets are often spent on testing. In his series on testing, Don Firesmith highlights results of an analysis that documents problems that commonly occur during testing. Specifically, this series of posts identifies and describes 77 testing problems organized into 14 categories; lists potential symptoms by which each can be recognized, potential negative consequences, and potential causes; and makes recommendations for preventing them or mitigating their effects.

Here's an excerpt from the first post, Common Testing Problems: Pitfalls to Prevent and Navigate, which focused on general testing problems that are not specific to any type of testing, but apply to all different types of testing:

Clearly, there are major problems with the efficiency and effectiveness of testing as it is currently performed in practice. In the course of three decades of developing systems and software--as well as my involvement in numerous independent technical assessments of development projects--I have identified and analyzed testing-related problems that other engineers, managers, and I have observed to commonly occur during testing. I also solicited feedback from various LinkedIn groups (such as Bug Free: Discussions in Software Testing, Software Testing and Quality Assurance) and the International Council on Systems Engineering (INCOSE). As of March 2013, I have received and incorporated feedback from 29 reviewers in 10 countries. While the resulting framework of problems can apply to both software and systems testing, it emphasizes software because that is where most of the testing problems occur.

Readers interested in finding out more about Firesmith's research in this area can view the following additional resources:

Presentation: Common Testing Problems: Pitfalls to Prevent and Mitigate, and the associated Checklist Including Symptoms and Recommendations, which were presented at the FAA Verification and Validation Summit 8 (2012) in Atlantic City, New Jersey on 10 October 2012.

4. Four Principles of Engineering Scalable, Big Data Systems (big data)
10. The Importance of Software Architecture in Big Data Systems (big data)

New data sources, ranging from diverse business transactions to social media, high-resolution sensors, and the Internet of Things, are creating a digital tsunami of big data that must be captured, processed, integrated, analyzed, and archived. Big data systems that store and analyze petabytes of data are becoming increasingly common in many application domains. These systems represent major, long-term investments, requiring considerable financial commitments and massive scale software and system deployments. With analysts estimating data storage growth at 30 percent to 60 percent per year, organizations must develop a long-term strategy to address the challenge of managing projects that analyze exponentially growing data sets with predictable, linear costs.
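To see what 30 percent to 60 percent annual growth means over a planning horizon, the compounding can be worked out in a few lines of Python. The starting size of 100 TB and the five-year horizon are assumptions chosen for illustration. Only the growth rates come from the analyst estimates cited above.

```python
def projected_size(initial_tb, annual_growth, years):
    """Compound growth: size after `years` at a fixed annual growth rate."""
    return initial_tb * (1 + annual_growth) ** years


INITIAL_TB = 100  # hypothetical starting size, in terabytes

for rate in (0.30, 0.60):
    sizes = [round(projected_size(INITIAL_TB, rate, year)) for year in range(6)]
    print(f"{rate:.0%} per year: {sizes}")
```

Under these assumptions, the data set grows to roughly 3.7 times its original size after five years at the low end of the range, and to roughly 10.5 times at the high end. That gap is why costs that scale with data volume are hard to keep linear.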

In his popular, ongoing Big Data series on the SEI blog, researcher Ian Gorton continues to describe the software engineering challenges of big data systems. His most popular post in this series, Four Principles of Engineering Scalable, Big Data Systems, offers four principles that hold for any scalable, big data system. These principles can help architects continually validate major design decisions across development iterations, and hence provide a guide through the complex collection of design trade-offs all big data systems require. Here's an excerpt:

In earlier posts on big data, I have written about how long-held design approaches for software systems simply don't work as we build larger, scalable big data systems. Examples of design factors that must be addressed for success at scale include the need to handle the ever-present failures that occur at scale, assure the necessary levels of availability and responsiveness, and devise optimizations that drive down costs. Of course, the required application functionality and engineering constraints, such as schedule and budgets, directly impact the manner in which these factors manifest themselves in any specific big data system.

In The Importance of Software Architecture in Big Data Systems, the 10th most popular post in 2014, Gorton continues to address the software engineering challenges of big data, by exploring how the nature of building highly scalable, long-lived big data applications influences iterative and incremental design approaches.

Readers interested in finding out more about Gorton's research in big data can also view the following additional resources:

Webinar: Software Architecture for Big Data Systems
Podcast: An Approach to Managing the Software Engineering Challenges of Big Data
Podcast: Four Principles for Engineering Scalable, Big Data Systems
Blog Post: In the blog post, Addressing the Software Engineering Challenges of Big Data, Gorton describes a risk reduction approach called Lightweight Evaluation and Architecture Prototyping (for Big Data) that he developed with fellow researchers at the SEI. The approach is based on principles drawn from proven architecture and technology analysis and evaluation techniques to help the Department of Defense (DoD) and other enterprises develop and evolve systems to manage big data.
Blog Post: In the blog post, Principles of Big Data Systems: You Can't Manage What You Don't Monitor, Gorton takes a deeper dive into one of the four principles that he enumerated in his post, namely, you can't manage what you don't monitor.

5. A New Approach to Prioritizing Malware Analysis

Every day, analysts at major anti-virus companies and research organizations are inundated with new malware samples. From Flame to lesser-known strains, figures indicate that the number of malware samples released each day continues to rise. In 2011, malware authors unleashed approximately 70,000 new strains per day, according to figures reported by Eugene Kaspersky. The following year, McAfee reported that 100,000 new strains of malware were unleashed each day. An article published in the October 2013 issue of IEEE Spectrum updated that figure to approximately 150,000 new malware strains. Not enough manpower exists to manually address the sheer volume of new malware samples that arrive daily in analysts' queues.

CERT researcher Jose Morales sought to develop an approach that would allow analysts to identify and focus first on the most destructive binary files. In his April 2014 blog post, A New Approach to Prioritizing Malware Analysis, Morales describes research he conducted with fellow researchers at the SEI and CMU's Robotics Institute. The post highlights an analysis that demonstrates the validity (with 98 percent accuracy) of an approach that helps analysts distinguish between the malicious and benign nature of a binary file. This blog post is a follow-up to his 2013 post, Prioritizing Malware Analysis, which describes the approach and its basis in the file's execution behavior.

Readers interested in learning more about prioritizing malware analysis should listen to the following resource:

Podcast: Characterizing and Prioritizing Malicious Code

7. A Generalized Model for Automated DevOps (DevOps)
9. An Introduction to DevOps (DevOps)

In June, C. Aaron Cois wrote the blog post A Generalized Model for Automated DevOps, in which he presents a generalized model for automated DevOps and describes its significant potential advantages for a modern software development team.

With the post An Introduction to DevOps, Cois kicked off a series exploring various facets of DevOps from an internal perspective and his own experiences as a software engineering team lead.

Here's an excerpt from his initial post:

At Flickr, the video- and photo-sharing website, the live software platform is updated at least 10 times a day. Flickr accomplishes this through an automated testing cycle that includes comprehensive unit testing and integration testing at all levels of the software stack in a realistic staging environment. If the code passes, it is then tagged, released, built, and pushed into production. This type of lean organization, where software is delivered on a continuous basis, is exactly what the agile founders envisioned when crafting their manifesto: a nimble, streamlined process for developing and deploying software into the hands of users while continuously integrating feedback and new requirements. A key to Flickr's prolific deployment is DevOps, a software development concept that literally and figuratively blends development and operations staff and tools in response to the increasing need for interoperability.
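The gating pattern in this excerpt, where deployment steps run only after every test stage passes, can be sketched in Python. This is a minimal illustration, not Flickr's tooling. The `deploy_if_green` function and the stand-in checks and steps are hypothetical, and a real pipeline would invoke test runners and release tools instead of the placeholder lambdas.

```python
def deploy_if_green(checks, steps):
    """Run every check; run the deployment steps in order only if all pass."""
    failed = [name for name, check in checks if not check()]
    if failed:
        # At least one test stage failed: nothing is deployed.
        return {"deployed": False, "failed": failed, "ran": []}
    ran = []
    for name, step in steps:
        step()
        ran.append(name)
    return {"deployed": True, "failed": [], "ran": ran}


# Placeholder stages standing in for real test suites and release tooling.
CHECKS = [
    ("unit tests", lambda: True),
    ("integration tests", lambda: True),
]
STEPS = [
    ("tag", lambda: None),
    ("release", lambda: None),
    ("build", lambda: None),
    ("push to production", lambda: None),
]

print(deploy_if_green(CHECKS, STEPS))
```

Because every stage is automated, the whole sequence can run many times a day without manual handoffs between development and operations.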

The following resources are available to readers interested in learning more about DevOps:

A New Blog Post Series: In November, Cois and other researchers in his group launched a new blog post series that offers guidelines and practical advice to organizations seeking to adopt DevOps. A new post is published every Thursday.
Podcast: DevOps--Transform Development and Operations for Fast, Secure Deployments

8. Writing Effective Yara Signatures to Identify Malware

In previous blog posts, David French has written about applying similarity measures to malicious code to identify related files and reduce analysis expense. Another way to observe similarity in malicious code is to leverage analyst insights by identifying files that possess some property in common with a particular file of interest. One way to do this is by using YARA, an open-source project that helps researchers identify and classify malware. YARA has gained enormous popularity in recent years as a way for malware researchers and network defenders to communicate their knowledge about malicious files, from identifiers for specific families to signatures capturing common tools, techniques, and procedures (TTPs). In his post, Writing Effective Yara Signatures to Identify Malware, which has continued to draw a robust audience since its publication in 2012, French provides guidelines for using YARA effectively, focusing on selection of objective criteria derived from malware, the type of criteria most useful in identifying related malware (including strings, resources, and functions), and guidelines for creating YARA signatures using these criteria.

Here's an excerpt:

Reverse engineering is arguably the most expensive form of analysis to apply to malicious files. It is also the process by which the greatest insights can be made against a particular malicious file. Since analysis time is so expensive, however, we constantly seek ways to reduce this cost or to leverage the benefits beyond the initially analyzed file. When classifying and identifying malware, therefore, it is useful to group related files together to cut down on analysis time and leverage analysis of one file against many files. To express such relationships between files, we use the concept of a "malware family", which is loosely defined as "a set of files related by objective criteria derived from the files themselves." Using this definition, we can apply different criteria to different sets of files to form a family.
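The definition of a malware family in this excerpt can be illustrated with a short Python sketch that groups files by strings they share. This is plain Python, not YARA rule syntax or the YARA API. The sample names, byte contents, and `family_members` function are invented for illustration.

```python
def family_members(files, required_strings):
    """Return the names of files whose bytes contain every required string."""
    return sorted(
        name
        for name, data in files.items()
        if all(s in data for s in required_strings)
    )


# Hypothetical file contents standing in for real binaries.
SAMPLES = {
    "sample1.bin": b"\x00\x01demo_mutex_42\x00connect:c2.example.invalid\x00",
    "sample2.bin": b"padding demo_mutex_42 more padding",
    "sample3.bin": b"connect:c2.example.invalid only",
}

# Different objective criteria yield different families from the same files.
print(family_members(SAMPLES, [b"demo_mutex_42"]))
print(family_members(SAMPLES, [b"demo_mutex_42", b"c2.example.invalid"]))
```

The first criterion groups two of the samples together, and the stricter second criterion narrows the family to one. This mirrors the point that a family is defined by the criteria applied, not by the files alone.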

The following resource is available to readers interested in learning more about this work:

Research Report: Function Hashing for Malicious Code Analysis, CERT Research Report, pp 26-29

Wrapping Up 2014 and Looking Ahead

This has been a great year for the SEI Blog. We plan to take a break in publication for the remainder of 2014, but we will kick off 2015 with a series of great posts:

We will kick off a series of posts highlighting the SEI's technical strategy for 2015 and beyond.
Will Klieber and Lori Flynn will update their continuing research on secure coding for the Android platform.
Sagar Chaki and James Edmondson will detail their research on software model checking for verifying distributed algorithms.
Aaron Cois will continue his series on DevOps with posts on secure and continuous integration.

As always, we welcome your ideas for future posts and your feedback on those already published. Please leave feedback in the comments section below.

Additional Resources

Download the latest publications from SEI researchers at our digital library https://resources.sei.cmu.edu/library/.

Douglas Schmidt (Vanderbilt University)

December 22, 2014
