Category: Big Data

Have you ever been developing or acquiring a system and said to yourself, "I can't be the first architect to design this type of system. How can I tap into the architecture knowledge that already exists in this domain?" If so, you might be looking for a reference architecture. A reference architecture describes a family of similar systems: it standardizes nomenclature, defines key solution elements and the relationships among them, collects relevant solution patterns, and provides a framework for classifying and comparing systems. This blog posting, which is excerpted from the paper A Reference Architecture for Big Data Systems in the National Security Domain, describes our work developing and applying a reference architecture for big data systems.

Big data system development carries several risks of its own. Software architects developing any system--big data or otherwise--must address risks associated with cost, schedule, and quality, and all of these risks are amplified in the context of big data. Architecting big data systems is challenging because the technology landscape is new and rapidly changing, and the quality attribute challenges, particularly for performance, are substantial. Some software architects manage these risks with architecture analysis, while others use prototyping. This blog post, which was largely derived from a paper I co-authored with Hong-Mei Chen and Serge Haziyev, Strategic Prototyping for Developing Big Data Systems, presents the Risk-Based Architecture-Centric Strategic Prototyping (RASP) model, which was developed to provide cost-effective, systematic risk management in agile big data system development.

The crop of Top 10 SEI blog posts published in the first half of 2016 (judged by the number of visits from our readers) represents a cross section of the cutting-edge work that we do at the SEI: at-risk emerging technologies, cyber intelligence, big data, vehicle cybersecurity, and what ant colonies can teach us about securing the internet. In all, readers visited the SEI blog more than 52,000 times in the first six months of 2016. We appreciate your readership and invite you to submit ideas for future posts in the comments section below. In this post, we list the Top 10 posts in descending order (#10 to #1), provide an excerpt from each post, and link to where readers can go for more information about the topics covered.

A recent IDC forecast predicts that the big data technology and services market will realize "a 26.4 percent compound annual growth rate to $41.5 billion through 2018, or about six times the growth rate of the overall information technology market." In previous posts highlighting the SEI's research in big data, we explored some of the challenges related to this rapidly growing field, including the need to make technology selections early in the architecture design process. We also introduced an approach to help the Department of Defense (DoD) and other enterprises develop and evolve systems to manage big data. The approach, known as Lightweight Evaluation and Architecture Prototyping for Big Data (LEAP4BD), helps organizations reduce risk and streamline the selection and acquisition of big data technologies. In this blog post, we describe how we used LEAP4BD to help the Interagency Project Office achieve their mission to integrate the IT systems of the Military Health System and the Veterans Health Administration.

By Kevin Fall
Deputy Director, Research, and CTO

This is the second installment in a series on the SEI's technical strategic plan.

Department of Defense (DoD) systems are becoming increasingly software reliant, at a time when concerns about cybersecurity are at an all-time high. Consequently, the DoD, and the government more broadly, is expending significantly more time, effort, and money in creating, securing, and maintaining software-reliant systems and networks. Our first post in this series provided an overview of the SEI's five-year technical strategic plan, which aims to equip the government with the best combination of thinking, technology, and methods to address its software and cybersecurity challenges. This blog post, the second in the series, looks at ongoing and new research we are undertaking to address key cybersecurity, software engineering, and related acquisition issues faced by the government and DoD.

The Department of Defense (DoD) and other government agencies increasingly rely on software and networked software systems. As one of over 40 federally funded research and development centers sponsored by the United States government, Carnegie Mellon University's Software Engineering Institute (SEI) is working to help the government acquire, design, produce, and evolve software-reliant systems in an affordable and secure manner. The quality, safety, reliability, and security of software and the cyberspace it creates are major concerns for both embedded systems and enterprise systems employed for information processing tasks in health care, homeland security, intelligence, logistics, etc. Cybersecurity risks, a primary focus area of the SEI's CERT Division, regularly appear in news media and have resulted in policy action at the highest levels of the US government (See Report to the President: Immediate Opportunities for Strengthening the Nation's Cybersecurity).

The term big data is the subject of much hype in both government and business today. Big data is variously cast as the cause of all existing system problems and, simultaneously, as the savior that will lead us to the innovative solutions and business insights of tomorrow. All this hype fuels predictions such as the one from IDC that the market for big data will reach $16.1 billion in 2014, growing six times faster than the overall information technology market, despite the fact that the "benefits of big data are not always clear today," according to IDC. From a software-engineering perspective, however, the challenges of big data are very clear: they are driven by ever-increasing system scale and complexity. This blog post, a continuation of my last post on the four principles of building big data systems, describes how we must address one of these challenges, namely, that you can't manage what you don't monitor.
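To make the monitoring principle concrete, here is a minimal, hypothetical sketch (in Python, and not taken from the post) of instrumenting a single ingestion stage so that throughput, failures, and latency are observable rather than invisible; the `ingest_record` function, the `store` object, and the metric names are assumptions introduced only for illustration.

```python
import time
from collections import Counter

# Hypothetical in-memory metrics for one pipeline stage; a production system
# would export these to a monitoring backend instead of keeping them locally.
metrics = Counter()
latencies = []

def ingest_record(record, store):
    """Write one record to the data store and record what happened."""
    start = time.monotonic()
    try:
        store.write(record)                # 'store' stands in for any data sink
        metrics["records_ingested"] += 1
    except Exception:
        metrics["ingest_failures"] += 1
        raise
    finally:
        latencies.append(time.monotonic() - start)

def report():
    """Summarize the stage so operators can see its behavior, not guess at it."""
    count = len(latencies) or 1
    print(f"ingested={metrics['records_ingested']} "
          f"failures={metrics['ingest_failures']} "
          f"avg_latency_ms={1000 * sum(latencies) / count:.2f}")
```

The point of the sketch is simply that every stage emits numbers an operator can watch; without them, capacity planning and failure diagnosis at big data scale become guesswork.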

In earlier posts on big data, I have written about how long-held design approaches for software systems simply don't work as we build larger, scalable big data systems. Examples of design factors that must be addressed for success at scale include the need to handle the ever-present failures that occur at scale, assure the necessary levels of availability and responsiveness, and devise optimizations that drive down costs. Of course, the required application functionality and engineering constraints, such as schedule and budgets, directly affect how these factors manifest themselves in any specific big data system. In this post, the latest in my ongoing series on big data, I step back from specifics and describe four general principles that hold for any scalable big data system. These principles can help architects continually validate major design decisions across development iterations and hence provide a guide through the complex collection of design trade-offs that all big data systems require.
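As a concrete (and hypothetical) illustration of designing for the ever-present failures mentioned above, the Python sketch below retries a read against a primary storage node with exponential backoff and then degrades to a replica read; the `read_primary` and `read_replica` callables are assumptions for the example, not part of any specific big data product.

```python
import random
import time

def read_with_retries(key, read_primary, read_replica,
                      max_attempts=3, base_delay=0.1):
    """Read a value, treating node failure as a normal event rather than an
    exception to the design: retry with backoff, then fall back to a replica."""
    for attempt in range(max_attempts):
        try:
            return read_primary(key)
        except (TimeoutError, ConnectionError):
            # Exponential backoff with jitter so many clients don't retry in lockstep.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)
    # Accept possibly stale data from a replica instead of failing the request;
    # this is one of the availability-versus-consistency trade-offs at scale.
    return read_replica(key)
```

The design choice embedded here, anticipating partial failure and trading consistency for availability, is the kind of decision the principles in the post are meant to help architects validate across iterations.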

In the first half of this year, the SEI blog has experienced unprecedented growth, with visitors in record numbers learning more about our work in big data, secure coding for Android, malware analysis, Heartbleed, and V Models for Testing. In the first six months of 2014 (through June 20), the SEI blog logged 60,240 visits, which approaches the entire 2013 total of 66,757 visits. As we reach the mid-year point, this blog posting looks back at our most popular areas of work (at least according to you, our readers) and highlights our most popular blog posts for the first half of 2014, along with links to additional related resources that readers might find of interest.

Many types of software systems, including big data applications, lend themselves to highly incremental and iterative development approaches. In essence, system requirements are addressed in small batches, enabling the delivery of a functional release of the system at the end of every increment, typically once a month. The advantages of this approach are many and varied. Perhaps foremost is the fact that it constantly forces the validation of requirements and designs before too much progress is made in inappropriate directions. Ambiguity and change in requirements, as well as uncertainty in design approaches, can be rapidly explored through working software systems, not simply models and documents. Necessary modifications can be carried out efficiently and cost-effectively through refactoring before code becomes too 'baked' and complex to easily change. This posting, the second in a series addressing the software engineering challenges of big data, explores how the nature of building highly scalable, long-lived big data applications influences iterative and incremental design approaches.

New data sources, ranging from diverse business transactions to social media, high-resolution sensors, and the Internet of Things, are creating a digital tidal wave of big data that must be captured, processed, integrated, analyzed, and archived. Big data systems storing and analyzing petabytes of data are becoming increasingly common in many application areas. These systems represent major, long-term investments requiring considerable financial commitment and massive-scale software and system deployments.