New data sources, ranging from diverse business transactions to social media, high-resolution sensors, and the Internet of Things, are creating a digital tidal wave of big data that must be captured, processed, integrated, analyzed, and archived. Big data systems storing and analyzing petabytes of data are becoming increasingly common in many application areas. These systems represent major, long-term investments that require considerable financial commitment and massive-scale software and system deployments.
With analysts estimating data storage growth at 30 to 60 percent per year, organizations must develop a long-term strategy for managing projects that analyze exponentially growing data sets at predictable, linear cost. This blog post describes Lightweight Evaluation and Architecture Prototyping (for Big Data), a lightweight risk reduction approach that we developed with fellow researchers at the SEI. The approach is based on principles drawn from proven architecture and technology analysis and evaluation techniques to help the Department of Defense (DoD) and other enterprises develop and evolve systems to manage big data.
The Challenges of Big Data
For the DoD, the challenges of big data are daunting. Military operations, intelligence analysis, logistics, and health care all represent big data applications, with data growing at exponential rates and a need for scalable software solutions to sustain future operations. In 2012, the DoD announced a $250 million annual research and development investment in big data targeted at specific mission needs, such as autonomous systems and decision support. For example, in the Data-to-Decisions S&T Priority Initiative, the DoD has developed a roadmap through 2018 that identifies the need for distributed, multi-petabyte data stores that can underpin mission needs for scalable knowledge discovery, analytics, and distributed computations.
The following examples illustrate two different DoD data-intensive missions:
To address these big data challenges, a new generation of scalable data management technologies has emerged in the last five years. Relational database management systems, which provide strong data-consistency guarantees based on vertical scaling of compute and storage hardware, are being replaced by NoSQL (variously interpreted as "No SQL", or "Not Only SQL") data stores running on horizontally-scaled commodity hardware. These NoSQL databases achieve high scalability and performance using simpler data models, clusters of low-cost hardware, and mechanisms for relaxed data consistency that enhance performance and availability.
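To make the data-model difference concrete, the following sketch (our own illustration, with hypothetical "users and orders" data) contrasts a normalized relational layout, which joins rows at query time, with the denormalized aggregate typical of a NoSQL document store, where one key lookup retrieves everything and the aggregate partitions cleanly across cluster nodes:

```python
# Illustrative only: the same user-and-orders data modeled two ways.

# Relational style: normalized rows, joined at query time.
users = [{"user_id": 1, "name": "Alice"}]
orders = [
    {"order_id": 100, "user_id": 1, "item": "radio"},
    {"order_id": 101, "user_id": 1, "item": "antenna"},
]

# Document style: one denormalized aggregate per user, so a single
# key lookup retrieves everything -- no join, easy to partition
# across nodes, at the cost of duplicated data on updates.
user_doc = {
    "user_id": 1,
    "name": "Alice",
    "orders": [
        {"order_id": 100, "item": "radio"},
        {"order_id": 101, "item": "antenna"},
    ],
}

def orders_relational(uid):
    # Emulates the join a relational database performs.
    return [o["item"] for o in orders if o["user_id"] == uid]

def orders_document(doc):
    # One lookup returns the whole aggregate.
    return [o["item"] for o in doc["orders"]]

print(orders_relational(1))   # ['radio', 'antenna']
print(orders_document(user_doc))
```

The simpler aggregate model is what lets NoSQL stores scale horizontally: related data lives together on one node, avoiding the cross-node joins that make relational scaling hard.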
A challenging technology adoption problem is being created, however, by a complex and rapidly evolving landscape of non-standardized technologies built on radically different data models. Selecting a database technology that can best support mission needs for timely development, cost-effective delivery, and future growth is non-trivial. Using these new technologies to design and construct a massively scalable big data system creates an immense software architecture challenge for software architects and DoD program managers.
Why Scale Matters in Big Data Management
Scale has many implications for software architecture, and we describe two of them in this blog post. The first revolves around the fundamental changes that scale enforces on how we design software systems. The second is based upon economics, where small optimizations in resource usage at very large scales can lead to huge cost reductions in absolute terms. The following briefly explores these two issues:
Designing for scale. Big data systems are inherently distributed systems. Hence, software architects must explicitly deal with issues of partial failures, unpredictable communications latencies, concurrency, consistency, and replication in the system design. These issues are exacerbated as systems grow to utilize thousands of processing nodes and disks, geographically distributed across data centers. For example, the probability of failure of a hardware component increases with scale.
Studies such as this 2010 research paper, "Characterizing Cloud Computing Hardware Reliability," find that 8 percent of all servers in a data center experience a hardware problem annually, with the most common cause being disk failure. In addition, applications must deal with unpredictable communication latencies and network partitions due to link failures. These requirements mandate that scalable applications treat failures as common events that must be handled gracefully to ensure that the application operation is not interrupted. To address such requirements, resilient big data software architectures must:
Economics at scale. Big data applications employ many thousands of compute-and-storage resources. Regardless of whether these resources are capital purchases or resources hosted by a commercial cloud provider, they remain a major cost and hence a target for reduction. Straightforward resource reduction approaches (such as data compression) are common ways to reduce storage costs. Elasticity is another way that big data applications optimize resource usage, by dynamically deploying new servers to handle increases in load and releasing them as load decreases. Elastic solutions require servers that boot quickly and application-specific strategies for avoiding premature resource release.
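The elasticity trade-off described above can be sketched as a simple scaling policy. The sketch below is a hypothetical illustration (the thresholds and cooldown are example values, not a recommendation): it scales out immediately when utilization is high, but scales in only after sustained low load, to avoid the premature resource release mentioned above.

```python
# Hypothetical elastic scaling policy: scale out aggressively,
# scale in cautiously (only after a cooldown of sustained low load).

class Autoscaler:
    def __init__(self, min_servers=2, max_servers=100,
                 high_water=0.75, low_water=0.30, cooldown_ticks=5):
        self.servers = min_servers
        self.min_servers = min_servers
        self.max_servers = max_servers
        self.high_water = high_water    # scale out above this utilization
        self.low_water = low_water      # candidate to scale in below this
        self.cooldown_ticks = cooldown_ticks
        self.quiet_ticks = 0            # consecutive low-load observations

    def observe(self, utilization):
        if utilization > self.high_water:
            # Scale out immediately: overload is costly.
            self.servers = min(self.servers + 1, self.max_servers)
            self.quiet_ticks = 0
        elif utilization < self.low_water:
            # Scale in only after sustained low load, so a brief dip
            # does not trigger premature resource release.
            self.quiet_ticks += 1
            if self.quiet_ticks >= self.cooldown_ticks:
                self.servers = max(self.servers - 1, self.min_servers)
                self.quiet_ticks = 0
        else:
            self.quiet_ticks = 0
        return self.servers
```

The asymmetry is deliberate: adding capacity too late degrades service, while releasing it too early forces a costly (and, if servers boot slowly, painful) re-acquisition when load returns.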
Other strategies seek to optimize the performance of common tools and components to maintain productivity while decreasing resource utilization. For example, Facebook built HipHop, a PHP-to-C++ transformation engine that reduced the CPU load for serving its web pages by 50 percent. At the scale of Facebook's deployment, this represents a very significant resource reduction and cost savings. Software license costs, which can become prohibitive at scale, are another target for reduction. The drive for cost reduction has led leading internet organizations to develop a proliferation of database and middleware technologies in recent years, many of which have been released as freely available open source. Netflix and LinkedIn provide examples of powerful, scalable technologies for big data systems.
Other implications of scale for big data software architectures revolve around testing and fault diagnosis. Due to the deployment footprint of these applications and the massive data sets they manage, it is impossible to create comprehensive test environments to validate software changes before deployment. Approaches such as canary testing and Netflix's Simian Army represent the state of the art in testing at scale. When the inevitable problems occur in production, rapid diagnosis can only be achieved through advanced monitoring and logging. In a large-scale system, log analysis itself can quickly become a big data problem, as log sizes can easily reach hundreds of gigabytes per day. Logging solutions must include a low-overhead, scalable infrastructure, such as Blitz4J, and the ability to rapidly reconfigure a system to redirect requests away from faulty services.
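The core idea of canary testing is to expose only a small, stable slice of production traffic to a new release and watch for anomalies before rolling it out widely. The sketch below is our own illustration (the 1 percent canary fraction and hash-based routing scheme are example choices, not a prescribed implementation):

```python
# Illustrative canary-routing sketch: deterministically send a fixed
# fraction of requests to the new release ("canary"), the rest to the
# proven release ("stable").
import hashlib

CANARY_FRACTION = 0.01  # example value: 1% of traffic

def route(request_id: str) -> str:
    """Route a request to the canary or stable version.

    Hashing the request id (rather than choosing randomly) keeps
    routing stable, so a given user consistently hits the same
    version and errors can be attributed cleanly to one release.
    """
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 10_000
    return "canary" if bucket < CANARY_FRACTION * 10_000 else "stable"

# Over a large sample, roughly 1% of requests land on the canary.
sample = [route(f"req-{i}") for i in range(100_000)]
print(sample.count("canary") / len(sample))
```

If the canary's error rates or latencies diverge from the stable fleet, the fraction is dialed back to zero; if they match, the rollout proceeds in larger increments.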
The necessarily large investments and magnified risks that accompany the construction of massive, scalable data management and analysis systems exacerbate these challenges of scale. For this reason, software engineering approaches that explicitly address the fundamental issues of scale and new technologies are a prerequisite for project success.
Designing for Scalability with Big Data
To mitigate the risks associated with scale and technology, a systematic, iterative approach is needed to ensure that initial design models and database selections can support the long-term scalability and analysis needs of a big data application. A modest investment in upfront design can produce significant returns in terms of greatly reduced redesign, implementation, and operational costs over the long lifetime of a large-scale system.
Because the scale of the target system prevents the creation of full-fidelity prototypes, a well-structured software engineering approach is needed to frame the technical issues, identify the architecture decision criteria, and rapidly construct and execute relevant but focused prototypes. Without this structured approach, it is easy to fall into the trap of chasing a deep understanding of the underlying technology instead of answering the key go/no-go questions about a particular candidate technology. Reaching the right decisions at minimum cost should be the aim of this exercise.
At the SEI, we have developed a lightweight risk reduction approach that we have initially named Lightweight Evaluation and Architecture Prototyping (for Big Data), or LEAP(4BD). Our approach is based on principles drawn from proven architecture and technology analysis and evaluation techniques such as the Quality Attribute Workshop and the Architecture Tradeoff Analysis Method. LEAP(4BD) leverages our extensive experience with architecture-based design and assessment and customizes these methods with deep knowledge and experience of the architectural and database technology issues most pertinent to big data systems. Working with an organization's key business and technical stakeholders, this approach involves the following general guidelines:
LEAP(4BD) provides a rigorous methodology for organizations to design enduring big data management systems that can scale and evolve to meet long-term requirements. The key benefits are:
We are currently piloting LEAP(4BD) with a federal agency. This project involves both scalable architecture design and focused NoSQL database technology benchmarking, as well as an assessment of features to meet the key quality attributes for scalable big data systems.
We are interested in working with organizational leaders who want to ensure appropriate technology selection and software architecture design for their big data systems. If you are interested in collaborating with us on this research, please leave us feedback in the comments section below or send an email to us at firstname.lastname@example.org.
To read the article "Time, clocks, and the ordering of events in a distributed system" by Leslie Lamport, please visit
To read the paper "Characterizing cloud computing hardware reliability," by Kashi Venkatesh Vishwanath and Nachiappan Nagappan, which appeared in the Proceedings of the First Association for Computing Machinery (ACM) Symposium on Cloud Computing, please visit
To read more about the challenges of ultra-large-scale systems please visit