SEI Insights

DevOps Blog

Technical Guidelines and Practical Advice for DevOps

Fabric, Ansible, Docker, and Chaos Monkey: The DevOps Mid-Year Review


In late 2014, the SEI blog introduced a biweekly series of blog posts offering guidelines, practical advice, and tutorials for organizations seeking to adopt DevOps. These posts are aimed at the ever-increasing number of organizations adopting DevOps (up 26 percent since 2011). According to recent research, those organizations ship code 30 times faster. Despite these benefits, many organizations hesitate to embrace DevOps, which requires a shift in mindset as well as cultural and technical changes that prove challenging in siloed organizations. Given these barriers, posts by CERT researchers have focused on case studies of successful DevOps implementations at Amazon and Netflix, as well as tutorials on popular DevOps technologies such as Fabric, Ansible, and Docker. This post presents the 10 most popular DevOps posts (based on number of visits) over the last six months.


1. DevOps Technologies: Fabric or Ansible

In the blog post DevOps Technologies: Fabric or Ansible, CERT researcher Tim Palko highlights use cases associated with the DevOps deployment process, including evaluating resource requirements, designing a production system, provisioning and configuring production servers, and pushing code, to name a few.

Here is an excerpt:

The workflow of deploying code is almost as old as code itself. There are many use cases associated with the deployment process, including evaluating resource requirements, designing a production system, provisioning and configuring production servers, and pushing code, to name a few. In this blog post I focus on a use case for configuring a remote server with the packages and software necessary to execute your code. This use case is supported by many different and competing technologies, such as Chef, Puppet, Fabric, Ansible, Salt, and Foreman, which are just a few of those you are likely to have heard of on the path to automation in DevOps. All these technologies have free offerings, leave you with scripts to commit to your repository, and get the job done. This post explores Fabric and Ansible in more depth. To learn more about other infrastructure-as-code solutions, check out Joe Yankel's blog post on Docker or my post on Vagrant.

One difference between Fabric and Ansible is that while Fabric will get you results in minutes, Ansible requires a bit more effort to understand. Ansible is generally much more powerful since it provides much deeper and more complex semantics for modeling multi-tier infrastructure, such as those with arrays of web and database hosts. From an operator's perspective, Fabric has a more literal and basic API and uses Python for authoring, while Ansible consumes YAML and provides a richness in its behavior (which I discuss later in this post). We'll walk through examples of both in this posting.
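One common way to picture the contrast described above is "spell out every command" versus "describe the desired state." The following is a minimal sketch of that idea only. It does not use the real Fabric or Ansible APIs, and the function names, package names, and command strings are hypothetical illustrations.

```python
# Hypothetical sketch: neither function calls the real Fabric or Ansible API.
# It only contrasts a literal command list with a desired-state comparison.


def imperative_steps(packages):
    """Fabric-like style: list each command, in order, every time it runs."""
    commands = ["apt-get update"]
    for pkg in packages:
        commands.append(f"apt-get install -y {pkg}")
    return commands


def declarative_plan(desired, installed):
    """Ansible-like style: compare desired state with current state and
    return only the actions still needed, so a repeat run has nothing to do."""
    missing = [pkg for pkg in desired if pkg not in installed]
    return [f"install {pkg}" for pkg in missing]


if __name__ == "__main__":
    wanted = ["nginx", "python3", "git"]
    print(imperative_steps(wanted))
    print(declarative_plan(wanted, installed={"git"}))
    print(declarative_plan(wanted, installed=set(wanted)))
```

The first function always produces the full command list, while the second returns an empty plan once the host already matches the description, which is one reason a state-based model tends to suit larger, multi-tier setups.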

To read the complete post DevOps Technologies: Fabric or Ansible, please visit
http://insights.sei.cmu.edu/post.cfm/devops-technologies-fabric-or-ansible.

2. DevOps and Docker
3. Development with Docker

Docker is quite the buzz in the DevOps community these days, and for good reason. Docker containers provide the tools to develop and deploy software applications in a controlled, isolated, flexible, and highly portable infrastructure. In the post DevOps and Docker, CERT researcher Joe Yankel introduces Docker as a tool to develop and deploy software applications with substantial benefits to scalability, resource efficiency, and resiliency.

Here is an excerpt:

Linux container technology (LXC), which provides the foundation that Docker is built upon, is not a new idea. LXC has been in the Linux kernel since version 2.6.24, when Control Groups (or cgroups) were officially integrated. Cgroups were actually being used by Google as early as 2006, since Google has always been looking for ways to isolate resources running on shared hardware. In fact, Google acknowledges firing up over 2 billion containers a week and has released its own version of LXC containers called lmctfy, or "Let Me Contain That For You."

Unfortunately, none of this technology has been easy to adopt until Docker came along and simplified container technology, making it easier to utilize. Before Docker, developers had a hard time accessing, implementing, or even understanding LXC let alone its advantages over hypervisors. DotCloud founder and current Docker chief technology officer Solomon Hykes was on to something really big when he began the Docker project and released it to the world as open source in March 2013. Docker's ease of use is due to its high-level API and documentation, which enabled the DevOps community to dive in full force and create tutorials, official containerized applications, and many additional technologies. By lowering the barrier to entry for container technology, Docker has changed the way developers share, test, and deploy applications.

In the post Development with Docker, Yankel offers a tutorial on how to get started developing software with Docker in a common software development environment by launching a database container (MongoDB) and a web service container (a Python Bottle app), then configuring them to communicate as a functional multi-container application.

Here is an excerpt:

If you haven't learned the basics of Docker yet, you should go ahead and try out their official tutorial here before continuing.

To get started, you need to have a virtual machine or other host that is compatible with Docker. Follow the instructions below to create the source files necessary for the demo. For convenience, download all source files from our github repository and skip to the demo section. Our source contains a Vagrant configuration file that allows you to run the demo in an environment that will work. See our introductory post about Vagrant here.

To read the complete post, DevOps and Docker, please visit
http://insights.sei.cmu.edu/post.cfm/devops-docker-015.

To read the complete post Development with Docker, please visit
http://insights.sei.cmu.edu/post.cfm/development-with-docker.

4. DevOps Case Study: Amazon AWS

Regular readers of the DevOps blog will recognize a recurring theme in this series: DevOps is fundamentally about reinforcing desired quality attributes through carefully constructed organizational process, communication, and workflow. By studying well-known tech companies and their techniques for managing software engineering and sustainment, our series of posts gains valuable real-world examples of software engineering approaches and associated outcomes. These examples also serve as excellent case studies for DevOps practitioners. In the post DevOps Case Study: Amazon AWS, C. Aaron Cois explores Amazon's experience with DevOps.

Here is an excerpt:

Amazon is one of the most prolific tech companies today. Amazon transformed itself in 2006 from an online retailer to a tech giant and pioneer in the cloud space with the release of Amazon Web Services (AWS), a widely used on-demand Infrastructure as a Service (IaaS) offering. Amazon accepted a lot of risk with AWS. By developing one of the first massive public cloud services, they accepted that many of the challenges would be unknown, and many of the solutions unproven. To learn from Amazon's success we need to ask the right questions. What steps did Amazon take to minimize this inherently risky venture? How did Amazon engineers define their process to ensure quality?

Luckily, some insight into these questions was made available when Google engineer Steve Yegge (a former Amazon engineer) accidentally made public an internal memo outlining his impression of Google's failings (and Amazon's successes) at platform engineering. This memo (which Yegge has specifically allowed to remain online) outlines a specific decision that illustrates CEO Jeff Bezos's understanding of the underlying tenets of what we now call DevOps, as well as his dedication to what I will claim are the primary quality attributes of the AWS platform: interoperability, availability, reliability, and security.

To read the complete post DevOps Case Study: Amazon AWS, please visit
http://insights.sei.cmu.edu/post.cfm/devops-casestudy-amazon-aws-036.

5. DevOps Case Study: Netflix and the Chaos Monkey

While DevOps is often approached through practices such as Agile development, automation, and continuous delivery, the spirit of DevOps can be applied in many ways. In this blog post, C. Aaron Cois examines another seminal case study of DevOps thinking applied in a somewhat out-of-the-box way by Netflix.

Here is an excerpt:

Netflix is a fantastic case study for DevOps because their software-engineering process shows a fundamental understanding of DevOps thinking and a focus on quality attributes through automation-assisted process. Recall, DevOps practitioners espouse a driven focus on quality attributes to meet business needs, leveraging automated processes to achieve consistency and efficiency.

Netflix's streaming service is a large distributed system hosted on Amazon Web Services (AWS). Since there are so many components that have to work together to provide reliable video streams to customers across a wide range of devices, Netflix engineers needed to focus heavily on the quality attributes of reliability and robustness for both server- and client-side components. In short, they concluded that the only way to be comfortable handling failure is to constantly practice failing. To achieve the desired level of confidence and quality, in true DevOps style, Netflix engineers set about automating failure.
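The idea of "automating failure" can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Netflix's Chaos Monkey implementation: the instance names, the `minimum_healthy` threshold, and all function names are hypothetical, and instances are modeled as simple up/down flags.

```python
import random


def terminate_random_instance(instances, rng):
    """Pick one running instance at random and mark it as terminated.

    Returns the name of the terminated instance, or None if none are running.
    """
    running = [name for name, up in instances.items() if up]
    if not running:
        return None
    victim = rng.choice(running)
    instances[victim] = False
    return victim


def service_available(instances, minimum_healthy=1):
    """The service counts as available while enough instances are still up."""
    return sum(instances.values()) >= minimum_healthy


def run_drill(instance_names, rounds, seed=0):
    """Terminate one instance per round and record availability after each."""
    rng = random.Random(seed)  # seeded so a drill can be replayed
    instances = {name: True for name in instance_names}
    history = []
    for _ in range(rounds):
        victim = terminate_random_instance(instances, rng)
        history.append((victim, service_available(instances)))
    return history


if __name__ == "__main__":
    for victim, ok in run_drill(["web-1", "web-2", "web-3"], rounds=4):
        print(f"terminated={victim} available={ok}")
```

In this toy model the service stays available until the last instance is removed. A real drill would terminate actual instances and rely on monitoring and automatic recovery to verify that customers are unaffected.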

To read the complete post DevOps Case Study: Netflix and the Chaos Monkey, please visit
http://insights.sei.cmu.edu/post.cfm/devops-case-study-netflix-and-the-chaos-monkey.

6. DevOps and Agile

Melvin Conway, an eminent computer scientist and programmer, coined Conway's Law, which states: Organizations that design systems are constrained to produce designs which are copies of the communication structures of these organizations. Thus, a company with front-end, back-end, and database teams might lean heavily towards three-tier architectures. The structure of the application developed will be determined, in large part, by the communication structure of the organization developing it. In short, form is a product of communication.

In the post DevOps and Agile, C. Aaron Cois looks at the fundamental concept of Conway's Law applied to the organization itself.

Here is an excerpt:

The traditional-but-insufficient waterfall development process has defined a specific communication structure for our application: Developers hand off to the quality assurance (QA) team for testing, QA hands off to the operations (Ops) team for deployment. The communication defined by this non-Agile process reinforces our flawed organizational structures, uncovering another example of Conway's Law: Organizational structure is a product of process.

To read the complete post DevOps and Agile, please visit
https://blog.sei.cmu.edu/post.cfm/devops-agile-317.

7. ChatOps in the DevOps Team

Conversations between key stakeholders of a project team (e.g., developers, business analysts, project managers, and the security team) and the platform on which communication occurs can have a profound impact on collaboration. Poor or unused communication tools lead to miscommunication, redundant efforts, or faulty implementations. On the other hand, communication tools integrated with the development and operational infrastructures can speed up the delivery of business value to the organization. How a team structures the infrastructure on which they communicate will directly impact their effectiveness as a team.

In the post ChatOps in the DevOps Team, CERT researcher Todd Waits introduces ChatOps, a branch of DevOps that focuses on communications within the DevOps team. The ChatOps space encompasses the communication and collaboration tools within the team: notifications, chat servers, bots, issue tracking systems, etc.

Here is an excerpt:

In a recent blog post, Eric Sigler writes that ChatOps, a term that originated at GitHub, is all about conversation-driven development. "By bringing your tools into your conversations and using a chat bot modified to work with key plugins and scripts, teams can automate tasks and collaborate, working better, cheaper and faster," Sigler writes.

Most teams have some level of collaboration on a chat server. The chat server can act as a town square for the broader development teams, facilitating cohesion and providing a space for team members to do everything from blowing off steam with gif parties to discussing potential solutions to real problems. We want all team members on the chat server. In our team, to filter out the noise of a general chat room, we also create dedicated rooms for each project where the project team members can talk about project details that do not involve the broader team.

More than a simple medium, the chat server can be made intelligent, passing notifications from the development infrastructure to the team, and executing commands back to the infrastructure from the team. Our chat server is the hub for notifications and quick interactions with our development infrastructure. Project teams are notified through the chat server (among other methods) of any build status they care to follow: build failures, build success, timeouts, etc.
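The two directions described above (notifications flowing to the team and commands flowing back to the infrastructure) can be sketched as follows. This is a minimal illustration, not the API of any particular chat server or bot framework: the `ChatBot` class, the `!` command prefix, and the `deploy` command are all hypothetical.

```python
def format_build_notification(project, status, build_id):
    """Infrastructure -> team: turn a build event into a chat message."""
    return f"[{project}] build #{build_id}: {status}"


class ChatBot:
    """Team -> infrastructure: map chat commands to registered handlers."""

    def __init__(self):
        self.commands = {}

    def register(self, name, handler):
        """Associate a command name (without the '!' prefix) with a callable."""
        self.commands[name] = handler

    def handle_message(self, text):
        """Run the handler for a '!command arg ...' message.

        Ordinary conversation (no '!' prefix) is ignored and returns None.
        """
        if not text.startswith("!"):
            return None
        parts = text[1:].split()
        if not parts:
            return None
        name, args = parts[0], parts[1:]
        handler = self.commands.get(name)
        if handler is None:
            return f"unknown command: {name}"
        return handler(*args)


if __name__ == "__main__":
    bot = ChatBot()
    bot.register("deploy", lambda project, env: f"deploying {project} to {env}")
    print(format_build_notification("website", "failure", 42))
    print(bot.handle_message("!deploy website staging"))
```

In practice the handlers would call the build server or deployment tooling, and the notification function would be triggered by a webhook from that same infrastructure.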

To read the complete post ChatOps in the DevOps Team, please visit
http://insights.sei.cmu.edu/post.cfm/chatops-in-devops-team-029.

8. DevOps Technologies: Vagrant

Environment parity is the ideal state where the various environments in which code is executed behave equivalently. The lack of environment parity is one of the more frustrating and tenacious aspects of software development. Deployments and development both fall victim to this pitfall too often, reducing stability, predictability, and productivity. When parity is not achieved, environments behave differently, which makes troubleshooting hard and can make collaboration seem impossible. This lack of parity is a burden for too many developers and operational staff.

In the blog post DevOps Technologies: Vagrant, CERT researcher Tim Palko describes Vagrant, a developer's tool that provides a virtualized and provisioned environment to developers using operations tools with a single, declarative script and a simple command-line interface. Vagrant increases parity between development and other environments by using the same preconfigured (scripted) environment across all developers or in production. Vagrant eliminates the "it works on my machine" excuse in the application development lifecycle.

Here is an excerpt:

The job of an operations team often involves implementing full parity across deployment environments, such as those used for testing, staging, and production. Conversely, the development team is almost entirely responsible for provisioning development machines. To achieve 100 percent parity between both sets of environments, both teams must speak the same language and use the same resources.

Chef and Puppet, both crafted for the operations role, are just slightly out of reach for a busy developer. Each has a respectable learning curve, and neither really solves the parity problem completely: developers still need to virtualize the correct production target platform. All this additional work incurs a decent amount of overhead when you just want to write code!

This is where Vagrant comes in. Vagrant is a developer's tool that basically serves up a virtualized and provisioned environment to developers using operations tools with a single, declarative script and a simple command-line interface. Vagrant cuts out the grunt work needed to stand up a virtual machine (VM) and it removes the need to configure or run, for example, chef-server and chef-client. Vagrant hides all of this and leaves the developer with a simple script, an extensionless file named Vagrantfile, which can be checked into source control along with the code.

To read the complete post DevOps Technologies: Vagrant, please visit
https://blog.sei.cmu.edu/post.cfm/devops-technologies-vagrant-345.

9. Addressing the Detrimental Effects of Context Switching with DevOps

In a computing system, a context switch occurs when an operating system stores the state of an application thread before stopping the thread and restoring the state of a different (previously stopped) thread so its execution can resume. The overhead a context switch incurs in storing and restoring state negatively impacts operating system and application performance. In the blog post Addressing the Detrimental Effects of Context Switching with DevOps, CERT researcher Todd Waits describes how DevOps ameliorates the negative impacts that "context switching" between projects can have on a software engineering team's performance.

Here is an excerpt:

In the book Quality Software Management: Systems Thinking, Gerald Weinberg discusses how the concept of context switching applies to an engineering team. From a human workforce perspective, context switching is the process of stopping work in one project and picking it back up after performing a different task on a different project. Just like computing systems, human team members often incur overhead when context switching between multiple projects.

Context switching most commonly occurs when team members are assigned to multiple projects. The rationale behind the practice of context switching is that it is logistically simpler to allocate team members across projects than trying to have dedicated resources on each project. It seems reasonable to assume that splitting a person's effort between two projects yields 50 percent effort on each project. Moreover, if a team member is dedicated to a single project, that team member will be idle if that project is waiting for something to occur, such as completing paperwork, reviews, etc.

Using our computing system metaphor, this switching between tasks is similar to the concept of multi-threading, where if one thread blocks the process for some reason, other threads can perform other work, rather than waiting for the first thread to unblock. If all work was assigned only to the first thread, progress is much slower. While multi-threading may be sound reasoning in computing systems, the problem is that human workers don't always get a nice 50-50 effort distribution. Effort is thus lost to context switching, and productivity may drop precipitously as the worker's effort is spread across more projects.
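The gap between the expected 50-50 split and what a worker actually delivers can be made concrete with a little arithmetic. The sketch below assumes a fixed share of effort is lost for each additional project. The 20 percent default is an illustrative assumption rather than a figure taken from the excerpt, and the function name is hypothetical.

```python
def effort_per_project(num_projects, switch_overhead=0.20):
    """Return (share of effort per project, total share lost to switching).

    Assumes each project beyond the first costs a fixed `switch_overhead`
    fraction of the worker's total effort (an illustrative assumption).
    """
    if num_projects < 1:
        raise ValueError("num_projects must be at least 1")
    lost = min(1.0, switch_overhead * (num_projects - 1))
    per_project = (1.0 - lost) / num_projects
    return per_project, lost


if __name__ == "__main__":
    for n in range(1, 6):
        share, lost = effort_per_project(n)
        print(f"{n} project(s): {share:.0%} each, {lost:.0%} lost to switching")
```

Under this assumption, two projects receive 40 percent of the worker's effort each instead of the expected 50 percent, and the loss grows with every project added.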

To read the complete post Addressing the Detrimental Effects of Context Switching with DevOps, please visit
http://insights.sei.cmu.edu/post.cfm/addressing-detrimental-effects-context-switching-devops-064.

10. What is DevOps?

Typically, when we envision DevOps implemented in an organization, we imagine a well-oiled machine that automates

  • infrastructure provisioning
  • code testing
  • application deployment

Ultimately, these practices are a result of applying DevOps methods and tools. DevOps works for all sizes, from a team of one to an enterprise organization. In the post What is DevOps, CERT researcher Todd Waits presents the foundations of DevOps.

DevOps can be seen as an extension of Agile methods. It requires all the knowledge and skills necessary to take a project from inception through sustainment to be contained within a dedicated project team. Organizational silos must be broken down. Only then can project risk be effectively mitigated.

Here is an excerpt:

While DevOps is not, strictly speaking, continuous integration, delivery, or deployment, DevOps practices do enable a team to achieve the level of coordination and understanding necessary to automate infrastructure, testing, and deployment. In particular, DevOps provides organizations a way to ensure

  • collaboration between project team roles
  • infrastructure as code
  • automation of tasks, processes, and workflows
  • monitoring of applications and infrastructure

Business value drives DevOps development. Without a DevOps mindset, organizations often find their operations, development, and testing teams working toward short-sighted incentives of creating their infrastructure, test suites, or product increment. Once an organization breaks down the silos and integrates these areas of expertise, it can focus on working together toward the common, fundamental goal of delivering business value.

Well-organized teams will find (or create) tools and techniques to enable DevOps practices in their organizations. Every organization is different and has different needs that must be met. The crux of DevOps, though, is not a killer tool or script, but a culture of collaboration and an ultimate commitment to deliver value.

To read the complete post What is DevOps, please visit
https://blog.sei.cmu.edu/post.cfm/what-is-devops-324.

Every two weeks, the SEI will publish a new blog post that offers guidelines and practical advice to organizations seeking to adopt DevOps in practice. We welcome your feedback on this series, as well as suggestions for future content. Please leave feedback in the comments section below.

Additional Resources

To view the webinar Culture Shock: Unlocking DevOps with Collaboration and Communication with Aaron Volkmann and Todd Waits, please click here.

To view the webinar What DevOps is Not! with Hasan Yasar and C. Aaron Cois, please click here.

To listen to the podcast DevOps--Transform Development and Operations for Fast, Secure Deployments featuring Gene Kim and Julia Allen, please click here.

To read all of the blog posts in our DevOps series, please click here.
