SEI Insights

DevOps Blog

Technical Guidelines and Practical Advice for DevOps

Fabric, Ansible, Amazon AWS, and Netflix: The Top 10 DevOps Posts of 2017 (So Far)

Posted on by in

By Hasan Yasar
Technical Manager
Secure Lifecycle Solutions Group

In the first six months of 2017, an increasing number of blog visitors were drawn to posts highlighting topics such as secure Devops, successful DevOps implementations at Amazon and Netflix as well as tutorials on using DevOps technologies such as Fabric or Ansible. This post presents the 10 most popular DevOps posts in the first six months of 2017.

10. An Introduction to Secure DevOps: Including Security in the Software Lifecycle

The term "software security" often evokes negative feelings among software developers because it is associated with additional programming effort, uncertainty, and road blocks on fast development and release cycle. To secure software, developers must follow numerous guidelines that, while intended to satisfy some regulation or other, can be very restrictive and hard to understand. As a result, a lot of fear, uncertainty, and doubt can surround software security. This blog post, the first in a series, is based on a keynote Hasan Yasar, recently delivered at the International Conference on Availability, Reliability, and Security (ARES) 2016Hasan describe how include security into DevOps platform while maintain quality, reliability and pace of application deployment speed.

Here is an excerpt:

In the past, software security focused on the nature and origin of attacks, as well as measures for preventing attacks. However, most attacks-especially sophisticated attacks-can't be anticipated, which means that fixes are bolted on as new attacks are discovered.

The inability to anticipate attacks is why we often see patches coming out in response to new zero-day vulnerabilities. Secure DevOps developers would rather their software absorb the attacks and continue to function. In other words, it should bend but not break. This shift in thinking from a prevent to a bend-don't-break mindset allows for a lot more flexibility when it comes to dealing with attacks. Ensuring a secure lifecycle requires the development team to focus on continuous integration, infrastructure as code, continuous deployment, and automated integrated development platform.

Read the complete post.

9. Addressing the Detrimental Effects of Context Switching with DevOps

In a computing system, a context switch occurs when an operating system stores the state of an application thread before stopping the thread and restoring the state of a different (previously stopped) thread so its execution can resume. The overhead incurred by a context switch managing the process of storing and restoring state negatively impacts operating system and application performance. In this blog post, Todd Waits describes how DevOps ameliorates the negative impacts that "context switching" between projects can have on a software engineering team's performance.

Here is an excerpt:

In the book Quality Software Management: Systems Thinking, Gerald Weinberg discusses how the concept of context switching applies to an engineering team. From a human workforce perspective, context switching is the process of stopping work in one project and picking it back up after performing a different task on a different project. Just like computing systems, human team members often incur overhead when context switching between multiple projects.

Context switching most commonly occurs when team members are assigned to multiple projects. The rationale behind the practice of context switching is that it is logistically simpler to allocate team members across projects than trying to have dedicated resources on each project. It seems reasonable to assume that splitting a person's effort between two projects yields 50 percent effort on each project. Moreover, if a team member is dedicated to a single project, that team member will be idle if that project is waiting for something to occur, such as completing paperwork, reviews, etc.

Read the complete post.

8. Adding Security to Your DevOps Pipeline

DevOps practitioners often omit security testing when building their DevOps pipelines because security is often linked with slow-moving business units and outdated policies. These characteristics conflict with the overall goal of DevOps, which is to improve the software delivery process. However, security plays an important role in the software development lifecycle and must be addressed in all applications. Incorporating security into different stages of the DevOps pipeline will not only start to automate security, but will allow your security process to become traceable and easily repeatable. For instance, static code analysis could be done on each code commit, and penetration testing could be completed as part of the deployment phase. In this blog post, Kiriakos Kontostathis presents two common tools that can be used during deployment that allow for automated security tests: Gauntlt and OWASP Zed Attack Proxy (ZAP).

Gauntlt is a security testing tool that incorporates existing organizational tools that are used for security testing, i.e. nmap. This tool is built on the Cucumber framework and provides an easy to read language that gives users the ability to write their own attacks. Gauntlt also provides example attacks for most common attacks on its Github repo. A Gauntlt attack file consists of primarily two sections: the background and the scenarios. The background section ensures that the packages needed for the attacks are installed and that developers can connect to the host that is being tested for security. The scenario section is where the attacks, or tests, are actually performed. Operational staff can have an unlimited number of scenarios, but it is best practice to have a small number of them. That way each attack file will not get too broad.

I have prepared a short walkthrough that shows how to set up to run Gauntlt tests for your web application. I will be using the Gauntlt starter kit to set up the environment quickly. You will need Vagrant, VirtualBox, and Git installed on your local machine for this demo.

Read the complete post.

7. Whitebox Monitoring with Prometheus

In the ever-changing world of DevOps, where micro-services and distributed architectures are becoming the norm, the need to understand application internal state is growing rapidly. Whitebox monitoring gives you details about the internal state of your application, such as the total number of HTTP requests on your web server or the number of errors logged. In contrast, blackbox testing (e.g., Nagios) allows you to check a system or application (e.g., checking disk space, or pinging a host) to see if a host or service is alive, but does not help you understand how it may have gotten to the current state. Prometheus is an open source whitebox monitoring solution that uses a time-series database to provide scraping, querying, graphing and alerting based on time-series data. In this blog post, Joe Yankel briefly explores the benefits of using Prometheus as a whitebox monitoring tool.

Here is an excerpt:

Many of the organizations we work with use Nagios as their only monitoring tool for systems and applications. This it works well for its intended use, it does not provide complete details. Recently, in our test environment, Nagios alerted me that the disk space was 90 percent full. The disk space check worked as expected, but did not indicate why the disk was filling up. More importantly, it didn't indicate how much time I had until the system was no longer useable. Whitebox monitoring would allow me to see the rate at which the disk was filling, and this critical information could result in a good night's sleep for the system administrator if the trends imply the disk would be useable for the next 12 hours. The problem causing my application's disk to fill up was excessive logging caused by a lengthy database query. This discovery led to more discoveries, and I eventually narrowed the problem to two services we had running on separate machines.

Read the complete post

6. DevOps Case Study: Amazon AWS

Regular readers of the DevOps blog will recognize a recurring theme in this series: DevOps is fundamentally about reinforcing desired quality attributes through carefully constructed organizational process, communication, and workflow. By studying well-known tech companies and their techniques for managing software engineering and sustainment, DevOps blog authors have been able to present valuable real-world examples for software engineering approaches and associated outcomes. These examples also serve as excellent case studies for DevOps practitioners. In the post DevOps Case Study: Amazon AWS, C. Aaron Cois explores Amazon's experience with DevOps.

Here is an excerpt:

Amazon is one of the most prolific tech companies today. Amazon transformed itself in 2006 from an online retailer to a tech giant and pioneer in the cloud space with the release of Amazon Web Services (AWS), a widely used on-demand Infrastructure as a Service (IaaS) offering. Amazon accepted a lot of risk with AWS. By developing one of the first massive public cloud services, they accepted that many of the challenges would be unknown, and many of the solutions unproven. To learn from Amazon's success we need to ask the right questions. What steps did Amazon take to minimize this inherently risky venture? How did Amazon engineers define their process to ensure quality?

Luckily, some insight into these questions was made available when Google engineer Steve Yegge (a former Amazon engineer) accidentally made public an internal memo outlining his impression of Google's failings (and Amazon's successes) at platform engineering. This memo (which Yegge has specifically allowed to remain online) outlines a specific decision that illustrates CEO Jeff Bezos's understanding of the underlying tenets of what we now call DevOps, as well as his dedication to what I will claim are the primary quality attributes of the AWS platform: interoperability, availability, reliability, and security.

Read the complete post.

5. A DevOps a Day Keeps the Auditors Away (and Helps Organizations Stay in Compliance with Federal Regulations such as Sarbanes-Oxley)

In response to several corporate scandals, such as Enron, Worldcom, and Tyco, in the early 2000s congress enacted the Sarbanes-Oxley (SOX) act. The SOX act requires publicly traded companies to maintain a series of internal controls to assure their financial information is being reported properly to investors. In an IT organization, one of the main tenets of SOX compliance is making sure no single employee can unilaterally deploy a software code change into production. DevOps automation techniques and technologies, such as continuous integration (CI), continuous delivery (CD), and infrastructure as code (IaC), can appear on the surface to throw a shop out of SOX compliance. This blog post by Aaron Volkmann examines how DevOps automation can help organizations not only stay compliant, but actually increase their compliance.

Here is an excerpt:

When faced with transitioning software change control processes from a traditional manual approach to a more automated process many organizations fear that checks and balances will be lost and the organization will fall out of SOX compliance. In a traditional scenario shown in Figure 1 below, a developer makes a change to a piece of software, the code goes through a peer review process, and the code is then built into a deployable bundle either manually or by a build system. The new version is then submitted to a change control process for deployment into production. A manager then approves the change to production, and a production services engineer deploys the change. While there are many ways to manage this process, there is no guarantee that the version of code that was looked at by code review is the same version of code that was deployed into production.

Read the complete post.

4. Monitoring in the DevOps Pipeline

In the realm of DevOps, automation often takes the spotlight, but nothing is more ubiquitous than the monitoring. There is value to increased awareness during each stage of the delivery pipeline. However, perhaps more than any other aspect of DevOps, the act of monitoring raises the question, "Yes, but what do we monitor?" There are numerous aspects of a project you may want to keep an eye on and dozens of tools from which to choose. This blog post by Tim Palko explores what DevOps monitoring means and how it can be applied effectively.

Here is an excerpt:

Before getting into the state of monitoring in DevOps, I want to take a minute to discuss tooling. Because there are so many products trying to promote monitoring, choosing among them can be distracting. It is best to, first, understand what holds business value to you and your customers. You should also recognize that not everything that can be monitored should be monitored. Discussing any particular tool isn't the aim of this post, but it is worth noting that the most vendors offer free trials and many other products are simply free, so it might be worth your time to sample a few after you have determined your monitoring strategy.

Infrastructure and service monitoring have been around long before DevOps, so how does DevOps really affect monitoring strategy, and is DevOps even needed for monitoring? Strangely, yes, in a way.

Read the complete post.

3. DevOps Case Study: Netflix and the Chaos Monkey

While DevOps is often approached through practices such as Agile development, automation, and continuous delivery, the spirit of DevOps can be applied in many ways. In this blog post, C. Aaron Cois examines another seminal case study of DevOps thinking applied in a somewhat out-of-the-box way by Netflix.

Here is an excerpt:

Netflix is a fantastic case study for DevOps because their software-engineering process shows a fundamental understanding of DevOps thinking and a focus on quality attributes through automation-assisted process. Recall, DevOps practitioners espouse a driven focus on quality attributes to meet business needs, leveraging automated processes to achieve consistency and efficiency.

Netflix's streaming service is a large distributed system hosted on Amazon Web Services (AWS). Since there are so many components that have to work together to provide reliable video streams to customers across a wide range of devices, Netflix engineers needed to focus heavily on the quality attributes of reliability and robustness for both server- and client-side components. In short, they concluded that the only way to be comfortable handling failure is to constantly practice failing. To achieve the desired level of confidence and quality, in true DevOps style, Netflix engineers set about automating failure.

Read the complete post.

2. DevOps Technologies: Fabric or Ansible

In the blog post DevOps Technologies: Fabric or Ansible, CERT researcher Tim Palko highlights use cases associated with the DevOps deployment process, including evaluating resource requirements, designing a production system, provisioning and configuring production servers, and pushing code to name a few.

Here is an excerpt:

The workflow of deploying code is almost as old as code itself. There are many use cases associated with the deployment process, including evaluating resource requirements, designing a production system, provisioning and configuring production servers, and pushing code to name a few. In this blog post I focus on a use case for configuring a remote server with the packages and software necessary to execute your code. This use case is supported by many different and competing technologies, such as Chef, Puppet, Fabric, Ansible, Salt, and Foreman, which are just a few of which you are likely to have heard on the path to automation in DevOps. All these technologies have free offerings, leave you with scripts to commit to your repository, and get the job done. This post explores Fabric and Ansible in more depth. To learn more about other infrastructure-as-code solutions, check out Joe Yankel's blog post on Docker or my post on Vagrant.

One difference between Fabric and Ansible is that while Fabric will get you results in minutes, Ansible requires a bit more effort to understand. Ansible is generally much more powerful since it provides much deeper and more complex semantics for modeling multi-tier infrastructure, such as those with arrays of web and database hosts. From an operator's perspective, Fabric has a more literal and basic API and uses Python for authoring, while Ansible consumes YAML and provides a richness in its behavior (which I discuss later in this post). We'll walk through examples of both in this posting.

Read the complete post.

1. Continuous Integration in DevOps

When Agile software development models were first envisioned, a core tenet was to iterate more quickly on software changes and determine the correct path via exploration--essentially, striving to "fail fast" and iterate to correctness as a fundamental project goal. The reason for this process was a belief that developers lacked the necessary information to correctly define long-term project requirements at the onset of a project, due to an inadequate understanding of the customer and an inability to anticipate a customer's evolving needs. Recent research supports this reasoning by continuing to highlight disconnects between planning, design, and implementation in the software development lifecycle. In the blog post, Continuous Integration in DevOps, C. Aaron Cois highlights continuous integration to avoid disconnects and mitigate risk in software development projects.

Here is an excerpt:

A cornerstone of DevOps is continuous integration (CI), a technique designed and named by Grady Booch that continually merges source code updates from all developers on a team into a shared mainline. This continual merging prevents a developer's local copy of a software project from drifting too far afield as new code is added by others, avoiding catastrophic merge conflicts. In practice, CI involves a centralized server that continually pulls in all new source code changes as developers commit them and builds the software application from scratch, notifying the team of any failures in the process. If a failure is seen, the development team is expected to refocus and fix the build before making any additional code changes. While this may seem disruptive, in practice it focuses the development team on a singular stability metric: a working automated build of the software.

Recall that a fundamental component of a DevOps approach is that to remove disconnects in understanding and influence, organizations must embed and fully engage one or more appropriate experts within the development team to enforce a domain-centric perspective. To remove the disconnect between development and sustainment, DevOps practitioners include IT operations professionals in the development team from the beginning as full team members. Likewise, to ensure software quality, QA professionals must be team members throughout the project lifecycle. In other words, DevOps takes the principles of Agile and expands their scope, recognizing that ensuring high quality development requires continual engagement and feedback from a variety of technical experts, including QA and operations specialists.

Read the complete post.

Wrapping Up and Looking Ahead

In the remaining months of 2017, the DevOps blog will continue to publish guidelines and practical advice to organizations seeking to adopt DevOps in practice with upcoming posts on the following topics:

  • Security analysis of deployment scripts thought DevOps pipeline
  • How to choose right container to deploy your application
  • Secure DevOps assessment in Highly Regulated Environments(HRE)

We welcome your feedback on the DevOps blog as well as suggestions for future content. Please leave feedback in the comments section below.

Additional Resources

To view the webinar DevOps Panel Discussion featuring Kevin Fall, Hasan Yasar, and Joseph D. Yankel, please click here.

To view the webinar Culture Shock: Unlocking DevOps with Collaboration and Communication with Aaron Volkmann and Todd Waits please click here.

To view the webinar What DevOps is Not! with Hasan Yasar and C. Aaron Cois, please click here.

To listen to the podcast DevOps--Transform Development and Operations for Fast, Secure Deployments featuring Gene Kim and Julia Allen, please click here.

To read all of the blog posts in our DevOps series, please click here.

About the Author