Monitoring in the DevOps Pipeline
By Tim Palko
Senior Member of the Technical Staff
CERT Cyber Security Solutions Directorate
In the realm of DevOps, automation often takes the spotlight, but nothing is more ubiquitous than monitoring. There is value in increased awareness during each stage of the delivery pipeline. However, perhaps more than any other aspect of DevOps, the act of monitoring raises the question, "Yes, but what do we monitor?" There are numerous aspects of a project you may want to keep an eye on and dozens of tools from which to choose. This blog post explores what DevOps monitoring means and how it can be applied effectively.
Before getting into the state of monitoring in DevOps, I want to take a minute to discuss tooling. Because so many products promote monitoring, choosing among them can be distracting. It is best to understand first what holds business value for you and your customers. You should also recognize that not everything that can be monitored should be monitored. Discussing any particular tool isn't the aim of this post, but it is worth noting that most vendors offer free trials and many other products are simply free, so it might be worth your time to sample a few after you have determined your monitoring strategy.
Infrastructure and service monitoring were around long before DevOps, so how does DevOps really affect monitoring strategy, and does monitoring even need DevOps? Strangely, yes, in a way.
While monitoring predates DevOps, DevOps has furthered the software development process to such a degree that monitoring can't help but evolve as well. As a community, we are beyond writing cool application code; we are now writing cool infrastructure as code, automating integration and testing, and deploying everything in the cloud. Generally, the pace of development has increased, which imposes greater load on the customer feedback loop and deployment tooling. There is more to monitor, so where we use DevOps-style tooling to automate integration, testing, provisioning, and deployment, we need to use DevOps-style tooling to monitor our builds, resources, and performance.
Primary Categories of Monitoring Targets
Monitoring targets fall into several primary categories, and you will likely want to cover at least one aspect of each category. Broadly, these categories are development milestones, vulnerabilities, deployments, application log output, server health, and user activity. I will cover each briefly below:
- Development Milestones. Monitoring development milestones is a great way to gain insight into your actual process and how effectively your team is operating. This monitoring is really an indicator of how well your DevOps adoption strategy is working. Look at how often your sprint scope changes, the rate at which bugs are filed and fixed, and the ratio of promised-to-delivered features. Related to these metrics are the drivers that either encourage your team to work harder or demotivate it and otherwise throw your schedule off track. Many issue trackers have this monitoring built in or provide an Agile plugin that helps to highlight this kind of data, so there is no need to add Yet Another Dependency to your project. But the reporting is sometimes easy to overlook and hard to interpret. There is a lot of data, and the charts do not answer obvious questions, so it is well worth starting with the questions you want to answer, such as "What drivers caused us to miss 20 percent of our deadlines last year?" and working backward from there.
- Vulnerabilities. Monitoring vulnerabilities comes in two parts. First, there are known vulnerabilities or weaknesses in the third-party code an application depends on, lists of which are maintained at locations such as the National Vulnerability Database (NVD). Second, there are vulnerabilities introduced into the top-level code of an application by insecure coding practices. These may look the same, but they differ in how they can be addressed (changing third-party dependencies vs. educating your development team, conducting regular code reviews, or hiring for stronger skills) and how they can be identified (NVD queries vs. static code analysis). This is a large topic, outside the scope of this post, but not to be avoided.
- Deployments. Deployment monitoring is sometimes as straightforward as configuring your build servers to notify the team or a designated team member that something is wrong. These notifications are cheap (i.e., they are easy to set up), but very important, so it pays to have this process fail loudly. Chances are, if you are already using DevOps, you already have some monitoring built in to your process. Many continuous integration servers are notification-capable and can communicate with chat servers to alert teams of failed builds and deployments.
- Application Log Output. Application log output might be one of the most easily underestimated types of monitoring because, by its nature, running code already has output. Consequently, it is tempting to call it "done" on delivery. But if your services are distributed and centralized logging isn't in place, you are not getting the full benefit. Moreover, errors and exceptions lose a lot of their value if they are not received in real time. It is also worth making sure that any error-producing code generates notifications and that those notifications persist in a searchable format. The ability to trace an exception in a production environment to a commit tag is a great bonus.
- Server Health. Server health may be the most obvious type of monitoring. Here, of course, I am referring to uptime and performance with respect to available resources (as opposed to application code inefficiencies). Intrusion detection is also worth mentioning in this category, because your response team is likely the same for a downed or over-utilized server as it is for a compromised server. Intrusion detection is not often a direct feature of popular health-monitoring tools, such as Nagios, or of what you would get out of the box from a cloud provider, but it may make sense to have both intrusion detection and health monitoring systems on the same notification pipeline.
- Activity Monitoring. Lastly, and most innocuous, is user activity monitoring. Output here can be used to drive both feature development and the scaling of infrastructure. So, much like monitoring development milestones, it helps to approach this volume of data with prepared questions.
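As a rough illustration of the development-milestone metrics mentioned in the list above (the promised-to-delivered ratio and the rate at which bugs are filed and fixed), here is a minimal Python sketch. The `Sprint` record, its field names, and the sample numbers are all hypothetical; a real issue tracker would export this data in its own format.

```python
from dataclasses import dataclass


@dataclass
class Sprint:
    """Hypothetical sprint summary, as might be exported from an issue tracker."""
    name: str
    promised: int    # features committed at sprint planning
    delivered: int   # features actually shipped
    bugs_filed: int
    bugs_fixed: int


def delivery_ratio(sprints):
    """Fraction of promised features that were delivered across all sprints."""
    promised = sum(s.promised for s in sprints)
    delivered = sum(s.delivered for s in sprints)
    return delivered / promised if promised else 0.0


def bug_backlog_change(sprints):
    """Net change in open bugs; positive means bugs are filed faster than fixed."""
    return sum(s.bugs_filed - s.bugs_fixed for s in sprints)


def missed_sprints(sprints):
    """Names of sprints that delivered fewer features than promised."""
    return [s.name for s in sprints if s.delivered < s.promised]


# Made-up sample data for illustration only.
sprints = [
    Sprint("S1", promised=10, delivered=8, bugs_filed=5, bugs_fixed=4),
    Sprint("S2", promised=10, delivered=10, bugs_filed=3, bugs_fixed=6),
]
```

Starting from a question such as "which sprints missed their commitments?" keeps the calculation small; `missed_sprints(sprints)` answers that one directly, and the other two functions give the surrounding context.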
One final note about logging--whether about application logs, user activity monitoring, or simply project history: logging has even more value if the storage is centralized. Any problems with the application can be detected and analyzed in a global context, and if problems are well annotated, different log sources can be correlated to learn even more about the state of the application and the project.
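To make the point about centralized, well-annotated logs concrete, the following Python sketch uses only the standard library. It assumes each service writes one JSON object per line, tagged with a commit identifier and a request ID; the service names, commit tags, `request_id` field, and the in-memory `central` buffer are hypothetical stand-ins for whatever central log store you actually use.

```python
import io
import json
import logging
from collections import defaultdict


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, source, commit):
        super().__init__()
        self.source = source  # hypothetical service name
        self.commit = commit  # hypothetical commit tag of the running build

    def format(self, record):
        return json.dumps({
            "time": record.created,
            "level": record.levelname,
            "source": self.source,
            "commit": self.commit,
            "request_id": getattr(record, "request_id", None),
            "message": record.getMessage(),
        })


def make_logger(source, commit, stream):
    """Create a logger that writes annotated JSON lines to the given stream."""
    logger = logging.getLogger(source)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(source, commit))
    logger.handlers = [handler]
    return logger


def correlate(lines):
    """Group JSON log lines by request_id, each group ordered by timestamp."""
    groups = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        groups[record["request_id"]].append(record)
    for records in groups.values():
        records.sort(key=lambda r: r["time"])
    return dict(groups)


central = io.StringIO()  # stand-in for a centralized log store
web = make_logger("web", "v1.4.2", central)
db = make_logger("db", "v0.9.0", central)

web.info("checkout started", extra={"request_id": "req-1"})
db.error("write failed", extra={"request_id": "req-1"})
web.info("health check", extra={"request_id": "req-2"})

by_request = correlate(central.getvalue().splitlines())
```

Because every record carries both a request ID and a commit tag, an error seen in one service can be lined up with related entries from other services and traced back to the build that produced it.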
Wrapping Up and Looking Ahead
While one area of monitoring may be more important to your business case than another, it is hard to make an argument for ignoring any one area. Some of these areas of monitoring are built into popular tools, such as issue trackers and integration servers, while others must be deliberately included. But, incorporating some minimal implementation of each kind of monitoring in your DevOps strategy will get you well on your way toward a more complete implementation of DevOps and a more stable and reliable infrastructure, product, and process.