Can't Buy Me DevOps
The DevOps movement is clearly taking the IT world by storm. Technical feats such as continuous integration (CI), comprehensive automated testing, and continuous delivery (CD), which at one time could only be mastered by hip, trendy startups incapable of failure, are now being successfully performed by traditional enterprises that have a long history of IT operations and still rely on legacy technologies (the former type of enterprise is known in the DevOps community as a "unicorn," the latter as a "horse"). In this post, I explore the experience of a fictional horse, Derrick and Anderson (D&A) Lumber, Inc., a company that hit some bumps in the road on its way to DevOps. As D&A finds out, a DevOps transformation is not a product that can be purchased from the outside, but rather a competency that must be grown from within.
D&A is a retail company with 250 stores making a net profit of $210 million annually. D&A's IT operations have grown organically since their humble beginnings in the early 1980s, relying on a proprietary hardware and software vendor for their point-of-sale (POS) and inventory systems. Due to the long history of incremental upgrades from the vendor and hundreds of custom modifications and bolt-on programs developed by in-house development staff, D&A's systems were becoming increasingly hard to maintain. Deployment of software updates proved especially challenging due to a complex nightly batch schedule that synchronized data between the remote store locations and the central office.
As numerous organizations did in the early 2000s, D&A hired a fresh crop of engineers to develop and maintain the company's web presence. The website grew to become integrated with the proprietary legacy POS and inventory system. Many features that were added to the website were jointly developed by the web and legacy backend development teams. The website would call custom programs running on the legacy system to exchange data between the two platforms.
Over the course of the next several years, adding new features to the website became painful due to the need to update programs running on the legacy system. Moreover, the website was becoming slower and slower as performance was constrained by calls to the legacy system. The marketing department was continually requesting new features, such as complex online ordering and a mobile offering. The web team found that it could only perform updates to the website every six months.
Deployment at D&A typically took from the close of business Friday until Sunday afternoon because the team needed to allow the nightly batch processes to complete for Friday, back up the system, deploy the updates, and manually verify that the changes were successful. Any misstep caused a cascading effect in the batch processes, where missed days of data had to be manually loaded by IT support staff the following Monday. It was common for lingering issues from the upgrade to not be completely resolved until halfway through the next business week.
D&A needed to do something differently. The CIO caught on to the buzz in the industry about the DevOps movement and knew this was the change that D&A needed. The company hired a DevOps consulting firm to work with its web team to analyze the company's web application and implement a plan to improve its deployment. The consulting firm recommended purchasing and implementing a software package from a partner company to do CI and CD, which would improve D&A's software quality and help speed up deployment of its website. The consulting company helped D&A's operations team stand up the DevOps package and configured it to build the web application upon every check-in to the source code repository.
These changes produced several benefits. For example, productivity improved because the development team received feedback within minutes of committing bad code that broke the build. The DevOps firm also developed a process for automating deployment of the web application that allowed IT operations staff to deploy the built code to production with the press of a button.
Throughout the engagement with the DevOps consulting firm, the legacy team was too busy with a major software and hardware upgrade to participate in the project. This lack of involvement was brought to the attention of the CIO by the web team's technical manager, but it was mandated that the legacy team must not be distracted from completing the upgrade on schedule. Besides, the consultants only had expertise in Linux and Windows systems and no knowledge of D&A's proprietary legacy system.
Two months passed as the web and legacy teams developed and tested the next set of new website features that allowed customers to request a special coating on their lumber products. The new capability was already being talked about in trade magazines and touted by D&A's marketing department to the press. The time came to release the website updates along with the corresponding updates to the legacy system at retail store locations. The automated website deployment completed quickly and successfully as expected.
Deployment of the legacy system update, however, did not go as smoothly. About half of the remote locations' servers did not come back online after the update because an operating system hotfix had not been applied consistently across all store locations. The systems that had received the hotfix worked, but the others hung because of a new system call introduced with the custom software update to support the website enhancement. These servers required manual intervention that took close to an entire day to complete. The affected store locations were missed during the next day's batch schedule, and the usual Monday morning fire drill was on as VPs complained that their data warehouse reports looked wrong due to missing data for several locations.
The CIO ordered a detailed analysis of the incident. The company had expected its investment in DevOps to speed up deployment of the website. The website was deployed quickly and smoothly, but the complex legacy backend systems remained a bottleneck to the whole process. D&A had missed the opportunity to properly remediate its deployment problems when engaging the DevOps consulting firm. As a result, hundreds of staff hours were spent cleaning up the most recent upgrade fiasco, hours that would have been better spent improving operations on the legacy side.
Over the next few months, the legacy and web teams got together and explored re-architecting the integrations between the legacy systems and the web front end. Both teams found that they could de-normalize their data by calculating the on-hand quantities of products at the remote locations ahead of time. They accomplished this by subtracting each location's sales from its orders and storing the results in a relational database at the central office instead of taxing the legacy system at the stores. This new process was separate from day-to-day retail operations and could be updated at any time without risking the legacy system's availability. They also began to use a new Java-based service layer being offered by the legacy system vendor. This layer alleviated the need for the legacy team to deploy custom code every time a small change was needed to the required web interfaces.
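The on-hand calculation described above can be sketched in a few lines. This is only an illustration, not D&A's actual implementation: the table names (`orders`, `sales`, `on_hand`), the column names, and the use of SQLite as the central relational database are all assumptions made for the example.

```python
import sqlite3


def refresh_on_hand(conn: sqlite3.Connection) -> None:
    """Rebuild the denormalized on_hand table (hypothetical schema).

    On-hand quantity per store and SKU = total ordered - total sold.
    """
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS on_hand (
            store_id INTEGER NOT NULL,
            sku      TEXT    NOT NULL,
            quantity INTEGER NOT NULL,
            PRIMARY KEY (store_id, sku)
        );
        DELETE FROM on_hand;
        INSERT INTO on_hand (store_id, sku, quantity)
        SELECT store_id, sku, SUM(delta)
        FROM (
            SELECT store_id, sku, quantity  AS delta FROM orders
            UNION ALL
            SELECT store_id, sku, -quantity AS delta FROM sales
        )
        GROUP BY store_id, sku;
        """
    )
    conn.commit()


def demo() -> list:
    """Load made-up sample data and return the computed on-hand rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (store_id INTEGER, sku TEXT, quantity INTEGER);
        CREATE TABLE sales  (store_id INTEGER, sku TEXT, quantity INTEGER);
        INSERT INTO orders VALUES (1, '2x4-8FT', 500), (1, '2x4-8FT', 200),
                                  (2, '2x4-8FT', 300);
        INSERT INTO sales  VALUES (1, '2x4-8FT', 120), (2, '2x4-8FT', 300);
        """
    )
    refresh_on_hand(conn)
    return conn.execute(
        "SELECT store_id, sku, quantity FROM on_hand ORDER BY store_id, sku"
    ).fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because the website reads only from the `on_hand` table at the central office, a refresh like this can run on its own schedule without touching the store systems.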
The legacy team automated the process of deploying software updates to all of its test and production systems to ensure that they ran identical software. This automation helped ensure that what had been tested in the test environment would work in production. Automation also helped the team avoid large deployments that could take an entire weekend to complete by allowing it to stagger the updates of legacy systems throughout the week without affecting production operations.
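Two ideas in this paragraph, checking that every system runs the same software and spreading updates across the week, can be sketched as follows. The host names, version strings, and function names are hypothetical, and a real deployment tool would gather the installed versions itself; this is a minimal sketch of the logic only.

```python
def find_version_drift(installed: dict[str, str], expected: str) -> list[str]:
    """Return the hosts whose installed version differs from the expected one."""
    return sorted(host for host, version in installed.items() if version != expected)


def stagger(hosts: list[str], nights: list[str]) -> dict[str, list[str]]:
    """Assign hosts to update nights round-robin so no single night updates them all."""
    plan: dict[str, list[str]] = {night: [] for night in nights}
    for index, host in enumerate(sorted(hosts)):
        plan[nights[index % len(nights)]].append(host)
    return plan


if __name__ == "__main__":
    # Hypothetical inventory: one store server is missing the expected build.
    installed = {"store-001": "4.2.1", "store-002": "4.2.0", "test-01": "4.2.1"}
    print(find_version_drift(installed, expected="4.2.1"))

    # Spread five store servers across two update nights.
    print(stagger(["s3", "s1", "s2", "s4", "s5"], ["Mon", "Tue"]))
```

A drift check of this kind, run before a release, is the sort of safeguard that could have flagged the inconsistently applied hotfix before the store servers were updated.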
D&A learned that each technology area within the enterprise is unique in its constraints and capabilities. The website, which could be deployed quickly and easily through automation, was constrained by its dependence on the legacy backend system that was much slower to change. By hiring an outside consultant who focused on the lowest-hanging fruit, D&A missed the mark with its initial DevOps venture.
The key lesson from this case study is that broad transformation of enterprise technology must come from within the organization, using the expertise and knowledge of internal subject matter experts who can navigate the maze of systems integrations often in place. These transformations can be time intensive, hard, and initially painful, so these initiatives can only succeed with senior leadership's full support and investment. The best kind of DevOps is not bought, but learned.
Every two weeks, the SEI will publish a new blog post offering guidelines and practical advice to organizations seeking to adopt DevOps in practice. We welcome your feedback on this series, as well as suggestions for future content. Please leave feedback in the comments section below.
To listen to the podcast, DevOps--Transform Development and Operations for Fast, Secure Deployments, featuring Gene Kim and Julia Allen, please visit http://url.sei.cmu.edu/js.