Managing the Consequences of Technical Debt: 5 Stories from the Field
Rod Nord co-wrote this post.
If you participate in the development of software, chances are good that you have experienced the consequences of technical debt, a term that conveys the additional cost and rework incurred over the software lifecycle when a short-term, easy solution is chosen instead of a better one. Understanding and managing technical debt is an important goal for many organizations. Proactively managing technical debt promises to give organizations the ability to control the cost of change in a way that integrates technical decision making and software economics seamlessly with software engineering delivery. In this post, we provide real-world examples that illustrate the consequences of technical debt for organizations. These examples are excerpted from Chapter 1 of a book we wrote with our colleague Philippe Kruchten, Managing Technical Debt: Reducing Friction in Software Development, which has just been published by Addison-Wesley as part of the SEI Series in Software Engineering.
Technical Debt A-B-C
Many practitioners today see technical debt as a somewhat evasive term to designate poor internal code quality. This understanding is only partly true. Technical debt may often have less to do with intrinsic code quality than with design strategy implemented over time. Technical debt may accrue at the level of overall system design or system architecture, even in systems with great code quality. It may also result from external events not under the control of the designers and implementers of the system.
In our work on technical debt, we lay out principles and practices for defining it, dissecting it, studying it from various angles through examples, and managing it. Our definition of technical debt is as follows:
In software-intensive systems, technical debt consists of design or implementation constructs that are expedient in the short term but that set up a technical context that can make a future change more costly or impossible. Technical debt is a contingent liability whose impact is limited to internal system qualities--primarily, but not only, maintainability and evolvability.
This definition does not fall into the trap of considering only the financial metaphor implied by the term debt. Although the metaphor carries an interesting financial analogy, technical debt in software is not quite like a variable-rate mortgage or an auto loan. The debt begins and accumulates in development artifacts, such as design decisions and code. Technical debt also has a contingent aspect: How much technical debt you need to worry about depends on how you want your system to evolve.
System qualities, or quality attributes, are properties of a system used to indicate how well the system satisfies the needs of its stakeholders. The focus on internal quality is the lens through which these deficiencies are seen from the viewpoint of the cost of change. Technical debt makes the system less maintainable and more difficult to evolve.
Technical debt is not a new concept. It has plagued the industry ever since developers first produced valuable software that they did not plan to throw away or replace with new software, but instead wanted to evolve or simply maintain over time. The difference today is the increasing awareness that technical debt, if not managed well, will bankrupt the software-development industry. Practitioners today have no choice but to treat technical-debt management as one of the core software engineering practices.
While technical debt can have dire consequences, it is not always as ominous as it may sound. You can look at it as part of an overall investment strategy, a strategic software design choice. If you find yourself spending all your time dealing with debt or you reach the point where you cannot repay it, you have incurred bad debt. When you borrow or leverage time and effort that you can and will repay in the future, you may have incurred good debt. If the software product is successful, this strategy can provide you with greater returns than if you had remained debt free. In addition, you might also have the option to simply walk away from your debt if the software is not successful. This dual nature of technical debt--both good and bad--makes grappling with it confusing for many practitioners.
Examples of Technical Debt
To illustrate our definition, we offer a few brief examples of technical debt in software development projects. You will see organizations struggling with their technical debt and software development teams failing to strategize about it.
Quick-and-Dirty If-Then-Else
A company in Canada developed a good product for its local customers. Based on local success, the company decided to extend the market to the rest of Canada and immediately faced a new challenge: addressing the 20% of Canada that uses the French language in most aspects of life. The developers labored for a week to produce a French version of the product, planting a global flag for French = Yes or No as well as hundreds of if-then-else statements all over the code. A product demo went smoothly, and they got the sale!
Then, a month later, on a trip to Japan, a salesperson proudly boasted that the software was multilingual, returned to Canada with a potential order, and assumed that a Japanese version was only one week of work away. Now the decision not to use a more sophisticated strategy--such as externalizing all the text strings and using an internationalization package--was badly hurting the developers. They would not only have to select and implement a scalable and maintainable strategy but also have to undo all the quick-and-dirty if-then-else statements.
For the Canadian company, the decision to use if-then-else statements spread the change throughout the code, but it was a necessary quick-and-dirty solution from a business perspective to get a quick sale. Doing the right thing at that stage would have postponed the delivery of the system and likely lost them the deal. Now, would you continue down that path and add another layer of if-then-else for each language? Or would you rethink the strategy and decide to repay the original technical debt? Inserting the Japanese version of the quick fix, with its issues of character sets and vertical text, would be too much of a burden and a subsequent maintenance issue. You may argue that a good designer would have set up provisions for internationalization and localization right at the outset, but this is easy to say in hindsight. The demands and constraints at the beginning of development for this small venture were quite different, focused on the main features, and didn't foresee the need for a multilingual feature.
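The contrast between the two strategies in this story can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message texts, the `CATALOGS` dictionary, and the `translate` helper are hypothetical and do not come from the company's product. A real project would more likely adopt an existing internationalization package (Python's standard library ships `gettext`, for example) than a hand-rolled lookup.

```python
# Hypothetical sketch: all names and strings below are assumptions made
# for illustration, not details of the product described in the story.

# Quick-and-dirty approach: one global flag, and a branch at every message.
FRENCH = False


def greeting_quick(name: str) -> str:
    # Each user-visible string needs its own if-then-else like this one.
    if FRENCH:
        return f"Bonjour, {name} !"
    else:
        return f"Hello, {name}!"


# Externalized approach: all text lives in per-locale catalogs (plain data).
DEFAULT_LOCALE = "en"

CATALOGS = {
    "en": {"greeting": "Hello, {name}!", "farewell": "Goodbye, {name}."},
    "fr": {"greeting": "Bonjour, {name} !", "farewell": "Au revoir, {name}."},
}


def translate(key: str, locale: str = DEFAULT_LOCALE, **params: str) -> str:
    """Look up a message by key, falling back to the default locale."""
    catalog = CATALOGS.get(locale, CATALOGS[DEFAULT_LOCALE])
    template = catalog.get(key, CATALOGS[DEFAULT_LOCALE][key])
    return template.format(**params)


# Adding a language is now a data change, not a code change.
CATALOGS["ja"] = {"greeting": "こんにちは、{name}さん。"}

if __name__ == "__main__":
    print(translate("greeting", "fr", name="Marie"))
    print(translate("greeting", "ja", name="Yuki"))
    # No Japanese farewell yet, so this falls back to the English text.
    print(translate("farewell", "ja", name="Yuki"))
```

With the flag, every new language multiplies the branches scattered through the code. With the catalogs, the code paths stay fixed and only the data grows. The sketch does not address the character-set and vertical-text issues mentioned above, which go beyond string lookup.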
Hitting the Wall
Two large global financial institutions merged. As a result, two IT systems essential to their business had to merge. The management of the new company determined that a duct-tape and rubber-band system, mixing the two systems in some kind of chimera, would not work. They decided to build a support system from scratch, using more recent technologies and, in some ways, walking away from years of accumulated technical debt in the original systems.
The company organized a team to build the new replacement system. They progressed rapidly because the first major release was to provide an exact replacement of the existing systems. In a few months, they accumulated a lot of code that performed well in demos for each one-week sprint (or iteration). But nobody thought about the architecture of the system; everyone focused on creating more and more features for the demo. Finally, some harder issues of scalability, data management, distribution of the system, and security began to surface, and the team discovered that refactoring the mass of code already produced to address these issues was rapidly leading them to a complete stop. They hit the wall, as marathon runners would say. They had lots of code but no explicit architecture. In six months, the organization had accumulated a massive amount of technical debt that brought them to a standstill.
The situation here is very different from the first story. This problem was not an issue of code quality, but instead was an issue of foresight. The development team neglected to consider architectural and technology selection issues or learn from the two existing systems at appropriate times during development. The team did not need to do all of that up front, but it needed to do it early enough not to burden the project downstream. Refactoring is valuable, but it has limits. The development team had to throw away large portions of the existing code weeks after its original production. Although the organization hoped to eliminate technical debt when it decided to implement a brand-new system after the merger, it failed to incorporate eliminating technical debt into the project management strategy for the new system. Ignorance is bliss--but only for a while.
Crumbling Under the Load
A company in the maritime equipment industry successfully evolved its products for 16 years, in the process amassing 3 million lines of code. Over these 16 years, the company launched many different products, all under warranty or maintenance contracts; new technologies evolved; staff turned over; and new competitors entered the industry.
The company's products were hard to evolve. Small changes or additions led to large amounts of work in regression testing with the existing products, and much of the testing had to be done manually, over several days per release. Small changes often broke the code, for reasons unsuspected by the new members of the development team, because many of the design and program choices were not documented.
In the case of the maritime equipment company, there was no single cause of technical debt. There were hundreds of causes: code imperfections, tricks, and workarounds, compounded by no usable documentation and little automated testing. While the development team dreams of a complete rewrite, the economic situation does not allow delaying new releases or new products or abandoning support for older products. Some intermediate strategy must be implemented.
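The story does not say which intermediate strategy the company chose. One commonly used option for undocumented code with little automated testing is a characterization test: record what the code does today, then flag any later change in behavior automatically instead of through days of manual regression testing. The sketch below is an assumption-laden illustration only; `legacy_fuel_estimate`, its constants, and the sample inputs are invented stand-ins, not code from the company.

```python
import json
from pathlib import Path
from typing import Callable, List, Sequence, Tuple


def legacy_fuel_estimate(distance_nm: float, speed_knots: float) -> float:
    """Hypothetical stand-in for an undocumented legacy routine."""
    if speed_knots <= 0:
        return 0.0
    hours = distance_nm / speed_knots
    # Unexplained constants like these are typical of undocumented code.
    return round(hours * (0.8 + 0.02 * speed_knots ** 2), 2)


# Sample inputs chosen for illustration, including two edge cases.
CASES: List[Tuple[float, float]] = [
    (100.0, 10.0),
    (250.0, 12.5),
    (0.0, 8.0),
    (50.0, 0.0),
]


def record_baseline(func: Callable[..., float],
                    cases: Sequence[Tuple[float, float]],
                    path: Path) -> None:
    """Run the code as it is today and save its outputs as the baseline."""
    results = [func(*case) for case in cases]
    path.write_text(json.dumps(results))


def check_against_baseline(func: Callable[..., float],
                           cases: Sequence[Tuple[float, float]],
                           path: Path) -> list:
    """Return (case, expected, actual) for every output that changed."""
    expected = json.loads(path.read_text())
    mismatches = []
    for case, want in zip(cases, expected):
        got = func(*case)
        if got != want:
            mismatches.append((case, want, got))
    return mismatches
```

In use, `record_baseline` would run once against the current release, and `check_against_baseline` would run after every change. An empty result means the recorded behavior is preserved. It does not mean the behavior is correct, only that it has not changed.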
Death by a Thousand Cuts
One IT-service organization landed several major contracts. Some of this new business allowed the organization to grow its offshore development businesses and enter emerging software development markets. For several years, the organization experienced a hiring boom.
The IT-service projects were similar in nature, and the organization assumed that its new developers were interchangeable across projects. The project managers thought, "The task is customization of the same or similar software, so how different could it be?" In some cases, however, the new employees lacked the right skills or knowledge about the packages used. In other cases, time and revenue-growth pressures pushed them to skip testing the code thoroughly or fail to think through their designs. They also did not put in the time to create common application programming interfaces (APIs). The hiring boom created unstable teams, with new members introduced almost every month. It even became an internal joke: "Get a bunch of online Java and Microsoft certifications, and you are a senior developer here." In no time, the project managers lost control of the schedule, as well as the number of defects introduced into the system.
This IT-service organization provides another example in which there is no single source of technical debt. We call this situation "death by a thousand cuts" because a pervasive lack of competence can result in many small, avoidable coding issues that are never caught. Lack of organizational competency--as in the case of this IT-service organization--easily activates a number of cascading effects. The unplanned and unmanaged hiring boom, the missed opportunity to enforce commonality across the products, and the limited testing all contributed to the accumulating technical debt.
Tactical Investment
A five-person company developed a web application in the urban transportation domain, targeted at users of buses and trains. In this relatively new and rapidly evolving domain, the targeted users could not really tell the company what they would need. "I'll know it when I see it" was the general response. So, the company developed a minimum viable product (MVP) with some core functionality and little underlying sophistication. Members of the company beta-tested it with about 100 users in one city. They had to "pivot" several times until they found their niche, at which point they invested heavily in building the right infrastructure for a product that would be able to support millions of simultaneous users and adapt to dozens of situations and cities.
The initial shortcuts that members of this small company took--and the high-level rudimentary infrastructure they initially developed--are examples of technical debt wisely assumed. The company borrowed the time it would have spent on the complete definition and implementation of the infrastructure to deliver early. This strategy allowed it to complete an MVP months earlier than traditional development practices, which put the infrastructure first, would have allowed. Moreover, the company learned useful lessons about the key issues (which did not necessarily match its initial assumptions) of reliability, fault tolerance, adaptability, and portability. Building in these quality attributes up front would have created massive rework once the developers understood more completely what their users needed.
All along, members of this company were aware of the deliberate shortcuts they were taking and their consequences on future development. From the perspective of their angel investors, these were good strategies for risk management. If the company found no traction in the market, the developers could stop development early and minimize cost before the company made massive financial investments. Management also made it very clear to everyone, internal and external, that the shortcuts were temporary solutions so that no one would be tempted to keep them, painfully patched, as part of the permanent solution. In this manner, taking on technical debt was a wise investment that paid off. The company repaid the "borrowed time," but it could also have walked away from the project.
Resources for Practitioners
In all the examples above, the software contained code that worked but made further evolution harder. The debt was induced by lack of foresight, time constraints, significant changes in requirements, or changes in the business context.
If any of these examples seem familiar or resonate with you, we encourage you to explore our book and the other resources we have developed over the past several years in our work on technical debt. In them, we suggest how organizations can manage their technical debt and its consequences.
Managing Technical Debt: Reducing Friction in Software Development, which we coauthored with Philippe Kruchten, has just been published by Addison-Wesley as part of the SEI Series in Software Engineering.
Learn more about the eLearning course Managing Technical Debt of Software.
Listen to our podcast, Managing Technical Debt: A Focus on Automation, Design, and Architecture.
Read other SEI blog posts about technical debt.
Learn about the Second International Conference on Technical Debt, Montreal, Canada, May 26-27, 2019, co-located with the International Conference on Software Engineering (ICSE) 2019.