Experiences Documenting and Remediating Enterprise Technical Debt
In our experience conducting architecture evaluations, the impact of technical debt often reaches beyond the scope of a single system or project. What we refer to as enterprise technical debt (ETD) consists of choices expedient in the short term, but often problematic over the long term. Because ignoring it can have significant consequences, architects should be alert for enterprise technical debt, and when they come across it, they should not let it be overlooked. In this post, I provide examples of enterprise technical debt and the risk it represents taken from real-world projects.
Below, I provide a case study in which we worked with an organization to implement our ETD management approach by creating a technical debt registry and dashboard in a Jira issue-tracking environment. I describe our journey managing ETD and share experiences to help readers deal with some of the challenges we faced along the way. This post is organized into three sections:
- capturing enterprise technical debt descriptions
- storing and tracking enterprise technical debt items
- elaborating technical debt descriptions to motivate action
Capturing Enterprise Technical Debt Descriptions
Rough descriptions of ETD provide a good starting point, but more detail and structure are needed to determine the actions necessary to mitigate the debt. In this section, I will share practices useful for describing ETD at the right level of detail. I will also provide a template for organizing technical debt description information. The information in this section is adapted from the book Managing Technical Debt.
If you have never done it before, producing a technical debt description can be daunting. To get started, it’s helpful to adapt user stories to inform your descriptions. A user story might take this form:
As a <>, I want <> to <> so that <>.
For instance, consider a real-world situation we encountered in our role as architecture evaluators in which project requirements called for exchanging data between Applications A and B (a shared schema scenario). A user story for this situation might look like this:
As <Organization x>, we want to enable <Application teams A and B> to <make system changes independently> so that <Application teams A and B can deliver features more quickly>.
Now that you have something to work with, you just need a little more detail. One trick to developing detail is to enhance the user story by documenting the who, what, when, where, how, and why information (5Ws). The story should focus on what you would like the future to look like once debt is resolved. For example, the following paragraph presents the 5W version of the basic technical debt description above:
As chief architect (who), I would like to implement a new interoperability solution or design pattern such that when the Application A team makes a change, such as adding a new user interface data element that impacts the persistence layer (when), Application B is not impacted, and vice versa (what). For example, a possible solution may involve creating an API to encapsulate the persistence layer (where), thereby insulating Application B from persistence layer changes (how). The benefit of this solution is that both the Application A and Application B teams can deliver features more quickly because less coordination between teams will be required (why).
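To make the "where" and "how" of this description more concrete, here is a minimal sketch of an API that encapsulates a persistence layer. The customer domain, the class names `CustomerRecord` and `CustomerApi`, and the column names are all hypothetical and chosen only for illustration; the actual systems in our example are not described at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    """Stable contract that Application B codes against (hypothetical shape)."""
    customer_id: int
    display_name: str


class CustomerApi:
    """Owned by the Application A team; the only code that knows the schema."""

    def __init__(self, rows: dict):
        # `rows` is an in-memory stand-in for Application A's database tables.
        self._rows = rows

    def get_customer(self, customer_id: int) -> CustomerRecord:
        row = self._rows[customer_id]
        # The schema-specific mapping is confined to this method, so a column
        # added by Application A never reaches Application B.
        return CustomerRecord(
            customer_id=customer_id,
            display_name=f"{row['first_name']} {row['last_name']}",
        )


# Application A adds a column to its schema; Application B's view is unchanged.
before = CustomerApi({1: {"first_name": "Ada", "last_name": "Lovelace"}})
after = CustomerApi(
    {1: {"first_name": "Ada", "last_name": "Lovelace", "loyalty_tier": "gold"}}
)
print(before.get_customer(1) == after.get_customer(1))
```

In this sketch, Application B depends only on `CustomerRecord`, so the two teams need to coordinate only when that contract changes, not on every schema change.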
Now you have better detail, but the description you’ve created is cumbersome to read. To improve readability, we put the issue into the structured Technical Debt Item Template shown in Table 1 below. The template field names are listed in the left column, template field descriptions in the middle column, and the shared schema example pasted into the right column.
Table 1: Shared Schema, Technical Debt Item

| Field | Field Description | Shared Schema Example |
| --- | --- | --- |
| Name | What is it? This field is a shorthand name for the technical debt item. | Tightly Coupled Shared Schema Integration |
| Description | Where do you observe the technical debt in the affected development artifacts, and where do you expect it to accumulate? | The current implementation uses a shared database schema as a data exchange mechanism between Applications A and B. When shared schema changes are made without coordination, there can be unanticipated impact on Application A and/or B. |
| Consequences | Why is it important to address this technical debt item? What is the consequence if this is not addressed? Examples of consequences may include, but are not limited to: increasing or high costs due to the current state; reduced productivity; increased defects; software quality issues (e.g., security, availability). | Tight coupling between applications and shared schema creates potential for unintended impact when persistence layer changes are made. For example, a change in the schema may break the user interface or business logic of Application A, B, or both. The need for coordination of these changes slows down the pace at which the teams can implement features. As a workaround, teams have copied data in their project environments and set up complex and error-prone extract, transform, and load (ETL) jobs to keep data synchronized. When the ETL jobs fail, occasionally data stores become inconsistent. |
| Remediation | Describe the work needed to eliminate the debt, if any. When should the remediation occur to reduce or eliminate the consequences? | Replace shared schema solution with an application programming interface (API) for application data exchange. This insulates Applications A and B from persistence layer changes. |
| Reporter/Assignee | Assign a person or team. Who is responsible for servicing the debt? While in most cases the who aspect can be trivial, in some situations the debt resolution may need to be assigned to external parties. If remediation is significantly postponed, this field can communicate that decision. | The architecture evaluator is the reporter. Remediation assignees are Application A and Application B lead developers or the chief architect over both systems if one exists. |
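If you want to keep technical debt items in a machine-readable form, the template can be sketched as a small data structure. The class name `TechnicalDebtItem`, the attribute names, and the `missing_fields` helper below are my own assumptions for illustration; they are not part of the template in Managing Technical Debt or of any Jira schema.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalDebtItem:
    """One technical debt item, mirroring the template fields (names assumed)."""
    name: str
    description: str
    consequences: list
    remediation: str
    reporter: str
    assignees: list = field(default_factory=list)

    def missing_fields(self) -> list:
        """Return the names of template fields that are still empty."""
        values = {
            "name": self.name,
            "description": self.description,
            "consequences": self.consequences,
            "remediation": self.remediation,
            "reporter": self.reporter,
            "assignees": self.assignees,
        }
        return [key for key, value in values.items() if not value]


# The shared schema example, abbreviated from the template entries.
shared_schema = TechnicalDebtItem(
    name="Tightly Coupled Shared Schema Integration",
    description="Applications A and B exchange data through a shared database schema.",
    consequences=[
        "A schema change may break the UI or business logic of Application A, B, or both.",
        "The need for coordination slows the pace at which teams implement features.",
    ],
    remediation="Replace the shared schema with an API for application data exchange.",
    reporter="Architecture evaluator",
    assignees=["Application A lead developer", "Application B lead developer"],
)
print(shared_schema.missing_fields())
```

A check like `missing_fields` is one simple way to spot items that were entered quickly at discovery time and still need elaboration.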
Description: Anchoring to System Elements
It is important to know exactly what part of the system you are talking about when you communicate or reason about a technical debt item. As the authors of Managing Technical Debt aptly explain, “To reason about technical debt, estimate its magnitude, and offer information on which to base decisions, you must be able to anchor technical debt to explicit technical debt items that identify parts of the system: code, design, test cases, or other artifacts.” For example, in the entry in Table 1, third column, under Description we see the words “shared database schema.” This is very specific and anchors to a specific artifact in the IT environment. We could improve this entry by naming the shared schema to eliminate confusion in the event there are multiple shared schemas in use.
Consequences: Be as Specific as Possible
The Consequences field in the technical debt item (Table 1) is important because this information can be used to motivate remediation. For this reason, you should describe the consequences as crisply and specifically as possible. For example, in Table 1, column 3, in the Consequences field, we find the following entry: “Tight coupling between applications and shared schema creates potential for unintended impact when persistence layer changes are made. For example, a change in the schema may break the user interface or business logic of Application A, B, or both.” The concrete details in this entry, namely what triggers the problem (persistence layer changes) and what it can break (the user interface or business logic), are the detailed consequence information. When documenting technical debt items, we recommend at least this level of detail and specificity for all Consequences entries.
Remediation: What to Do If the Remediation Is Undecided?
As soon as an ETD is discovered, we recommend entering it in the registry so it doesn’t get lost in the shuffle. The trouble is, at discovery time, potential remediation paths are often not yet defined. If this is the case, we advise you to complete the Remediation field with a notation such as “Analysis is pending to complete this section.” Such a notation will suffice for creating the initial technical debt item record, but don’t stop there. As soon as possible, gather relevant software engineers and architects to identify (and enter into the technical debt item template) some candidate remediation paths. It is very helpful to do this while the issue is fresh, because ETD items can take a long time to remediate and developers and/or management may change. You will need a good record of what was in the submitter’s head for future reference.
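To help ensure the placeholder notation does not become permanent, a registry export could be scanned for items that still carry it. The sketch below assumes a hypothetical `RegistryEntry` record, made-up ticket keys, and a 30-day follow-up threshold; none of these come from our actual registry.

```python
from dataclasses import dataclass
from datetime import date

# Placeholder notation suggested above for an undecided remediation.
PENDING_NOTE = "Analysis is pending to complete this section."


@dataclass
class RegistryEntry:
    key: str          # ticket key, e.g. "TD-1" (hypothetical)
    remediation: str  # contents of the Remediation field
    created: date     # date the item was entered in the registry


def overdue_pending(entries, today, max_age_days=30):
    """Return keys of entries whose Remediation field still holds the
    placeholder and that are older than max_age_days (an assumed threshold)."""
    return [
        entry.key
        for entry in entries
        if entry.remediation.strip() == PENDING_NOTE
        and (today - entry.created).days > max_age_days
    ]


entries = [
    RegistryEntry("TD-1", PENDING_NOTE, date(2024, 1, 1)),
    RegistryEntry("TD-2", "Replace shared schema with an API.", date(2024, 1, 1)),
    RegistryEntry("TD-3", PENDING_NOTE, date(2024, 3, 1)),
]
print(overdue_pending(entries, today=date(2024, 3, 10)))
```

Here only the first entry is flagged: the second already has a candidate remediation, and the third is still within the assumed follow-up window.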
Storing and Tracking Enterprise Technical Debt Items
Now that you have captured the ETD item, what do you do with it? It is best practice to store technical debt items in a technical debt registry. This registry can take various forms. Here are two options we have encountered:
- Option 1, distributed technical debt registry. Store technical debt items in the backlog repositories you already use to manage work. If you choose this option, we recommend creating a type for technical debt items and tagging technical debt descriptions with a label, such as “techdebt,” because they may be stored alongside user stories, defects, and other tasks. With this option, ETD that affects multiple projects may require a second technical debt item in the other project’s repository. Because this duplication is not ideal, a mechanism that creates the technical debt item in one project repository and points to it from the other is preferable, if you have one. The options available to you depend on how your organization allows its repositories to be configured.
- Option 2, centralized technical debt registry. Create a separate enterprise or cross-organizational repository for storing and tracking technical debt items. In this case, you can have a single technical debt item ticket and avoid duplication. For this reason, this is our preferred option. If you choose this option, we suggest linking tickets in the technical debt registry to tickets in the project-level backlogs where possible, because that is often where mitigation changes will need to be made. This linking enables tracking of technical debt items through remediation completion.
When deciding which tools to use for the registry, it usually makes sense to use whatever tools your teams are familiar with. For example, an organization we are working with chose Option 2 above, so we designed and implemented Option 2 in Jira, which is the organization’s standard issue tracking tool. The organization chose Option 2 because it was concerned about technical debt items getting lost in its complicated web of backlog databases.
The centralized technical debt registry we created in Jira doesn’t just house technical debt tickets. It also houses Jira tickets from architecture evaluations. Consequently, to differentiate technical debt descriptions from other issues in the database, we added the label “technical debt item” to the technical debt Jira tickets. Due to challenges getting additional labels added in Jira within the organization, we do not yet have a separate enterprise technical debt label. So, the ETD differentiation is derived from written information in the technical debt description, such as which, or how many, systems or parties are impacted by the issue. Proper labeling and the ability to search for ETD items versus technical debt items would be a helpful improvement in the future. For now, teams are coached by the architecture evaluators to provide this level of detail in the Consequences field.
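If the impacted systems were recorded in a structured way rather than only in prose, the ETD distinction could be derived mechanically. The following is a minimal sketch under that assumption: the `Ticket` record and its `affected_systems` field are hypothetical, and the rule "more than one affected system means ETD" is a simplification of the judgment our evaluators apply when reading descriptions.

```python
from dataclasses import dataclass

# Label used on technical debt tickets in the registry described above.
TD_LABEL = "technical debt item"


@dataclass
class Ticket:
    key: str               # hypothetical ticket key
    labels: set            # labels attached to the ticket
    affected_systems: set  # assumed structured field; not in our registry today


def is_enterprise_debt(ticket: Ticket) -> bool:
    """Treat a technical debt ticket as ETD when it impacts more than one system."""
    return TD_LABEL in ticket.labels and len(ticket.affected_systems) > 1


shared_schema = Ticket("TD-7", {TD_LABEL}, {"Application A", "Application B"})
local_debt = Ticket("TD-8", {TD_LABEL}, {"Application A"})
print(is_enterprise_debt(shared_schema), is_enterprise_debt(local_debt))
```

A dedicated ETD label would make this derivation unnecessary; the sketch only illustrates the interim differentiation by number of impacted systems.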
At this stage, you may be thinking, “So, the technical debt items (including ETDs) are in the technical debt registry. What happens next?” While there are good reasons not to pay down technical debt, let’s assume that an analysis has been done and, in this case, there is agreement that remediation would improve the situation. How do you motivate that remediation? Motivating action on ETDs can be a long, complicated process. (I’ll explain why in the following section.) If you can’t motivate action right away, try at least to keep these issues on the radar. To do this, you need easily accessible and current information about the ETD items so that you can track and report on the status of technical debt. For this purpose, we created a Jira technical debt dashboard in the centralized technical debt registry that makes it easy for stakeholders to access up-to-date technical debt summary information. The dashboard also allows us to pull reports as needed when opportunities arise to discuss remediation with stakeholders of authority.
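The kind of summary such a dashboard presents can be sketched in a few lines. This is not how the Jira dashboard is implemented; it is an illustration over an exported list of tickets, and the dictionary keys and status names ("Open", "In Progress", "Done") are assumptions.

```python
from collections import Counter


def summarize(tickets):
    """Summarize technical debt tickets for a status report.

    `tickets` is a list of dicts with 'status' and 'is_etd' keys
    (a hypothetical export format).
    """
    by_status = Counter(ticket["status"] for ticket in tickets)
    open_etd = sum(
        1 for ticket in tickets if ticket["is_etd"] and ticket["status"] != "Done"
    )
    return {"by_status": dict(by_status), "open_etd": open_etd}


tickets = [
    {"key": "TD-1", "status": "Open", "is_etd": True},
    {"key": "TD-2", "status": "Done", "is_etd": True},
    {"key": "TD-3", "status": "Open", "is_etd": False},
]
print(summarize(tickets))
```

Keeping the count of unresolved ETD items separate from the overall status breakdown makes it easier to raise them when an opportunity to discuss remediation arises.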
So, now we have a technical debt dashboard and reports. What do we do with them? You can use this data to solve problems, motivate action, or plan future technical debt mitigation. In the section below, we give examples of opportunistic usage; however, we hope to move in the direction of integrating ETD item review into the regular cadence of planning/investment activities over the coming year.
Elaborating Technical Debt Descriptions to Motivate Action
Now that we have ETDs in the technical debt registry and are reporting status on technical debt tickets, the next challenge is to pay down the technical debt. This is not as easy as it sounds. The pain of inaction from ETDs is usually not felt at the project level and, consequently, delays in paying them down are common. Success requires using solid ETD information to motivate the right people at the right time. It helps to have evidence that the cost is accumulating when you do this. However, while financial data is helpful, it’s not easy to get. So, we often settle for proxy metrics. The following paragraphs describe some of our experiences executing this process.
Continuing with our real-world shared schema example, it was clear that a stakeholder at a higher level of authority (above both Application A and B product owners) was needed to champion the remediation effort. In the absence of such a champion, the application product owners deferred the remediation. Following best practice, our team of architecture evaluators (along with contractor evaluators) documented the ETD item and stored it in the technical debt registry.
The first Consequences entry in the technical debt item template (see Table 1) entered by the architecture evaluator was adequate but not very motivating: “Tight coupling between applications and shared schema creates potential for unintended impact when persistence layer changes are made. The need for coordination slows down the pace at which the teams can make changes.”
Over time, the situation got worse. A design review revealed that, “As a workaround, teams have copied data in their project environments and set up complex and error-prone extract, transform, and load (ETL) jobs to keep data synchronized. When the ETL jobs fail, occasionally data stores become inconsistent.” The stakes were getting higher, so the architecture evaluator updated the Consequences field with this additional information.
No action was taken until the architecture evaluator raised this ETD at an Application A release meeting attended by project stakeholders and the operations and maintenance (O&M) branch manager and staff. With the right people in attendance, and a worsening consequence anecdote, the remediation work was finally approved. This example illustrates how elaborating the Consequences field with detailed and specific information, as well as accumulating evidence, such as multiple project-level technical debt items pointing to root cause issues, can motivate action.
Another real-world example from our work as architecture evaluators concerned teams that had implemented duplicative authentication and access-control capability, creating an increased security and maintenance risk. In this case, the proposed remediation path required the cooperation of multiple parts of the organization. This included the IT department manager, the division IT director, and a portfolio project manager volunteering to pilot the effort. Due to a lack of coordination and of strongly motivating evidence, the organization made little progress on common access control for two years.
Meanwhile, our architecture evaluators continued conducting project-level architecture reviews, and project-level risks related to the lack of shared common access control kept popping up. Each time, the architecture evaluators captured these risks as independent technical debt tickets in the technical debt registry. The original ETD ticket in the technical debt registry was made a Jira epic to group the project-level tickets.
During investment planning for the upcoming year, the architecture evaluators asked whether the common access control ETD item could be considered. Multiple Jira tickets describing the impacts of heterogeneous access control on individual projects ultimately provided enough evidence to convince executives to approve development of a common access control capability. This example illustrates how multiple tickets documenting the same root-cause issue can serve as evidence of accumulating “cost” that can be used to motivate action.
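One way to turn this accumulating evidence into a simple proxy metric is to count the project-level tickets, and the distinct projects, grouped under each ETD epic. The sketch below assumes a hypothetical export of (ticket key, project, epic key) tuples with made-up keys; it does not call Jira.

```python
from collections import defaultdict


def evidence_by_epic(tickets):
    """Rank ETD epics by accumulated project-level evidence.

    `tickets` is an iterable of (ticket_key, project, epic_key) tuples.
    Returns (epic_key, ticket_count, project_count) tuples, with the epic
    that has the most tickets first.
    """
    counts = defaultdict(int)
    projects = defaultdict(set)
    for _ticket_key, project, epic_key in tickets:
        counts[epic_key] += 1
        projects[epic_key].add(project)
    rows = [(epic, counts[epic], len(projects[epic])) for epic in counts]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


tickets = [
    ("TD-11", "Project X", "TD-5"),  # TD-5: common access control epic (hypothetical key)
    ("TD-12", "Project Y", "TD-5"),
    ("TD-13", "Project X", "TD-5"),
    ("TD-20", "Project Z", "TD-9"),
]
print(evidence_by_epic(tickets))
```

A ranking like this is only a proxy for cost, but it gives planners a quick view of which root-cause issues keep resurfacing across projects.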
Looking Forward: Incorporating Enterprise Technical Debt into Planning
In an earlier SEI Blog post, I provided examples of ETD issues. In this post, I discussed our experiences documenting ETD and using that documentation as a motivator for remediation.
While ETD tickets in these examples were raised opportunistically, we are working toward more formally integrating ETD item reviews into the regular organizational investment planning cadence.