After the Cyber Resilience Review: A Targeted Improvement Plan for Service Continuity
In 2011, the SEI's CERT Division developed and published the Cyber Resilience Review (CRR) on behalf of the Department of Homeland Security. Since then, hundreds of CRRs have been conducted across all critical-infrastructure sectors, including financial services, healthcare and public health, energy, and water and wastewater systems. Each CRR provides an organization with a comprehensive report that can present a seemingly overwhelming number of options for improving the organization's resilience. In this post, we describe steps that organizations can take to use the results of a CRR to develop an actionable improvement plan. These steps are derived from a new technical note that we at the CERT Division have recently published, A Targeted Improvement Plan for Service Continuity.
After completing a CRR assessment, the next logical step is to use the results to create a targeted improvement plan. Yet the task of prioritizing the 167 different practices examined during a CRR can be daunting, and often the best advice on how to proceed is likely to be, "It depends." Organizations have different priorities, resources, and appetites for risk, and these factors among others determine how the organizations will choose to proceed. A recommended approach is to select a domain of practice and focus on improving those practices that support meeting the goals of that domain.
In a CRR, an organization is asked to examine the cyber resilience of a specific service that is critical to accomplishing the organization's mission. Cyber resilience is the ability of the organization to protect and sustain this critical, cyber-dependent service during times of stress, such as when the organization's systems are under cyber attack. The practices that constitute the service continuity management (SCM) domain ensure the continuity of essential operations of this critical service if a disruption occurs as a result of an incident, disaster, or other disruptive event.
An example of how a cyber attack can disrupt an organization's ability to provide critical services is the May 2019 ransomware attack on the City of Baltimore, one of many recent ransomware attacks on government organizations. The Baltimore attack disrupted or disabled, among other services, city government email and voice mail, new home settlements and closings, online bill paying, billing and collection for water bills, and parks and recreation payments, permits, program registrations, and service requests.
Service Continuity Management (SCM)
In the CRR, we define SCM as a collection of policies, standards, processes, and tools through which organizations maintain their ability to accomplish their mission when there is significant impact to a critical service. The practices performed in support of SCM strengthen the organization's overall cyber resilience. A logical first priority for many organizations is to focus on SCM and, as a result, improve their overall cyber resilience.
SCM transcends technology and encompasses business considerations. Effective SCM requires a strategy that is agreed upon at the most senior levels of the organization (e.g., at the board level if a board exists) and fully endorsed by the CEO. The strategy should define the direction and identify the resources and high-level methods that are necessary to meet specific service-level objectives. This enabling step is critical to the development of an SCM program and should be undertaken when plan development begins.
In developing the improvement-plan template presented and described in our technical note, we reviewed all the practices within the CRR and prioritized those that support service continuity. This resulted in a list of practices with recommended priorities that we call a service-continuity improvement template.
In addition to senior executives and board members, development of the targeted improvement plan engages the organization's business unit owners and management team, business-continuity professionals, information-technology operations-management staff, security-management team (physical and cyber), and, in some cases, operational-technology team, all of whom ideally share the goal of creating a resilient organization. This approach to developing and implementing an SCM program supports organization-specific, mission-focused objectives to protect and sustain a critical, cyber-dependent service during times of stress.
Participation of organizational roles in development of the service-continuity plan is shown in Table 1 below.
Accounting for Risk Tolerance
Many small- and mid-sized organizations may have begun an SCM program only recently. Moreover, in some organizations, an SCM program may exist but may not integrate the cyber aspects necessary in today's business environment. Implementing an SCM program takes time and resources. A plan must be developed that takes into consideration the logical and efficient use of available resources and an organization's risk tolerance.
Risk tolerance is defined by an organization's senior management and reflects the impact senior management is willing to tolerate based on the likelihood that the impact will occur. An organization's risk tolerance shapes how it implements an SCM program: how it allocates available resources and what priority it places on implementing the recommended practices. An organization may choose not to implement recommended practices based on its analysis of the risk that results from this decision. The key to resiliency is not perfection but a risk-informed balance between cost and protection in the achievement of the organization's mission.
How to Use the Service-Continuity Template
Our technical note explains how to use CRR results to achieve service continuity by prioritizing SCM-specific and supporting practices. The included template enables an organization to develop a long-term plan for protecting and sustaining critical, cyber-dependent services during times of stress. Use of the template assists in the development of a targeted improvement plan that is tailored to the organization's own priorities and takes advantage of the documented, current state of the organization as found in the CRR report.
An organization using this template can identify and prioritize practices that will most improve its SCM activities and support its overall cyber resilience. The template ranks each of the 167 practices based on the premise that limited resources demand difficult choices. The practices are not ranked according to importance (they are all important) but rather according to a suggested order grouped into implementation stages, with a higher priority recommended for those practices that enable service continuity. The targeted improvement plan pares down the CRR report of gaps to a list of improvement targets unique to the individual organization. The practitioner can then use this list to build the business case for improvement-funding requests and project planning.
The template groups each practice into one of four stages:
- Stage 1 - Essential to implementing a base SCM program, but implementing only Stage 1 practices is not considered sufficient for a complete program.
- Stage 2 - Additional practices required for completing the implementation of an SCM program. Stage 1 and 2 practices are considered the minimum for a complete implementation.
- Stage 3 - Practices that directly support the integration of the SCM program with the organization's cybersecurity-management program.
- Stage 4 - Practices that support the implementation of an organization's cybersecurity management program.
The SCM Improvement Template is ordered by domain to match the CRR assessment report. The column labeled Implementation Stage indicates the stage associated with each practice (Stage 1, 2, 3, or 4).
Using the three additional columns (CRR Response, Gap Stage, and Priority), an organization can begin to develop its plan by following these steps:
- Record the organization's current baseline (Yes, No, or Incomplete) from the CRR report in the second column (CRR Response).
- In the third column (Gap Stage), record a C (complete) if the CRR Response is Yes. A CRR Response of No or Incomplete indicates a gap; for each gap, record the number of the Implementation Stage from column one. The result is a ranking of the gaps identified by the CRR, based on best practice.
- Finally, compare each ranking to the organization's individual risk tolerance and risk appetite. Based on this risk analysis, a practice with a recommended stage of 1 may be deferred to a second or third pass of improvement, while the organization may decide that a practice with a recommended stage of 3 should be a first priority. Record the results of this analysis in the fourth column (Priority). The entries are decided based on the organization's risk tolerance and available resources; when completed, the column represents the prioritized targets for improvement.
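The Gap Stage rule described in these steps can be sketched in a few lines of Python. This is only an illustration, not part of the technical note or its template: the `Practice` class, its field names, and the practice identifiers are hypothetical, and the real template is a document that an organization fills in by hand.

```python
from dataclasses import dataclass


@dataclass
class Practice:
    practice_id: str           # hypothetical label, not a real CRR identifier
    implementation_stage: int  # 1-4, taken from the template (column one)
    crr_response: str          # "Yes", "No", or "Incomplete" (column two)


def gap_stage(practice: Practice):
    """Return "C" for a complete practice, else its implementation stage."""
    if practice.crr_response == "Yes":
        return "C"
    if practice.crr_response in ("No", "Incomplete"):
        # A gap is recorded as the stage number from column one.
        return practice.implementation_stage
    raise ValueError(f"unexpected CRR response: {practice.crr_response!r}")


# Made-up rows for illustration only.
rows = [
    Practice("practice-A", 1, "Yes"),
    Practice("practice-B", 1, "No"),
    Practice("practice-C", 3, "Incomplete"),
]
gap_column = [gap_stage(p) for p in rows]  # ["C", 1, 3]
```

The fourth column (Priority) is deliberately left out here, because it depends on the organization's own risk analysis rather than on a mechanical rule.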
When creating the improvement plan, remember that Stage 1 and 2 practices are required for implementing an SCM program. Practices found to be fully implemented during the CRR (indicated by a Yes response) are not typically considered candidates for improvement. A complete (C) in column 3 indicates a practice that is already fully implemented. Practices that have a 1 or 2 in column 3 (Gap Stage) fall below the recommended implementation state; these are gaps that warrant consideration for immediate improvement. If resources do not allow taking on all gaps at once, Stage 1 practices should be considered a top priority. Consider the implementation stages indicated in the template as a recommended path for improvement. The organization can change the recommended implementation stage, making it lower or higher, based on its own risk analysis.
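As a rough sketch of how the prioritization described above might be automated, the following Python function drops completed practices, applies any organization-specific overrides, and orders the remaining gaps so that the lowest stage numbers come first. The function name, the tuple layout, and the practice identifiers are assumptions made for this example; the technical note does not prescribe any tooling.

```python
def improvement_targets(rows, overrides=None):
    """Order gaps by priority.

    rows: list of (practice_id, gap_stage) pairs, where gap_stage is "C"
          for a complete practice or the stage number 1-4 for a gap.
    overrides: optional dict mapping practice_id to the priority the
          organization chose after its own risk analysis.
    """
    overrides = overrides or {}
    gaps = [
        (practice_id, overrides.get(practice_id, stage))
        for practice_id, stage in rows
        if stage != "C"  # complete practices are not improvement targets
    ]
    # sorted() is stable, so practices with equal priority keep template order.
    return sorted(gaps, key=lambda item: item[1])


# Made-up data: the organization defers practice-A and promotes practice-C.
rows = [("practice-A", 1), ("practice-B", "C"), ("practice-C", 3), ("practice-D", 2)]
targets = improvement_targets(rows, overrides={"practice-A": 2, "practice-C": 1})
# targets == [("practice-C", 1), ("practice-A", 2), ("practice-D", 2)]
```

Without overrides, the function simply returns the gaps in the template's recommended stage order, which matches the advice to treat Stage 1 practices as the top priority when resources are limited.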
In this manner, your organization can use the completed template, informed by the results of the CRR, to develop a roadmap that follows the recommended prioritization and serves as its targeted improvement plan.
To access and download the complete template for developing a targeted improvement plan, see our technical note, A Targeted Improvement Plan for Service Continuity.
Read the SEI blog post, Cybersecurity Architecture, Part 1: Cyber Resilience and Critical Service.
Read the SEI blog post, Mapping the Cyber Resilience Review to the Financial Sector's Cybersecurity Assessment Tool.