Managing the Risks of Adopting AI Engineering
Adopting artificial intelligence (AI) technology in an organization introduces significant change. The organization will likely incur risk, a fact that has been recognized by both public-sector organizations, such as the United States Department of Defense (DoD), and private-sector organizations, such as McKinsey. Fortunately, risk can be controlled to limit the consequences to the organization.
In this blog post, a summary of the SEI white paper, A Risk Management Perspective for AI Engineering, I focus on some steps that organizations adopting AI technology can take to explore and control the associated risks.
What Is Risk?
The CERT Resilience Management Model (CERT-RMM), the foundation for a process-improvement approach to operational resilience, defines the practices needed to manage operational resilience. Two definitions from CERT-RMM are relevant to this discussion of risk:
Risk: "The possibility of suffering harm or loss."
From a resilience perspective, risk is the combination of a threat and a vulnerability (condition), the impact (consequence) on the organization if the vulnerability is exploited, and the presence of uncertainty.
Condition: "A term that collectively describes a vulnerability, an actor, a motive, and an undesirable outcome."
A condition is essentially a threat that the organization must identify and analyze to determine if exploitation of the threat could result in undesirable consequences.
Risks can come to fruition and have impact if certain conditions exist, and some of these conditions might be interdependent.
Control the Conditions to Control the Risk
Organizations that adopt AI often encounter the following risk conditions:
- ill-defined problem statement
- lack of expertise
- model-system-data disconnection
- unrealistic expectations
- data challenges
- lack of verifiability
In the remainder of this post, I will discuss each of these conditions in turn and suggest steps that organizations can take to manage risks related to each of them.
Ill-Defined Problem Statement
Organizations must constantly adapt to shifting environmental conditions, including internal or external threat actors who change their tactics, changes in the value of assets, and changes in venue. These shifts can complicate the organization's definition of the problem that the AI system is designed to solve.
As in most environments, vulnerabilities arise and technologies evolve in a cybersecurity environment. These changes, individually or in concert, demand nimble models that represent how those changes affect the organization. Models that are nimble enough to accommodate such changes require a form of data collection that strains the limits of current practice.
The organization must decompose each problem statement into smaller pieces and refine the requirements for addressing them. Theoretically, problem-statement decomposition can dilute risk exposure; a wrong decision affecting a small piece of the problem has less impact than a wrong decision affecting a problem of greater scope.
Lack of Expertise
An organization may be unable to assemble the expertise it needs to enable the proper use, selection, development, deployment, maintenance, testing, etc., of AI-related technology. Tasks such as defining the problem, developing the model, collecting data, and constructing systems require skills and expertise that may not be readily available.
Most organizations already face similar risks in other technical areas where they lack the talent needed to realize their goals. The solution to risk conditions such as these is a proactive talent strategy: fostering educational opportunities, identifying opportunities that provide experience, and following rigorous hiring practices.
Some organizations might consider hiring consultants to fill a talent gap, but such a reactive approach can be costly and adds a supply-chain risk for services that also constrains the organization's degree of control.
Unrealistic Expectations
Some customers may be uninformed or uneducated about AI technology and may not understand what AI technology can do. This situation can lead to unrealistic expectations. Customers of AI technology must be educated to understand that AI relies on mathematical modeling that enables automated, risk-based decisions. Since risk is probabilistic, there is always a chance of error. Customers and providers must understand and consider the consequences of errors occurring and determine if the impacts of those consequences fit into their risk appetite.
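The point that probabilistic decisions always carry some chance of error can be made concrete with a small calculation. The following Python sketch is illustrative only: the `DecisionProfile` class, the function names, and every figure are hypothetical and do not come from the white paper. It estimates an expected annual loss as the number of decisions times the error rate times the cost of one error, then compares that estimate with a stated risk appetite.

```python
from dataclasses import dataclass


@dataclass
class DecisionProfile:
    """Hypothetical description of one type of AI-supported decision."""
    name: str
    decisions_per_year: int
    error_rate: float      # assumed probability that a single decision is wrong
    cost_per_error: float  # assumed impact of one wrong decision


def expected_annual_loss(profile: DecisionProfile) -> float:
    """Expected loss = decisions per year x error rate x cost per error."""
    return profile.decisions_per_year * profile.error_rate * profile.cost_per_error


def within_risk_appetite(profile: DecisionProfile, appetite: float) -> bool:
    """True when the expected loss does not exceed the stated appetite."""
    return expected_annual_loss(profile) <= appetite


if __name__ == "__main__":
    # Illustrative numbers only; real values would come from the organization.
    triage = DecisionProfile("alert triage", 10_000, 0.02, 50.0)
    print(expected_annual_loss(triage))
    print(within_risk_appetite(triage, appetite=15_000.0))
```

A real analysis would likely need more than a single expected value, since rare but severe errors can matter more than the average suggests. Even so, a simple estimate like this gives customers and providers a shared starting point for discussing whether the consequences fit the organization's risk appetite.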
Model-System-Data Disconnection
The design of an AI system must emulate the conditions of its environment, and the system must be fed the appropriate data to operate as expected. For example, suppose an organization develops an AI system that makes risk-based decisions from data gathered from sensors fitted in the organization's network. Such a system must be able to sense, collect, and compute the needed data to make the decision. A disconnection of model, system, or data can result in a system that doesn't meet its requirements and an event with an undesired consequence, such as an AI system that produces poor decisions.
When developing an AI system, organizations must have a proactive and disciplined process for exploring requirements and securing development operations with a flexible and nimble software architecture. Agile software development is an example of how developers can build a system with the flexibility to adapt continually to changing conditions while maintaining model, system, and data alignment.
Data Challenges
Ensuring that the information used by the AI system is sound depends on the data being both relevant and accurate.
Data relevance pertains to how applicable the data is to the model that delivers the desired information. Data relevance also depends on the model, system, and data being in alignment. The model may experience concept drift, meaning that the real-world conditions being modeled shift in a way that invalidates the model. A similar drift can happen with the data collected for the model (e.g., sensor fidelity might be inadequate, or sensors might provide too much noise and need significant grooming).
Although mitigating risks related to data relevance is challenging and possibly costly, organizations should regularly revisit the goals of the system and how well those goals are served by the current dataset and AI solution. Regular reviews are common in the design and development stages in the form of engineering design reviews. It is critical to maintain that tempo of review with relevant stakeholders once the model is deployed in the field.
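A simple automated check can help prompt the kind of regular review described above. The sketch below is a minimal heuristic, not a method from the white paper: it measures how far the mean of recent sensor readings has moved from a baseline window, expressed in baseline standard deviations. The function names and the threshold of 2.0 are assumptions chosen for illustration. A check like this looks only at input data, so it cannot by itself confirm that the modeled real-world conditions have changed.

```python
from statistics import mean, stdev


def drift_score(baseline: list[float], recent: list[float]) -> float:
    """Shift of the recent mean, measured in baseline standard deviations."""
    if len(baseline) < 2 or not recent:
        raise ValueError("need at least two baseline values and one recent value")
    spread = stdev(baseline)
    shift = abs(mean(recent) - mean(baseline))
    if spread == 0:
        # A constant baseline has no spread, so any shift is treated as unbounded.
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def needs_review(baseline: list[float], recent: list[float],
                 threshold: float = 2.0) -> bool:
    """Flag the data for stakeholder review when the shift exceeds the threshold."""
    return drift_score(baseline, recent) > threshold


if __name__ == "__main__":
    # Hypothetical readings; real values would come from the deployed sensors.
    baseline = [10.0, 11.0, 9.0, 10.0, 10.0]
    print(needs_review(baseline, [10.0, 10.5, 9.5]))
    print(needs_review(baseline, [14.0, 15.0, 13.0]))
```

The threshold is a tuning choice. A lower value triggers more reviews and more false alarms, while a higher value risks missing gradual shifts.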
Data accuracy pertains to the correctness of the data. The accuracy of data depends on many factors, including how it is collected, the fidelity of the sensors collecting it, and the environment from which the data is collected. Data is subject to being intentionally or unintentionally poisoned. Other threats to data accuracy include biased interpretation of data, faulty data collection, and a low volume of data. Organizations must take steps to limit the likelihood of these threats. Again, this process may be rigorous and expensive. Data analysis and checks on system performance should not end once the design, development, and testing phases of the system lifecycle are complete. Regular performance review and sensitivity analysis based on sensor and system settings could improve the fidelity of results.
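Some of the threats named above, such as faulty collection and a low volume of data, lend themselves to basic automated checks. The following Python sketch is a hedged illustration rather than a prescribed control: `BatchReport`, `check_batch`, the plausible range, and the minimum sample count are all hypothetical. It flags readings outside an expected range and batches that are too small to rely on.

```python
from dataclasses import dataclass, field


@dataclass
class BatchReport:
    """Hypothetical summary of basic accuracy checks on one batch of readings."""
    total: int
    out_of_range: list[float] = field(default_factory=list)
    too_few_samples: bool = False

    @property
    def acceptable(self) -> bool:
        return not self.out_of_range and not self.too_few_samples


def check_batch(readings: list[float], low: float, high: float,
                min_samples: int) -> BatchReport:
    """Flag readings outside [low, high] and batches below min_samples."""
    # NaN fails both comparisons, so it is reported as out of range as well.
    bad = [r for r in readings if not (low <= r <= high)]
    return BatchReport(
        total=len(readings),
        out_of_range=bad,
        too_few_samples=len(readings) < min_samples,
    )


if __name__ == "__main__":
    # Illustrative temperature-like readings with one implausible value.
    report = check_batch([20.0, 21.5, 999.0], low=0.0, high=50.0, min_samples=3)
    print(report.acceptable, report.out_of_range)
```

Checks like these catch only obvious problems. They do not detect deliberate poisoning that stays within plausible ranges, which is one reason ongoing performance review remains necessary.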
Lack of Verifiability
Depending on the model and the data used by the system to make decisions, it may be a challenge for users to verify the results of the system. Users must confirm that the risk-based decisions made by the AI system are appropriate. Without such verification, users could perceive potential bias and lose trust in the system or even in AI technology.
Results may be hard to verify for various reasons, including the following:
- Interpretability of the results may be as important as knowing what results to expect. People often have trouble processing complex problems and may rely on AI technology for good information.
- It is hard for the organization to modify and tune the AI model when errors are identified.
- The organization's risk appetite may not be able to tolerate model corrections and provide the sought-after benefits without damaging stakeholders' trust in AI.
To reduce the risk of abandoning AI technology, the organization should provide education about AI, temper organizational expectations, and define the scope of where AI technology will be applied. An effective adoption strategy begins with small problems that have limited potential negative impacts.
Challenges of verifiability characterize risks that may already have been identified for other new and emergent technologies. Once AI verification risks are identified, risk managers can apply existing mitigations to the AI risks.
Dealing with the Consequences
Given the current state of AI technology, adopting it carries the possibility of errors. Models can break down, irrational actors can act, data can be corrupted, and conditions can shift. Organizations should plan a measured response that mitigates the impacts when such events occur.
The risks of adopting AI technology can differ from the risks of adopting other new technology or innovations. AI can be granted the power to take significant actions without the knowledge or release authority of the organization. The likely consequence of risk events, therefore, is that the AI system will deliver unacceptable decisions or actions.
Analysis of the potential consequences of these risks shows that the impacts all tend to be of the same type, regardless of the risk conditions that brought them about. However, the magnitude of the consequences may vary. For example, data challenges, model breakdown, and lack of talent can all lead to bad decisions. The risk manager must consider all the possible risk conditions that can lead to the negative consequence and apply reactive planning (a plan for how to react when something occurs) to all of the risks related to AI technology. Successful risk managers maintain a broad perspective that analyzes how these risks apply across the organization.
Risk managers must conduct a business impact analysis (BIA) to learn the extent of the pain experienced when a technology fails. The similarity of AI-related risks to other new-technology risks may provide helpful BIA information. This information is critical, especially when viewed through the lens of the organization's risk appetite. The organization can also use this BIA information to tune its risk tolerance, making it easier to decide where and how to apply AI technology.
Early-adopter organizations that seek to adopt AI should evaluate scenarios where confidentiality, integrity, or availability (CIA) is lost. Previous risk analyses related to the services and assets for which AI will be used will help risk managers understand what to do if a catastrophe occurs. In the end, an organization can take measures to introduce AI systematically in a way that limits risk exposure while adopting the new technology.
Wrapping Up and Looking Ahead
AI technology continues to evolve, and organizations have an increased appetite for automated solutions to the challenges they face. AI may eventually satisfy that appetite and deliver systems that make risk-based decisions in a cybersecurity context. Until then, organizations must identify the uncertainties related to AI technology and understand AI benefits and the ramifications of AI-technology failures.
Initial steps for adopting AI technology should include the following:
- Establish a standardized risk-management policy and procedures for implementing that policy, such as the CERT Resilience Management Model (CERT-RMM) Version 1.2. This approach ensures consistency when adopting new technologies amid the related uncertainties.
- Establish a governance structure where risk-based decisions, such as adopting new technologies, can be made. If a risk-governance structure is not yet established, the organization may opt to use other decision-making bodies such as a technology council.
- Have the organization's risk program work with executives to understand and communicate the willingness of the organization to take risks so that a reasonable risk appetite bounds the scope of decisions.
Proactively examining how to control the risk conditions related to adopting AI can help organizations introduce AI technology while minimizing exposure to risks.
To help organizations with enterprise risk management, the SEI CERT Division is evolving the OCTAVE model. OCTAVE Allegro helps risk managers identify, analyze, and prioritize information-security-related risks. OCTAVE FORTE is the next evolution of OCTAVE that identifies all types of risk experienced across the organization in a way that resonates with executives. OCTAVE FORTE is a process model that organizations can use to identify and mitigate risks. Its steps include establishing risk governance, appetite, and policy; identifying risks, threats, and vulnerabilities; and forming and implementing an improvement plan. FORTE provides an approach for strategically introducing new technologies and prioritizing related risks. The SEI will publish OCTAVE FORTE in 2020.
Read the SEI white paper, AI Engineering: 11 Foundational Practices.
Read the SEI white paper, A Risk Management Perspective for AI Engineering.
Read the SEI blog post, Three Risks in Building Machine Learning Systems.
Read the SEI blog post, Detecting Mismatches in Machine-Learning Systems.
Learn about the CERT Resilience Management Model (CERT-RMM) Version 1.2.
Learn about the OCTAVE model.
Learn about OCTAVE Allegro.
View the SEI podcast, Security Risk Assessment Using OCTAVE Allegro.
Learn about the SEI course, Assessing Information Security Risk Using the OCTAVE Approach.
Learn about OCTAVE FORTE.