Machine Learning and Insider Threat

Daniel Costa

As organizations' critical assets have become digitized and access to information has increased, the nature and severity of threats have changed. Organizations' own personnel (insiders) now have greater ability than ever before to misuse their access to critical organizational assets. Insiders know where critical assets are, what is important, and what is valuable. Their organizations have given them authorized access to these assets and the means to compromise the confidentiality, availability, or integrity of data. As organizations rely on cyber systems to support critical missions, a malicious insider who is trying to harm an organization can do so by, for example, sabotaging a critical IT system or stealing intellectual property to benefit a new employer or a competitor.

Government and industry organizations are responding to this change in the threat landscape and are increasingly aware of the escalating risks. CERT has been a widely acknowledged leader in insider threat research since it began investigating the problem in 2001. The CERT Guide to Insider Threats was inducted in 2016 into the Palo Alto Networks Cybersecurity Canon, illustrating its value in helping organizations understand the risks that their own employees pose to critical assets. This blog post describes the challenge of insider threats, approaches to detection, and how machine learning-enabled software helps provide protection against this risk.

In response to several high-profile insider incidents in government, in 2011 President Obama signed Executive Order 13587, which required all government agencies that have access to classified information to establish formal insider threat programs. Following that executive order, a National Insider Threat Task Force was formed to establish standards for insider-threat programs. CERT worked with this task force to help establish minimum requirements and capabilities. CERT also published guidance and disseminated it to organizations that are mandated to build these programs, to help them meet requirements. Moreover, CERT has applied significant machine learning expertise to the insider threat problem.

The CERT Insider Threat Center collects and analyzes data about security breaches perpetrated by insiders. The center's data represent a rich body of information about

  • the kinds of attacks that insiders carry out,
  • activities that take place before an attack occurs that could indicate the potential for an incident, and
  • vulnerabilities in organizations' systems that insiders exploit in their attacks.

By tracking this information, CERT helps organizations instrument their systems and change their practices to lower both the frequency and the severity of insider attacks and preserve the integrity of their most critical information.

How to Protect Critical Assets from Attacks by Insiders

Data sources for insider threat detection often include cyber observables: things that people can be observed doing on their computers. For example, increased use of web-based email or a cloud-based data-storage application might indicate an intention to illegally distribute intellectual property that belongs to the organization, providing cyber-observable evidence of an attack. Other cyber observables include logon/logoff times, files accessed on a network, building card swipes, and organization email usage, among others. When cyber observables coincide with behavioral indicators, such as poor or erratic job performance or expressed disgruntlement about being passed over for promotion, a profile may emerge of an insider who could be planning to harm the organization. CERT's goal is to raise awareness of these kinds of observable behaviors both within and outside of the cyber realm so that organizations can prevent and/or mitigate the effects of these attacks.

Insider threat detection involves collecting and analyzing data from disparate technical and non-technical sources. For example, different parts of an organization, such as human resources (HR), legal, information technology (IT), and security, may each hold information that, if aggregated, would reveal patterns of behavior that might be of concern. But if this information is not aggregated (if HR does not talk to IT, for example, and a person leaving the organization abruptly does not have their account restricted, audited, or terminated in a timely fashion), the organization could be at risk for an insider attack. Organizations need to have a consistent and well-understood insider-threat strategy that cuts across all of these organizational stovepipes. With a formal insider-threat strategy in place, there is a central repository for information sharing and in some cases information aggregation, enabling information that is usually stovepiped to be aggregated, shared, and mined to provide a full picture of risks.
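The HR-and-IT example above can be illustrated with a minimal Python sketch. It assumes two hypothetical data feeds, one from HR listing departure dates and one from IT listing account status. The record layouts, the employee IDs, and the `grace_days` parameter are all illustrative assumptions, not part of any CERT tool.

```python
from datetime import date

# Hypothetical HR feed: employee ID -> departure date (illustrative data).
hr_departures = {
    "e1001": date(2024, 3, 1),
    "e1002": date(2024, 3, 10),
}

# Hypothetical IT feed: employee ID -> account status (illustrative data).
it_accounts = {
    "e1001": {"enabled": True},
    "e1002": {"enabled": False},
    "e1003": {"enabled": True},
}


def accounts_needing_review(departures, accounts, today, grace_days=1):
    """Return IDs of departed employees whose accounts are still enabled
    more than grace_days after their departure date."""
    flagged = []
    for emp_id, left_on in departures.items():
        account = accounts.get(emp_id)
        if account is None:
            continue  # no IT record for this person; nothing to check
        overdue = (today - left_on).days > grace_days
        if account["enabled"] and overdue:
            flagged.append(emp_id)
    return sorted(flagged)


review = accounts_needing_review(hr_departures, it_accounts, date(2024, 3, 15))
```

Neither feed reveals the problem on its own; only the join shows that a departed employee still has an enabled account.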

Applying Machine Learning to the Insider-Threat Problem

These disparate datasets pose an enormous challenge to human analysts searching for insider threats. Clues to an employee's malicious intentions may be spread across multiple datasets, hidden among tens or hundreds of thousands of other data points, or separated by weeks or months of inactivity.

Machines, on the other hand, excel at these types of subtle pattern detection across large datasets. Specialized algorithms can be designed to look for anomalies, such as deviations from normal computing behavior or violations of defined policies and procedures, or even just activities that don't match the behavior of other employees. When enough of these indicators co-occur in a single employee, an organization has significant reason for concern. For example, a change in the amount of network traffic that an employee is generating over time can indicate malicious intent, but it can also be a natural result of moving to a new project or using new software. There is more reason for concern, however, when these changes are combined with other indicators, such as use of web-based email for business purposes when there is a policy that it is for personal use only, or highly unusual login-logout times. Putting anomaly-detection methods in place through the use of machines can help organizations differentiate routine personal-computer use from something nefarious.
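As a rough illustration of the idea above, the following Python sketch flags a deviation from one employee's own baseline and then checks whether several indicators co-occur. The z-score test, the threshold of 3.0, the indicator names, and the "two or more indicators" rule are assumptions chosen for the example; real tools may use quite different methods.

```python
from statistics import mean, stdev


def deviates_from_baseline(history, current, threshold=3.0):
    """Return True if `current` lies more than `threshold` sample standard
    deviations from the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        return False  # too little data to establish a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


def count_indicators(indicators):
    """Count how many indicators fired for one employee."""
    return sum(1 for fired in indicators.values() if fired)


# Hypothetical daily network volume (MB) for one employee.
daily_mb = [120, 135, 110, 128, 140, 125, 132]

# Hypothetical indicator set; the last two would come from other tools.
indicators = {
    "network_volume": deviates_from_baseline(daily_mb, 900),
    "webmail_policy_violation": True,
    "unusual_login_hours": False,
}

# A single anomaly may be benign; co-occurring indicators warrant a look.
needs_review = count_indicators(indicators) >= 2
```

A jump in network volume alone would not trigger a review here, which mirrors the point that a new project or new software can explain a single change.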

Data-loss-prevention and network-monitoring tools can prevent certain kinds of information from being distributed outside of organizational firewalls or raise alerts when they detect anomalous network activity, such as unusually large downloads. These systems establish baselines for normal behavior and look for deviations. Machines are also particularly well suited to aggregate the outputs of such tools with, for example, facility-access records, travel records, performance-management records, and textual information from human resources, and then to mine all of that data looking for patterns that might indicate the presence of a malicious insider. When machines flag such a pattern, they can alert insider-threat analysts to take a closer look at certain individuals and to follow up by consulting additional data sources for more information.

The challenge in the use of such tools is that they can be noisy and can generate tens of thousands of alerts with high false-positive rates. Analysts need to know how best to use the output of machines and how to interpret the data that these systems present. In spite of the utility of machines, overall assessments are still best made by humans. Machine learning tools can produce risk scores based on probabilistic models, so the best strategy may be for human analysts to start at the top of the list that machines produce and work down.
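One way to picture a probabilistic risk score and a ranked work list is the Python sketch below. It combines per-indicator probabilities with a noisy-OR rule, which is an assumption made for illustration and not a description of any specific product. The employee IDs and probabilities are invented.

```python
from math import prod


def risk_score(indicator_probs):
    """Combine independent indicator probabilities with a noisy-OR rule:
    the probability that at least one indicator reflects real risk."""
    return 1.0 - prod(1.0 - p for p in indicator_probs)


# Hypothetical alerts: employee ID -> probabilities assigned by each tool.
alerts = {
    "e1001": [0.10, 0.05],
    "e1002": [0.60, 0.50, 0.30],
    "e1003": [0.20],
}

# Analysts start at the top of this list and work down.
ranked = sorted(alerts, key=lambda emp: risk_score(alerts[emp]), reverse=True)
```

The ranking only orders the analysts' queue; the assessment of each individual still rests with a human.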

Such systems, however, are intended to be responsive to human feedback. Over time, machine learning--taking feedback from the analysts who use the tools' outputs and then prioritizing alerts accordingly--can improve the utility of tools to identify patterns of behavior that are most relevant. As this feedback improves the efficacy of the models, confidence in their outputs will increase. CERT is continuously working and experimenting with the outputs of tools to support insider threat programs. We work with the Defense Advanced Research Projects Agency (DARPA) on test environments using real-world data sets, provide experimental environments, and publish results.
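The feedback loop described above can be sketched in Python in a deliberately simplified form. The `FeedbackPrioritizer` class, its method names, and the alert type names are hypothetical, and the smoothed confirmation rate stands in for whatever learning method a real tool would use.

```python
from collections import defaultdict


class FeedbackPrioritizer:
    """Reorder alert types by how often analysts confirmed them as relevant."""

    def __init__(self):
        self.confirmed = defaultdict(int)
        self.dismissed = defaultdict(int)

    def record(self, alert_type, was_relevant):
        """Store one analyst judgment about an alert of the given type."""
        if was_relevant:
            self.confirmed[alert_type] += 1
        else:
            self.dismissed[alert_type] += 1

    def weight(self, alert_type):
        """Laplace-smoothed confirmation rate; unseen types start at 0.5."""
        c = self.confirmed[alert_type]
        d = self.dismissed[alert_type]
        return (c + 1) / (c + d + 2)

    def prioritize(self, alert_types):
        """Return alert types ordered from most to least often confirmed."""
        return sorted(alert_types, key=self.weight, reverse=True)


prioritizer = FeedbackPrioritizer()
for _ in range(3):
    prioritizer.record("large_download", was_relevant=False)
for _ in range(2):
    prioritizer.record("webmail_upload", was_relevant=True)

queue = prioritizer.prioritize(
    ["large_download", "after_hours_login", "webmail_upload"]
)
```

In this example, repeated dismissals push large-download alerts to the bottom of the queue, while an alert type with no feedback yet stays in the middle.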

Risks and Payoffs

Insider threat programs inherently collect sensitive information, so they can raise concerns about employee privacy. The machines described above aggregate sensitive information from HR, such as payroll and performance, with data about, for example, email usage. Aggregating that information into a system makes the aggregated data an attractive target to both internal and external hackers. If the organization's systems as a whole have exploitable vulnerabilities, hackers can do great damage to individuals. The generation of false positives by insider-threat monitoring tools can also lead to damaging unintended consequences.

The cost of establishing an insider-threat program depends on the extent to which the organization is able to reuse or repurpose existing assets. Organizations with a network- or security-operations center are already collecting a lot of the technical data that can be used in an insider threat program. The data that HR or physical security are already collecting can also be used. The added expense comes from the additional effort to define an architecture that can be used to pull together these disparate data sources to enable analysis, as well as the expense incurred in selecting and deploying tools to support aggregate data analysis.

Looking Ahead

Achieving success in an insider threat program takes time, commitment, and buy-in from senior leaders, who can ensure the needed communication about the program. The government mandates such programs today, and compliance requirements also exist in the banking, finance, and health-care industries. In other sectors, though, a serious breach is often the key motivating factor that drives organizational change, and organizations that are more proactive often learn from the bad experiences of competitors that did not have programs in place and suffered consequences.

CERT continues to work directly with organizations to transition recommendations into their programs. At CERT, we have seen ancillary benefits to the growing awareness of insider threat in the more effective use of training and awareness and employee-assistance programs that emphasize prevention over compliance and punishment. The purpose of CERT support for insider threat programs is to protect assets that are essential to organizations' ability to continue carrying out missions and ensure the confidentiality, integrity, and availability of critical assets and systems.

Additional Resources

Read our previous post by Eliezer Kanal on machine learning in cybersecurity.

Learn more about the CERT Insider Threat Center.
