Considerations for Deploying a Text Analytics Capability for Insider Threat Mitigation: Part 1 of 3

In this blog series I cover topics related to deploying a text analytics capability for insider threat mitigation. A text analytics capability is a means of measuring the risk employees may pose to an organization by monitoring the sentiment and emotion expressed in their communications.

A text analytics capability enables organizations to make data-driven decisions to manage risk. However, such an effort also spurs privacy concerns by aggregating and synthesizing sensitive information. In this blog series, I explore the strengths and describe the potential pitfalls related to deploying a text analytics capability.

In this first post in the series, I identify four critical considerations that should be contemplated before an organization moves forward to deploy a text analytics capability. In the next post, I will describe the foundational research underpinning text analysis for insider threat use cases.

It is our affect that colors both our inward and outward expressions, framing how we think, feel, and behave.

It is easy to collect, enrich, and derive insight from conventional technical data, such as logs monitoring user workstations and network devices. This data tells us what is happening on the organization's information systems. It doesn't tell us why or give us context into the user's mindset at that moment in time.

What we write can reveal our affect--the psychological constructs related to sentiment, mood, emotion, and personality. Collecting and analyzing behavioral data, such as measuring affect constructs from textual email content, has the potential to provide insight into a user's mindset. This information can be used to supplement existing behavioral data sources, such as those from human resource management systems or physical security records. Together, behavioral and cyber information paint a fuller picture of a given event.

Sentiment analysis can measure a message's polarity (its tone, positive or negative). Given the large troves of electronic communications and the potential to extract and interpret meaning, insider threat researchers, product engineers, and organizational decision makers have looked to text analytics for insight into measuring and mitigating insider risk. Text analytics applies natural language processing (NLP) techniques to interpret and extract insight from textual data (e.g., electronic communications).

Before deploying a text analytics capability, let's review four specific ethical and legal due care considerations. These considerations are not simply about lowering an organization's overall legal liability (although that obviously is a catalyst for some of them). Frank discussions of them underpin dependable science, smart business (acquisition) decisions, and organizational trust. By honestly deliberating on potential risks and the harms that could result, organizations can avoid common pitfalls, wasteful security expenditures, and ineffective and privacy-violating capabilities.

Until a behavior or action has been actualized and observed, we should be wary about interpreting or predicting what will take place. There may be signs that a person is more likely to cause harm; however, there is no definitive way of knowing whether they will continue to progress down that pathway. This is why it is critical to consult your legal counsel and human resources department and include them in any discussion about interpreting and using text analytics techniques to identify disgruntled individuals or "high-risk" employees based on communications and textual data.

Text analytics offers the ability to observe evidence or indicators of personal predispositions, stressors, or concerning behaviors as they manifest, allowing organizations a time frame to respond to and mitigate potential issues. Imagine that these tools have a basic understanding of the sentiment, emotion, and personality of users. Using these tools helps illuminate the "whole person" perspective of an organization's employees, enabling tailored and effective mitigation responses. We will explore the measurement and interpretation of findings from text analytics solutions in the next blog segment.

Occasionally we get questions about what "personality" predisposition malicious insiders have. Sometimes the so-called Dark Triad--narcissism, Machiavellianism, and psychopathy--is mentioned as a model to explain malicious insider behavior. We are also asked whether pre-employment assessments or text analytics can be used to identify warning signs of these "dark" personalities.

This is a messy ethical and legal question that is hard to answer. The correct and simple answer is "no, there is not an 'insider personality,'" just as there is no one "criminal personality." We want to identify and help the typically good people, who sometimes happen to display bad or undesirable behaviors, get back on track to being productive members of an organization. Sometimes corrective controls or intervention plans are a necessary component of that mitigation process. This is where indicators and countermeasures are deployed to detect and respond to concerning behaviors.

We can incorporate positive incentives into the prevention strategy to strengthen the rapport between the organization and workforce. This prevention strategy seeks to correct bad behavior by building trust and mutual respect with the workforce to lower their desire to act out and do harm.

It's important to be aware of automation bias and realize that these intelligent systems are far from perfect. In fact, many of the most advanced text analytics solutions fail to properly handle everyday colloquialisms and language inflection, such as negations, jokes, or sarcasm. For instance, a basic bag-of-words approach typically does not handle negations (e.g., "not" in "I am not angry"). Relatedly, be mindful of tools that rely on "bad word lists" to flag words associated with some undesirable activity (e.g., "logic bomb"), as this method can be plagued by a high rate of false positives. This technique will be discussed in detail in the next post.
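To make the negation and word-list problems concrete, here is a minimal Python sketch. The word lists, function names, and the one-word negation rule are all hypothetical and chosen only for illustration; they are not taken from any particular product, and real tools use much larger, validated lexicons.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE_WORDS = {"happy", "glad", "great"}
NEGATIVE_WORDS = {"angry", "furious", "hate"}
NEGATORS = {"not", "never", "no"}
BAD_PHRASES = ["logic bomb"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words_polarity(text):
    """Count positive words minus negative words, ignoring word order."""
    tokens = tokenize(text)
    positives = sum(tok in POSITIVE_WORDS for tok in tokens)
    negatives = sum(tok in NEGATIVE_WORDS for tok in tokens)
    return positives - negatives


def negation_aware_polarity(text):
    """Same count, but flip a sentiment word that directly follows a negator.

    This one-word lookback is an assumed, deliberately crude rule; it still
    cannot handle jokes, sarcasm, or negators that sit further away.
    """
    tokens = tokenize(text)
    score = 0
    for i, tok in enumerate(tokens):
        value = (tok in POSITIVE_WORDS) - (tok in NEGATIVE_WORDS)
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def flag_bad_phrases(text):
    """Return every listed phrase that appears anywhere in the text."""
    lowered = text.lower()
    return [phrase for phrase in BAD_PHRASES if phrase in lowered]
```

With these toy lists, `bag_of_words_polarity("I am not angry")` scores the message as negative because it only sees the word "angry," while `negation_aware_polarity` flips the sign. Likewise, `flag_bad_phrases` flags a benign message such as a reminder about security awareness training on logic bombs, which is the kind of false positive a human reviewer would need to catch.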

Also, be mindful that text analytics, or any other artificial intelligence (AI) or psychometric capability, should not be deployed in a way that would profile individuals or could discriminate against a legally or otherwise protected class of individuals. There is a lot of new work in the area of discrimination and AI where this issue is further evaluated and discussed. For more information, see this NPR story on human-like bias of algorithms and the IBE article about business ethics and AI.

You may want a requirement built into your acquisition process that data-driven and intelligent technologies should have some sort of process for justifying or substantiating evidence for decisions. In the case of text analytics, this could be a matter of providing users with a view into the message flagged as concerning, and an explanation of how that set of words relates to whatever affect construct is being measured.
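As a rough illustration of what such a view might contain, the Python sketch below returns, for each flagged message, the affect construct being measured and the specific words that counted toward it. The lexicon, the `Explanation` record, the `explain_flags` function, and the threshold of two matching words are all assumptions made for this example rather than features of any real tool.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from words to the affect construct they count toward.
AFFECT_LEXICON = {
    "furious": "anger",
    "hate": "anger",
    "worried": "anxiety",
    "afraid": "anxiety",
}


@dataclass
class Explanation:
    message: str      # the full text that was flagged
    construct: str    # the affect construct being measured
    evidence: list    # words in the message that counted toward it


def explain_flags(message, threshold=2):
    """Flag a message per construct and keep the words that justify the flag.

    A construct is flagged only when at least `threshold` of its lexicon
    words appear in the message (an assumed rule for this sketch).
    """
    tokens = re.findall(r"[a-z']+", message.lower())
    hits = {}
    for tok in tokens:
        construct = AFFECT_LEXICON.get(tok)
        if construct:
            hits.setdefault(construct, []).append(tok)
    return [
        Explanation(message, construct, words)
        for construct, words in hits.items()
        if len(words) >= threshold
    ]
```

Calling `explain_flags("I hate this rollout and I am furious about the schedule")` yields one record for the "anger" construct with "hate" and "furious" as evidence, so an analyst can see why the message was flagged and judge whether the flag is reasonable.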

It should be clear by now that a text analytics capability can do a lot of harm if misused. Identifying which parties are responsible for maintaining the text analytics capability establishes accountability and helps mitigate potential risk scenarios.

As organizations research how to select the right text analytics solution for their needs, it's important to also consider who will be operating the tool. Who will have access to use and manage this capability? Will this be a third-party service with subject matter experts (SMEs) interpreting the output? How will a commercial tool that does not include on-demand SME support be managed? And in the event that an organization has the resources to build a custom capability, what type of knowledge, skills, and abilities (KSAs) should be required of the development team versus the analyst team? In any of these cases, who is responsible for defining the software requirements and evaluating the performance?

For some organizations that are risk-averse when it comes to privacy, there may need to be an intermediary agent (e.g., a human SME within an organization's insider threat program) who reviews potential "red flags" from the text analytics solution and verifies the accuracy of the report before it is moved forward into an inquiry. Organizations should consider what type of KSAs and educational background individuals in this role should have. Additionally, organizations should reflect on what type of training should be provided to support the insider threat analyst's performance.

Finally, here is a non-exhaustive set of related questions to consider prior to signing off on a text analytics capability:

What is the classification of the data residing in the text analytics capability?
Who needs to have read and write access to the data?
How is data from the tool interpreted? By another human or by a risk algorithm?
What training is provided to users?
What is the next step if an analyst identifies something of concern from alerts generated by the tool?
What is the process for using output from a text analytics tool as part of an insider threat inquiry or investigation?
How should organizations inform employees that they may be monitored?
What rights do employees have regarding their electronic data? Do they have the right to request records? How would such a request be filled?
How does a text analytics tool handle privileged communications, such as those between an individual and their attorney or doctor?

A text analytics capability is not a silver bullet solution for gauging employee mood, sentiment, or propensity for bad behavior. In fact, these tools can give organizations a false sense of confidence if the workforce appears to be "positive" or "contented." This capability is only one component of the overall intelligence stack that organizations can deploy for risk monitoring.

Stay tuned for the next blog segment, Foundational Research Behind Text Analysis, to learn more about the science behind text analysis for insider threat.

Please send questions, comments, or feedback to insider-threat-feedback@cert.org.

Software Engineering Institute

SEI Blog