
Performing Text Analytics for Insider Threat Programs: Part 3 of 3

Headshot of Carrie Gardner

This blog series reviews topics in performing text analytics to support insider threat mitigation. This post presents a procedural framework for operationalizing this capability. It walks through the process from first considering a text analytics capability to putting it into practice. The post also lists thought questions about whether to acquire a commercial textual analysis solution, repurpose an existing tool, or develop an in-house capability.

How a Text Analytics Tool Fits into Insider Threat Risk Measurement

As discussed in the last post, text analytics supplements traditional insider threat monitoring, such as user activity monitoring logs or network traffic analysis, by measuring users' affect. Affect analysis and related tasks such as sentiment and emotion detection help to monitor and measure the situational context of events that have already been collected through traditional logging. Insider threat analysts can use this information to build improved threat models with deeper and more insightful features. For example, text analytics can identify levels of heightened user negativity and anger. Insider threat models can incorporate this information. Analysts who interpret the outputs from such models can put activity from a host-based audit system into context to assess the severity of the concerning behavior.

Operationalizing a Text Analytic Capability

Figure 1 presents a framework for incorporating a text analytics capability into an insider risk monitoring process, starting with gaining legal approval. We walk through each step in the next few sections.

Figure 1: Framework for Incorporating a Text Analytics Capability into an Insider Threat Program.

1. Obtain Initial Buy-In

Before data collection or analysis begins, organizations must seek counsel and obtain preliminary buy-in from all pertinent stakeholders--legal, privacy, human resources, and civil liberties officers. This initial conversation guides expectations for overseeing the adoption of a text analytics capability and establishes future communication points of contact. Multiple follow-ups will be needed as the process formalizes and a well-defined purpose and scope emerge.

Obtaining initial buy-in from an oversight committee helps to avoid liability concerns that could arise given the potentially sensitive data used. Typically, this also means that you must provide employees with a written notice of the organization's monitoring practices, such as displaying a banner at login. Remember that employee monitoring laws vary from state to state and country to country, so consult your counsel to find out what kinds of monitoring are permitted. We find that it is most critical for organizations to maintain a clear, consistent, and visible policy that communicates what employees can expect in terms of privacy when they are using employer-owned devices such as their workstations. Please see the article "The insider threat and employee privacy: An overview of recent case law" for more information on case law surrounding employee monitoring.

Consider the following thought questions:

  • What is off-limits in terms of data collection?
    • Privileged communications, such as with an attorney or a doctor, are usually off-limits
  • What is off-limits in terms of data analysis?
    • Are there restrictions on the type of approach and technique used in the tool?
    • Do you need to know how the tool came to a decision (a white box/transparent view)? Or are the output and a reference to the concerning input sufficient to allow insider threat analysts to audit the output from your text analytics tool?
  • What is off-limits in terms of output use?
    • What additional information is necessary to pursue an investigation into a target if the results from the text analytics tool suggest something is of concern?
  • What level of interpretability is needed?
    • Does the chosen capability provide a justification for how it 'scored' or produced an output?
    • Does the application need to provide analysts the ability to immediately pull up the message(s) of concern that made the tool produce that output?
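
The interpretability questions above can be made concrete with a small sketch of what an auditable output record might look like. This is only an illustration under stated assumptions: the `Finding` class and all of its field names are hypothetical and do not come from any particular tool. The idea is that each score travels with references to the messages and the evidence that produced it, so an analyst can pull them up later.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    """One auditable output from a hypothetical text analytics tool."""
    user_id: str
    indicator: str                      # e.g. "anger" (assumed label)
    score: float                        # tool-specific score (assumed scale)
    # References to the messages that drove the score, so analysts can review them.
    message_ids: List[str] = field(default_factory=list)
    # Evidence the tool recorded for the score (here: matched terms).
    matched_terms: List[str] = field(default_factory=list)

    def explain(self) -> str:
        """Return a one-line justification an analyst can audit."""
        terms = ", ".join(self.matched_terms) or "no terms recorded"
        msgs = ", ".join(self.message_ids) or "no messages recorded"
        return (f"{self.indicator} score {self.score:.2f} for {self.user_id}: "
                f"terms [{terms}] in messages [{msgs}]")
```

Whether a record like this is sufficient, or whether a fully transparent view of the scoring logic is required, is a question for your oversight stakeholders.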

2. Define Purpose, Scope, and Objectives

With initial buy-in from the applicable legal and privacy stakeholders, the next step is to identify the purpose and intent of monitoring via text analytics. The purpose should define the justification for using this capability (such as to mitigate risk of insider activity). The scope of the plan identifies who will be monitored, what textual data (e.g., employer-owned email) will be covered in the data collection effort, and how the findings will be used.

During this process, you can formalize the business requirements for your text analytics tool and identify desirable features and capabilities. Remember, this is a good time to check in with the oversight agents (e.g., legal, human resources, and privacy) to ensure that the proposed solution complies with the organization's policies, local law, and ethical standards.

Finally, create objectives that set expected outcomes and identify checkpoints in the timeline for executing the project.

3. Identify Potential Risk Indicators

After you have outlined the purpose, scope, and objectives for implementing text analytics, you can start to develop ideas for how to achieve them while staying in scope of what has been approved. Begin identifying which indicators can be assessed that fall within the stated purpose, scope, and objectives of your solution. Text analytics indicators comprise the socio-behavioral observables--the personal predispositions, stressors, and concerning behaviors that are modeled in the Critical Path model, loosely based on Calhoun and Weston's pathway to violence.

Examples of potential risk indicators include

  • change in the intensity of negative sentiments
  • sudden change to an intense negative emotion, such as anger, sadness, or disgust
  • manipulative, deceptive, or antisocial behaviors (see footnote 1)
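
As a rough illustration of the first two indicators, the sketch below compares a user's recent negativity scores against that same user's baseline. It assumes that some upstream tool has already produced a numeric negativity score per message; the function name, the z-score approach, and the threshold of 2.0 are illustrative assumptions, not a recommended setting.

```python
from statistics import mean, stdev


def negativity_shift(baseline, recent, threshold=2.0):
    """Compare recent negativity scores to a user's own baseline.

    baseline, recent: lists of per-message negativity scores (assumed inputs).
    Returns (z, flagged), where z is the shift of the recent mean expressed
    in baseline standard deviations.
    """
    if len(baseline) < 2 or not recent:
        raise ValueError("need at least two baseline scores and one recent score")
    mu = mean(baseline)
    sigma = stdev(baseline)
    delta = mean(recent) - mu
    if sigma == 0:
        # A perfectly flat baseline: any change at all is an infinite shift.
        z = 0.0 if delta == 0 else float("inf") if delta > 0 else float("-inf")
    else:
        z = delta / sigma
    return z, z >= threshold
```

Measuring change against an individual baseline, rather than against a fixed cutoff, reflects the point that the indicator is a change in intensity. A flag here suggests only that something differs from the user's norm; it does not by itself suggest malice.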

4. Select Analysis Technique

The next step in this process is to decide how the text analytics tool can identify each indicator. Which observable measurements (i.e., affect constructs such as sentiment or emotion, conversation topics, or keywords) indicate the target's condition? How can text analytics assess these observables?

Consider the range of appropriate solutions--for example, an instrument-based tool that employs LIWC to identify and measure angry emotion in text. You will need to plan out an evidence-based approach for reliably and soundly analyzing text communication. This may include using a tool specifically designed for text analytics or an in-house solution whose custom configuration can support the unique requirements of your organization.
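
To give a sense of what a dictionary-based (instrument-style) measurement involves, here is a minimal sketch that reports the share of words in a message that appear in an anger word list. The word list below is a tiny hypothetical stand-in made up for this example; it is not LIWC or any validated instrument, and a real deployment would rely on an evidence-based lexicon.

```python
import re

# Hypothetical, illustrative word list. NOT a validated instrument.
ANGER_TERMS = {"angry", "furious", "hate", "outraged", "resent"}


def anger_rate(text, lexicon=ANGER_TERMS):
    """Return (rate, hits): the fraction of tokens found in the lexicon,
    plus the matching tokens so the result can be audited."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0, []
    hits = [t for t in tokens if t in lexicon]
    return len(hits) / len(tokens), hits
```

Returning the matched words alongside the rate keeps the output reviewable. Simple word counting ignores negation, sarcasm, and context, which is one reason the choice of technique deserves an evidence-based plan.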

After you have identified the purpose and goals for your text analytics solution, determine which indicators can be captured and assessed. Which indicators are relevant to the risk scenarios you wish to avoid? These are the indicators to monitor. Examining insider threat models such as IT Sabotage, IP Theft, Fraud, or Workplace Violence can help you to identify common precursors that could indicate future insider threat activity. For text analytics, focus on the socio-behavioral features that can be identified through language.

5. Collect Data

With your analysis plan in place, the next step is to prototype it with a test sample. Identify which resources are currently in place to collect and aggregate employer-owned textual data, such as employer-owned email and chat logs. Staying with employer-owned and operated data reduces legal risks if you are concerned about employee privacy and other legal issues. Will data feeds be programmatically pulled (e.g., from a RESTful API) or automatically pushed from the server-side (streaming data)? Will data be provided in response to written requests?
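
One way to keep a prototype within the approved scope is to filter records before any analysis runs. The sketch below is an assumption-laden illustration: the `Message` fields, the source labels, and the example domains are all hypothetical, and matching on domains is only a crude stand-in for however your counsel defines privileged communications.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Message:
    message_id: str
    source: str                   # e.g. "corp_email" (hypothetical label)
    sender: str
    recipients: Tuple[str, ...]
    body: str


# Hypothetical values; the real lists come from the approved scope.
APPROVED_SOURCES = {"corp_email", "corp_chat"}
PRIVILEGED_DOMAINS = {"lawfirm.example", "clinic.example"}


def in_scope(msg: Message) -> bool:
    """True only for employer-owned sources with no privileged party involved."""
    if msg.source not in APPROVED_SOURCES:
        return False
    for address in (msg.sender, *msg.recipients):
        domain = address.rpartition("@")[2].lower()
        if domain in PRIVILEGED_DOMAINS:
            return False
    return True
```

Applying the filter at collection time, rather than at analysis time, means out-of-scope text never enters the test sample.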

During the data identification and collection phase, keep in mind that linguistic style and expression vary across communication media and cultures. As you can imagine, email is typically considered a more formal channel than instant messaging. Communication across several forms of media can provide a more holistic portrait of a user's affective states.
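
Because style differs by medium, raw scores from email and chat may not be directly comparable. A minimal sketch of one possible adjustment, standardizing scores within each channel, is shown below. The function name and the input shape (pairs of channel label and score) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_channel(scores):
    """scores: iterable of (channel, value) pairs.

    Returns a list of (channel, z) in the input order, where z is the value
    standardized against the mean and spread of its own channel.
    """
    scores = list(scores)
    by_channel = defaultdict(list)
    for channel, value in scores:
        by_channel[channel].append(value)
    stats = {ch: (mean(vals), pstdev(vals)) for ch, vals in by_channel.items()}
    result = []
    for channel, value in scores:
        mu, sigma = stats[channel]
        # A channel with no spread carries no relative signal.
        result.append((channel, 0.0 if sigma == 0 else (value - mu) / sigma))
    return result
```

With this adjustment, a blunt chat message is judged against other chat messages rather than against the more formal tone typical of email.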

6. Analyze Textual Data

Before you begin analyzing the data, you need to identify the following:

  1. How will the outputs be used with existing insider risk monitoring processes?
  2. Do we trust our analytic approach to provide sound outputs for insider risk intelligence?

Practitioners should establish workflows for how to use the outputs from the text analysis tool in conjunction with existing monitoring processes. Should the outputs be used as an input to other risk models? Or should they be used solely as supplemental information that an analyst can review upon an incident escalation? Providing clear guidance on how to use these outputs will help you avoid potential pitfalls related to discretionary decision making. It also educates analysts about the organization's expectations and procedures.

As far as trusting the tool outputs, please see the first post on operational considerations for using text analytics for a more detailed review of the legal, ethical, and privacy considerations for using this capability.

7. Report and Refine


The framework for incorporating a text analysis capability is cyclical. It requires continuous reporting and refinement during the program lifecycle. Develop evaluation criteria to measure the insight, reliability, and utility of the text analytics process. Insight refers to how well the metric can provide useful intelligence about the target activity. Reliability refers to whether the metric can provide consistent, sound, and trustworthy information. Utility refers to the overall usefulness of the metric in the grand scope of the operation. Do this tool and technique overlap with other tools? If so, what is the best way to use the tools to provide the broadest coverage with the least amount of overlap?
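
Evaluation criteria like these can be backed by simple measurements. The sketch below is one possible starting point, not a prescribed method: it assumes you have a sample of message IDs that analysts have reviewed and confirmed as concerning, and it uses precision and recall as a rough proxy for reliability and Jaccard overlap as a rough measure of how much two tools flag the same items. All function and parameter names are hypothetical.

```python
def precision_recall(flagged, confirmed):
    """flagged: IDs the tool flagged; confirmed: IDs analysts confirmed.

    Returns (precision, recall) for the tool on the reviewed sample.
    """
    flagged, confirmed = set(flagged), set(confirmed)
    true_positives = len(flagged & confirmed)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(confirmed) if confirmed else 0.0
    return precision, recall


def jaccard_overlap(flagged_a, flagged_b):
    """Share of all flagged IDs that both tools flagged (0.0 to 1.0)."""
    a, b = set(flagged_a), set(flagged_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

A high overlap between two tools suggests redundant coverage, while a low overlap suggests they capture different activity. Tracking these numbers across review cycles gives the refinement step something concrete to compare.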

After validating the analysis process, you can deploy the capability to the general population or a higher-risk group (such as administrator users or those in trusted access programs). Consider plans to routinely audit and evaluate the key performance indicators of the text analytics capability to augment your insider risk monitoring process.

Adopting a Solution

As an alternative to acquiring a new text analytics tool, you can adapt an existing tool to this purpose. Start by surveying tools that are currently in use within your organization. Do you have a business intelligence (BI), full-text search, data mining, or user behavior analytics (UBA) tool that supports text analytics? To begin, you may want to run a small test to see whether an existing tool can fulfill some of the performance measures you outlined in your planning stage.

Repurpose an Existing Tool

If you have an existing technology with text analytics features, investigate how it can be extended to support the requirements outlined in the planning stage. This may be as basic as keyword searching to identify whether sensitive programs are mentioned in an email sent to an external address. Even a basic capability like this can provide a baseline for assessing other commercial or open source tools that you may wish to use. Also, consider how existing insider threat tools can be employed for text analytics.
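
The keyword-search baseline described above can be sketched in a few lines. Everything specific here is a placeholder: the internal domain and the program names are hypothetical, and a real deployment would take both from the approved scope.

```python
import re

INTERNAL_DOMAIN = "corp.example"                      # hypothetical
SENSITIVE_PROGRAMS = ["Project Falcon", "Bluebird"]   # hypothetical names

# Whole-word, case-insensitive match on any sensitive program name.
PATTERN = re.compile(
    "|".join(r"\b" + re.escape(name) + r"\b" for name in SENSITIVE_PROGRAMS),
    re.IGNORECASE,
)


def is_external(address):
    """True if the address is outside the (assumed) internal domain."""
    return address.rpartition("@")[2].lower() != INTERNAL_DOMAIN


def sensitive_mentions(body, recipients):
    """Return sorted, lowercased program names mentioned in a message that
    has at least one external recipient; otherwise return an empty list."""
    if not any(is_external(r) for r in recipients):
        return []
    return sorted({m.group(0).lower() for m in PATTERN.finditer(body)})
```

A simple check like this says nothing about affect, but its hit counts give you a reference point when you compare more capable tools against the same sample.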

Develop an In-House Solution

If you would rather develop your own solution, you can build upon existing open source technologies to develop a capability for measuring user sentiment or emotion. These technologies include data processing libraries such as scikit-learn and MLlib, natural language processing (NLP) resources such as NLTK and Stanford's NLP software, and lexicons such as the General Inquirer.

Acquire a New Tool

If neither reusing an existing tool nor developing a custom, in-house solution meets your needs, begin enumerating the requirements for a commercial application. As part of this process, you can review current tools for BI, UBA, and full-text search. As with repurposing an existing tool, perhaps it is time to think about upgrading one of these tools to support text analytics. Please see Acquisition Overview: The Challenges for more information on the acquisition process for adopting a commercial, off-the-shelf solution.

The bottom line is this: when you consider any new capability to augment an existing insider risk monitoring plan, it is imperative to ask the following questions:

  • How will the new capability align with existing processes?
  • What is the empirical foundation from which it derives its approach and techniques?
  • How will the capability be operationalized in your environment?
  • What are the potential legal, ethical, and privacy challenges that must be addressed?

In this blog series, we described insider threat use cases for text analytics, identified legal and ethical considerations, and described the empirical foundation that provides evidence of its applicability to this space. Please remember to consult the appropriate authorities and obtain the appropriate approvals before moving forward with adopting text analytics (or any other new capability). If you have further comments, questions, or feedback, please contact us at

1 Remember that indicators vary in utility. We find that, en masse, the presence of more indicators may indicate something concerning or anomalous but does not necessarily suggest malice.
