
Engineering for Cyber Situational Awareness: Endpoint Visibility

Phil Groce
• SEI Blog

This post was co-written by Timur Snoke.

In this post, we aim to help network security analysts understand the components of a cybersecurity architecture, starting with how we can use endpoint information to enhance our cyber situational awareness. Endpoints collect a wealth of information valuable for situational awareness, but too often this information goes underutilized.

Detection of compromises cannot occur without visibility into the activities going on with assets. Network security analysts can view these activities in one of two places, or sometimes both: directly on the device and in the communications going to and from the device (i.e., on the network). The first step in threat detection is knowing which activities can be seen on a device and understanding how to instrument the device to provide that visibility.

This is the fourth post in our series of blog posts on cyber situational awareness (SA) for the enterprise.

The Value of Endpoints

An endpoint is not just a disconnected black box that sits idly in the corner. An endpoint is typically a computing platform that has the authority to engage in predefined activities, such as processing data, accessing resources, or communicating with other endpoints. All of these endpoints exist to provide value to the organization, but ensuring this value requires verification that they are behaving as the organization expects. Effective situational awareness enables policy enforcement by locating endpoints that act outside their authority, and provides operators with a holistic picture of both suspicious and benign endpoint behavior. This awareness supports decision making and helps the organization mitigate risk.

Operating systems monitor endpoints natively by generating logs and storing them locally or sending them to a central logging repository. Many organizations supplement this logging with additional monitoring, installing clients on an endpoint to inspect data while it is at rest, active in memory, or being transmitted. These clients can test data for malicious code or assure that the data is in a known state. They can also observe processes running in active memory and verify that they are running at an appropriate privilege level and engaged in expected behavior. Some clients go further and limit behaviors that are considered inappropriate or threatening, such as access to prohibited content or the insecure use of system calls.

The visibility gained by these additional monitoring capabilities comes at a cost. Monitoring processes take up memory and processor cycles. Some endpoint situational awareness tools act as a "shim" between the application and the operating system, and these shims may introduce latency and bandwidth limitations into normal operations. In some configurations, these solutions generate considerable amounts of log data, often sending it over the network to a central collector. The amount of storage required to archive these logs can add up quickly, especially when collecting from thousands or tens of thousands of endpoints. The network bandwidth taken up by this activity can also add significant overhead and saturate low-bandwidth connections, such as the wide-area network (WAN) links that remote clients may have to traverse to get to a central repository.
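
To get a feel for how quickly these costs add up, a back-of-the-envelope estimate can help. The sketch below is illustrative only: the function names and every figure in it (endpoint count, event rate, event size, retention period) are assumptions chosen for the example, not measurements from any real environment.

```python
SECONDS_PER_DAY = 86_400


def daily_log_volume_bytes(endpoints: int, events_per_day: int, bytes_per_event: int) -> int:
    """Total bytes of log data produced per day across all endpoints."""
    return endpoints * events_per_day * bytes_per_event


def sustained_bandwidth_bps(bytes_per_day: int) -> float:
    """Average bits per second needed to ship a day's data as it is produced."""
    return bytes_per_day * 8 / SECONDS_PER_DAY


# Hypothetical fleet: 10,000 endpoints, 50,000 events each per day, 500 bytes per event.
fleet_bytes_per_day = daily_log_volume_bytes(10_000, 50_000, 500)

# Hypothetical retention period of 90 days.
archive_bytes = fleet_bytes_per_day * 90

# Average load a single endpoint would place on its own (possibly slow) uplink.
per_endpoint_bps = sustained_bandwidth_bps(50_000 * 500)

print(f"fleet volume per day: {fleet_bytes_per_day / 1e9:.0f} GB")
print(f"90-day archive:       {archive_bytes / 1e12:.1f} TB")
print(f"per-endpoint average: {per_endpoint_bps / 1e3:.1f} kbit/s")
```

Averages like these understate the problem on links shared by many remote clients, and they say nothing about bursts, so treat the result as a lower bound when planning capacity.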

Some aspects of endpoint visibility can be implemented at the network level, allowing for situational awareness collections to either ignore duplicative data provided by the endpoint completely or to corroborate observations in one data set with information from another.

Common Objections to Endpoint Situational Awareness

Three things tend to discourage enterprises from making full use of their endpoints as tools for cyber situational awareness. First, and most obviously, most enterprises have a lot of endpoints: traditional workstations and servers; mobile devices such as phones, tablets, and laptops; and networked non-traditional devices such as medical and scientific equipment, point-of-sale systems, barcode scanners, and even light bulbs. The task of configuring all of them to make situational awareness-relevant information available for analysis can seem impossible. Fortunately, most organizations now have both robust procedures for defining baseline policies for how all machines in an enterprise should be configured and mature tools for enforcing those policies. Consider, too, that even imperfect endpoint visibility offers significant value to cybersecurity defenders. As daunting as it looks, the technical problem of implementing endpoint visibility is probably a lot easier than you think.

Another common barrier to effective endpoint monitoring is philosophical; many defenders are uncomfortable with endpoint data because an attacker who controls the endpoint may be able to tamper with it. This concern is a legitimate risk, but there are ways to manage it.

One of the most effective ways to manage the potential for tampering with an endpoint is to configure application logging to export log messages to a central server. For enterprise servers, it may be appropriate to export all log messages as soon as they are created, but that is usually impractical for other endpoints such as workstations, which are far more numerous and may not always have a good connection to the enterprise. To manage network usage in that case, consider exporting certain critical messages as soon as they are created and exporting lower priority messages at regular intervals (e.g., every four hours or when the endpoint connects back to the enterprise network). Finally, remember that even compromised logs have situational awareness value, as they can reveal what an attacker wishes to hide, or they may demonstrate a particular threat actor's tactics, techniques, and procedures.
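
The tiered export policy described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production log shipper: the class name `TieredLogExporter`, the numeric severity threshold, and the `send` callback (standing in for whatever transport carries records to the central server) are all hypothetical.

```python
import time
from typing import Callable, List


class TieredLogExporter:
    """Send critical records immediately; batch the rest and flush periodically."""

    def __init__(
        self,
        send: Callable[[List[dict]], None],    # hypothetical transport to the central server
        critical_level: int = 40,              # assumed threshold; higher means more severe
        flush_interval_s: float = 4 * 3600,    # "every four hours", as in the text
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._critical_level = critical_level
        self._flush_interval_s = flush_interval_s
        self._clock = clock
        self._buffer: List[dict] = []
        self._last_flush = clock()

    def log(self, level: int, message: str) -> None:
        record = {"level": level, "message": message}
        if level >= self._critical_level:
            # Critical: export right away so it leaves the endpoint promptly.
            self._send([record])
        else:
            # Lower priority: hold locally until the next scheduled flush.
            self._buffer.append(record)
        if self._clock() - self._last_flush >= self._flush_interval_s:
            self.flush()

    def flush(self) -> None:
        """Export everything buffered; also call this when the endpoint reconnects."""
        if self._buffer:
            self._send(self._buffer)
            self._buffer = []
        self._last_flush = self._clock()
```

In this sketch the interval is checked only when a new record arrives, which keeps the example short; a real agent would more likely use a timer, and would call `flush()` from whatever hook fires when the endpoint reconnects to the enterprise network.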

In many enterprises, the biggest obstacle to endpoint monitoring isn't technical, but organizational. Endpoint management is frequently administered independently of network management, and may be further subdivided into administration of workstations, servers, cloud versus on-premise, etc. As we've said throughout this series of blog posts, truly comprehensive cyber situational awareness requires commitment and collaboration from all parts of the organization. Building these bridges can be an investment, but the return is not only endpoint visibility but also a way forward to a more holistic approach to cybersecurity monitoring and response.

A Phased Approach: Begin Where You Are

As we said, most organizations have a dauntingly large number of endpoints to manage and monitor, but you don't need to monitor them all. You may want to begin by monitoring your critical on-premise server infrastructure, then take what you learn from that to your cloud-based servers, all while working with your workstation support to develop strategies for instrumenting on-premise, cloud, and mobile end-user environments. Incremental gains will deliver incremental value and important lessons.

Feel free to take it slow; just make sure that you take it steady.

What Data to Analyze

As you begin planning to use endpoint data for cyber situational awareness, you will need to answer a few fundamental questions, such as

  • What data do I care about?
  • How will I make it available to analyze?
  • How can I change my mind later?

Endpoints have a lot of data. When we discuss cyber situational awareness analysis in later installments of this series, we will talk about how to identify important data. For the moment, suffice it to say that the following types of data will probably be your most important (in roughly this order):

  • data you can't get anywhere else
  • data that relates endpoint data to other datasets
  • data that corroborates what you already have

We are obviously most interested in the endpoint vantage to see things we can't see anywhere else. It therefore makes sense to first consider data (and metadata) on events that are available only on endpoints. Examples include information on process creation or on network-connection attempts that are blocked by an endpoint firewall.

While this data is useful, it becomes really useful when you can fuse it with data from other sources. For this reason, the second most important kind of endpoint data is information that can be used to relate endpoint data to other datasets. For instance, network-connection inspection can tell you that a network connection contained malware, and endpoint data can tell you what processes were open on a host. With a list of processes and the network connections they opened (including timestamps and destination information for the connection), it becomes possible to identify which process downloaded the malware.
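
As a rough illustration of this kind of fusion, the sketch below matches a network-side alert against endpoint records of which process opened which connection. The record types, field names, and the five-second matching tolerance are assumptions made for the example; real sensors will have their own schemas.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class ProcessConnection:
    """Hypothetical endpoint record: a process opened a network connection."""
    host: str
    pid: int
    process_name: str
    dst_ip: str
    dst_port: int
    opened_at: datetime


@dataclass(frozen=True)
class NetworkAlert:
    """Hypothetical network-inspection record: a connection carried malware."""
    src_host: str
    dst_ip: str
    dst_port: int
    seen_at: datetime


def candidate_processes(
    alert: NetworkAlert,
    connections: List[ProcessConnection],
    tolerance: timedelta = timedelta(seconds=5),  # assumed allowance for clock skew
) -> List[ProcessConnection]:
    """Return endpoint connections matching the alert's host, destination, and time."""
    return [
        c for c in connections
        if c.host == alert.src_host
        and c.dst_ip == alert.dst_ip
        and c.dst_port == alert.dst_port
        and abs(c.opened_at - alert.seen_at) <= tolerance
    ]
```

The join keys here are exactly the items the paragraph lists: the host, the destination of the connection, and the timestamp. The tolerance matters because endpoint and network sensors rarely share a perfectly synchronized clock; more than one match means the analyst has candidates rather than an answer.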

Finally, while it may seem like a waste of effort to collect endpoint data that duplicates network data, it can sometimes be valuable to have an inference supported by information from two different vantage points. This corroboration helps analysts build confidence in their conclusions and rule out (or in) the possibility that any one sensor may have generated a false positive.

Where to Analyze Endpoint Data

A related question is where to store that data. Your options are

  • on the endpoint
  • in a central location
  • something in between

Usually, moving data from the endpoint to a central collector lets you more easily fuse that data with information from other observation domains and datasets. As we've discussed, however, endpoints are a noisy data source, so sending everything back to a central location may require substantial engineering effort.

One approach is to analyze the data on the endpoint itself. There are two major drawbacks to this method. The first is that data is now analyzed on a machine that we are, by definition, evaluating for the possibility of compromise, so the confidence in our results must be adjusted accordingly. A second consideration, no less important, is that endpoints don't exist to do security analysis; they exist to do the business of the enterprise. Analysis shouldn't be allowed to affect availability negatively.

There is also the question of data retention. Again, storing lots of historical data on an endpoint can affect service availability for an end user; if you want to keep data around for a while, you will probably have to move it off the endpoint.

A possible compromise between analysis and storage on the endpoint and moving it all to a central location is to stand up intermediate collection points that are located close (from a network perspective) to the endpoints whose data they collect. These intermediate nodes could perform some types of analysis, and can store data longer, since collecting and analyzing data is their primary purpose. An especially effective approach to endpoint data architecture is to use these collection points to run curation analytics geared to determine what data can be discarded and what is worth keeping.
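
One very simple form a curation analytic could take is sketched below. It is an assumption-laden example, not a recommended policy: the record fields (`host`, `type`, `detail`), the event type names, and the rule itself (always keep event types that are visible only on the endpoint, keep the first occurrence of anything else, and fold exact repeats into a count) are all hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Assumed names for event types that can only be observed on the endpoint.
ENDPOINT_ONLY_TYPES: FrozenSet[str] = frozenset({"process_create", "firewall_block"})

Key = Tuple[str, str, str]


def curate(
    records: Iterable[Dict[str, str]],
    always_keep: FrozenSet[str] = ENDPOINT_ONLY_TYPES,
) -> Tuple[List[Dict[str, str]], Counter]:
    """Split records into those worth forwarding and a tally of suppressed repeats."""
    kept: List[Dict[str, str]] = []
    seen: set = set()
    suppressed: Counter = Counter()
    for rec in records:
        key: Key = (rec["host"], rec["type"], rec["detail"])
        if rec["type"] in always_keep or key not in seen:
            kept.append(rec)        # forward to the central repository
        else:
            suppressed[key] += 1    # discard the record, but remember how often it recurred
        seen.add(key)
    return kept, suppressed
```

Forwarding the `suppressed` tally alongside the kept records preserves the fact that something happened repeatedly without paying to move every copy across the network.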

When you decide to collect data centrally, most of the engineering considerations are identical to those for network data. We'll discuss these issues in more depth in future posts when we discuss engineering for network cyber situational awareness. For now, we'll mention only a few aspects of this data-orchestration problem that relate specifically to endpoint data:

  • The strategy you employ for sending data to your central location will depend on an endpoint's network connectivity, and connectivity to endpoints is highly variable. How much bandwidth will they have? When will they have it? Will additional steps need to be taken to protect the data in transit? Since all these considerations will influence the optimal strategy, and since they all vary so much among endpoints, you will likely need to cope with different guarantees about the recency or dimensionality of endpoint data.
  • Network-inspection data collection typically optimizes for a small number of fairly high-volume data streams. Endpoint data collection optimizes for a large number of relatively low-volume data streams. The architecture that works best for one may work poorly for the other. Intermediary collectors can simplify this problem, however, since aggregated endpoint data from a collector looks a lot more like a data stream from a network monitor and is more amenable to the same engineering approaches.
  • We've mentioned this before, but it bears repeating: Endpoints do the work of your enterprise; backhauling endpoint data shouldn't unduly interfere with availability. With endpoint monitoring, it pays to remember that security exists only to make your mission more likely to succeed.
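
To illustrate the second point above, an intermediate collector can fold many small per-endpoint streams into one time-ordered stream before forwarding it. The sketch below uses Python's standard `heapq.merge` and assumes each endpoint's stream is already sorted by timestamp; the `(timestamp, event)` tuple layout and the function names are hypothetical.

```python
import heapq
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str]            # (timestamp, event text) as sent by one endpoint
TaggedEvent = Tuple[float, str, str] # (timestamp, host, event text) after aggregation


def _tag(host: str, stream: Iterable[Event]) -> Iterator[TaggedEvent]:
    """Attach the originating host to every event in one endpoint's stream."""
    for timestamp, event in stream:
        yield (timestamp, host, event)


def aggregate_streams(streams: Dict[str, Iterable[Event]]) -> Iterator[TaggedEvent]:
    """Lazily merge per-endpoint streams into a single stream ordered by timestamp.

    Assumes every input stream is itself sorted by timestamp.
    """
    tagged = [_tag(host, stream) for host, stream in streams.items()]
    return heapq.merge(*tagged, key=lambda record: record[0])
```

Because the merge is lazy, the collector never needs to hold all endpoint data in memory at once, and the downstream pipeline sees one larger stream of the kind it already handles for network monitors.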

In the next installment in this series, we will turn to engineering for network cyber situational awareness. We will discuss network visibility, why it is required in addition to endpoint visibility, and when it may be valuable to implement some aspects of endpoint visibility at the network level.

Additional Resources

FloCon provides a forum for exploring large-scale, next-generation data analytics in support of security operations.

Read the first blog post in this series on situational awareness, Situational Awareness for Cybersecurity: An Introduction.

Read the second blog post in this series, Situational Awareness for Cybersecurity: Assets and Risk.

Read the third blog post in this series, Situational Awareness for Cyber Security: Three Key Principles of Effective Policies and Controls.

Read about the SEI's work in network situational awareness.

Read other SEI blog posts about network situational awareness.
