Flow Analytics for Cyber Situational Awareness
It's the holiday season, a traditionally busy time for many data centers as online shopping surges and many staff members take vacations. When traffic patterns look abnormal and overall volume starts to rise, what is the best way to determine the cause? Perhaps customers are being drawn to your business and you will soon need to add surge capacity, or perhaps you are seeing the beginnings of a denial-of-service attack and need to contact your service provider. This blog post highlights recent work by CERT researchers to help organizations gain cyber situational awareness based on network flow. Flow analysis offers insight into how your network is being used and, more importantly, helps you decide how to respond to changes in the online environment.
Cyber Situational Awareness
In the past few decades, much research has been done on understanding situation awareness in today's fast-paced and ever-changing environment. Among all the work in situation awareness, we at CERT find that the work of Dr. Mica Endsley is mature and quite applicable to the cyber domain. Endsley proposes the following definition for situation awareness:
Situation awareness is the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future.
In 1995, Endsley published a model of situation awareness in dynamic decision making. A simplified version of the model is shown here:
This model presents a three-step process that is necessary to achieving situation awareness:
- perception, or sensing, of the environment
- comprehension, or understanding, of what has been sensed about the environment
- projection, or prediction, of what the environment might be in the near future
The model helps place cyber situational awareness into its proper context. Cyber situational awareness is the subset of all situation awareness necessary to act in the cyber domain. It is not an end in itself; rather, it is a means to support the decision to take action, and situation awareness is successful when we achieve our goals in the cyber domain.
Finally, the model also demonstrates the importance of sensing the cyber environment. Since the foundation of the cyber environment is the network used to connect computers, network sensing plays a pivotal role in sensing the environment. A fundamental way to sense the network is by analyzing network flow, which, in this context, is based on the Open Systems Interconnection reference model (ISO 7498) and records layer 3 (network) and layer 4 (transport) information about every connection passing by the flow sensor.
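To make the idea of a flow record concrete, here is a minimal sketch that groups per-packet header summaries into unidirectional flows keyed by layer 3 and layer 4 fields. The `Packet` and `Flow` types and the `build_flows` function are hypothetical names chosen for illustration and are not part of any CERT tool. Real flowmeters also split flows using idle and active timeouts, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple, Tuple

# Layer 3 and layer 4 fields that identify a unidirectional flow.
FlowKey = Tuple[str, str, int, int, int]


class Packet(NamedTuple):
    """Hypothetical per-packet summary: header fields only, no payload."""
    time: float     # capture time in seconds
    sip: str        # source IP address (layer 3)
    dip: str        # destination IP address (layer 3)
    sport: int      # source port (layer 4)
    dport: int      # destination port (layer 4)
    protocol: int   # IP protocol number, e.g. 6 = TCP, 17 = UDP
    size: int       # packet length in bytes


@dataclass
class Flow:
    """Hypothetical flow record: one summary per flow key."""
    key: FlowKey
    stime: float            # time of the earliest packet
    etime: float            # time of the latest packet
    packet_count: int = 0
    byte_count: int = 0


def build_flows(packets: Iterable[Packet]) -> Dict[FlowKey, Flow]:
    """Aggregate packets that share a flow key into a single record."""
    flows: Dict[FlowKey, Flow] = {}
    for p in packets:
        key = (p.sip, p.dip, p.sport, p.dport, p.protocol)
        flow = flows.get(key)
        if flow is None:
            flow = flows[key] = Flow(key=key, stime=p.time, etime=p.time)
        flow.stime = min(flow.stime, p.time)
        flow.etime = max(flow.etime, p.time)
        flow.packet_count += 1
        flow.byte_count += p.size
    return flows
```

Because the key includes direction, a request and its reply appear as two separate records in this sketch.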
With this understanding of cyber situational awareness, we can begin to apply many concepts from the more general situation awareness body of knowledge (see Endsley's book Designing for Situation Awareness: An Approach to User-Centered Design, Second Edition).
Flow and Cyber Situational Awareness
The network is the vehicle for communications, and flow is the best way to sense the network. The cyber domain is made up of nodes, links, users, and applications, and flow lets us make sense of it. Other sensing methods generate different data sources and can address specific needs, but most of those data sets also contain flow information.
Cisco NetFlow is a specific implementation of network flow monitoring; IPFIX (IP Flow Information Export) is a more general IETF standard for formatting and transmitting flow records. NetFlow data can be used for forensic purposes, for finding malicious activity, and for determining appropriate prioritization settings.
When using flow as a significant data source to understand the cyber environment, we find one characteristic that sets cyber situational awareness apart from traditional situation awareness models: humans have no innate ability to sense the cyber domain. Our sensors provide flow data, but what does a flow really look like? Without some foundation for sharing information about activities in cyberspace, we face immense challenges in standardizing how to present cyber information to support human decision making. To overcome this challenge, it might help to split our model of situation awareness and show the human-to-computer interface necessary to present cyber data to a decision maker.
The model above helps us arrive at the purpose and constraints of flow analytics in cyber situational awareness. Flow is our primary method for sensing the cyber environment and becomes a principal input for analytics. To support the human situational awareness process, the analytic results are presented to the human decision maker through a human-computer interface. Simple analytics present the decision maker with low-level cyber environment data, while complex analytics present a much more robust comprehension. The interface should be interactive so the human can pull additional data from the system to resolve ambiguity in analytic results. Finally, the decision maker can use analytic results to take actions in the cyber domain.
Cyber Situational Awareness Analytics and Metrics
Flow can provide a number of useful metrics, and these metrics can help profile the network traffic to further understand the cyber environment. Security administrators can evaluate the profile for assets that violate policy and for any suspicious activity. Business administrators can use the profile to help guide long-term plans for network upgrades and staffing. As the profile changes over time, network operators and defenders can monitor for emerging concerns. This flow information, in turn, can lead to policy changes and reallocation of network resources.
The most effective metrics are simple and approachable with little background knowledge, since they provide a common language for discussing cyber behaviors. Flow is an excellent measurement source because it yields a small number of well-understood metrics.
The current state of the practice depends largely on counts, sums, and averages to produce metrics. These metrics are readily understandable and easily interpreted; however, they produce only a basic model of the cyber domain. Improved analytics yield a much more robust set of metrics and a deeper comprehension of cyber behaviors.
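As an illustration of the count, sum, and average metrics described above, the following minimal sketch computes per-host flow counts, byte totals, and average bytes per flow. The `FlowSummary` type and `per_host_metrics` function are hypothetical names for illustration, and grouping by source address is an assumption; any other flow field could serve as the grouping key.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class FlowSummary(NamedTuple):
    """Hypothetical reduced flow record: source address and byte count."""
    sip: str
    byte_count: int


def per_host_metrics(flows: Iterable[FlowSummary]) -> Dict[str, Dict[str, float]]:
    """Return flow count, byte sum, and mean bytes per flow for each source."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for flow in flows:
        counts[flow.sip] += 1                 # count
        totals[flow.sip] += flow.byte_count   # sum
    return {
        host: {
            "flows": counts[host],
            "bytes": totals[host],
            "avg_bytes_per_flow": totals[host] / counts[host],  # average
        }
        for host in counts
    }
```

A sudden jump in any of these values for one host is easy to explain to a non-specialist, which is what makes such metrics useful as a shared vocabulary.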
The family of time-based analytics yields metrics of duration. The most common metrics are active time, idle time, and counts of seen entities and unseen entities. Slightly more complex time-based analytics include persistence (last seen time minus first seen time) and total time seen (sum of all active time). These time-based analytics are often applied to individual entities or groups of entities, and are particularly applicable when using flow to maintain a dynamic network inventory.
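The persistence and total-time-seen definitions above can be sketched for a single entity as follows. This is a minimal illustration under stated assumptions: each flow involving the entity contributes a (start, end) interval, active time is taken as the union of those intervals so that overlapping flows are not counted twice, and idle time is taken as persistence minus total time seen. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into disjoint active periods."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous active period: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def time_metrics(intervals: List[Interval]) -> Dict[str, float]:
    """Compute time-based metrics for one entity from its flow intervals."""
    if not intervals:
        raise ValueError("entity was never seen")
    active = merge_intervals(intervals)
    first_seen = active[0][0]
    last_seen = active[-1][1]
    persistence = last_seen - first_seen                       # last seen minus first seen
    total_time_seen = sum(end - start for start, end in active)  # sum of all active time
    return {
        "persistence": persistence,
        "total_time_seen": total_time_seen,
        "idle_time": persistence - total_time_seen,  # assumed definition
    }
```

For a network inventory, a host with long persistence but little total time seen may be a device that connects only occasionally.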
Frequency-based analytics provide another perspective on the cyber domain. Arrivals, departures, and churn all compare the hosts observed across a sequence of time windows. Burstiness characterizes the variance in traffic volume across time windows. Transilience (called flapping in BGP routing analysis) measures how often an item was seen in one time window and then not seen in a subsequent time window.
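The frequency-based metrics can be sketched over a sequence of time windows, each holding the set of hosts observed in that window. The exact formulas here are assumptions, since the text does not define them: churn is taken as arrivals plus departures, transilience counts only transitions between adjacent windows, and burstiness is taken as the coefficient of variation (population standard deviation divided by mean) of per-window volume. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Hashable, List, Sequence, Set


def arrivals_departures(windows: Sequence[Set[Hashable]]) -> List[Dict[str, int]]:
    """Compare each window with the one before it."""
    results = []
    for prev, curr in zip(windows, windows[1:]):
        arrived = curr - prev     # seen now, not seen before
        departed = prev - curr    # seen before, not seen now
        results.append({
            "arrivals": len(arrived),
            "departures": len(departed),
            "churn": len(arrived) + len(departed),  # assumed definition
        })
    return results


def transilience(windows: Sequence[Set[Hashable]], item: Hashable) -> int:
    """Count how often the item was seen in one window but not the next."""
    return sum(
        1
        for prev, curr in zip(windows, windows[1:])
        if item in prev and item not in curr
    )


def burstiness(volumes: Sequence[float]) -> float:
    """Coefficient of variation of per-window traffic volume (assumed measure)."""
    avg = mean(volumes)
    return pstdev(volumes) / avg if avg else 0.0
```

A steady host yields a transilience of zero and a burstiness near zero, while a host that repeatedly appears and disappears, or sends traffic in spikes, scores higher on both.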
Performance-based analytics enrich our comprehension. We are currently working on a conformity metric that shows great promise in detecting deviant traffic patterns. This metric will identify unusual traffic patterns compared to most other traffic patterns. We believe a conformity metric has the potential to identify security events such as data exfiltration and subtle flash crowds.
CERT Tools for Cyber Situational Awareness
While the importance of cyber situational awareness is well established, only recently have organizations made concerted efforts to collect and analyze network flow data to support awareness. We typically depend on a reference implementation of the CERT NetSA Security Suite to support the situational awareness needs of our sponsors. The suite includes the following tools:
- YAF (Yet Another Flowmeter) captures network information and creates IPFIX-based flow records. This tool provides the sensing capabilities necessary to gain visibility into the cyber domain.
- SiLK, the System for Internet-Level Knowledge, provides the capability to collect, store and view the raw flow data generated by YAF or most other flowmeters.
- The netsa-python and pysilk Python libraries give analysts an application programming interface (API) for creating analytics. These analytics consume flow data repositories and generate broader comprehension of sensor data.
- Analysis Pipeline provides a streaming analytic toolkit to speed comprehension and quickly present results to the decision maker.
For more than a decade CERT has been working with federal and civilian government agencies to secure, operate, and defend our nation's most critical networks. Gaining cyber situational awareness over the networks is an ever more critical mission need, and network flow analysis is a foundational component to support that need.
While government, academic, and other operations staff are increasingly realizing the value of network flow data, many still do not comprehend the depth and breadth of issues surrounding network flow. We continue to focus our efforts on evolving the network flow model and network sensors to increase automation on the left-hand side of the cyber situational awareness model introduced above.
We will be discussing this and other cutting-edge analytics at the 2016 FloCon conference.
We welcome your feedback on this research.
For more information about FloCon, please visit www.cert.org/flocon/.
For more detail, see the technical report Network Profiling Using Flow, which I coauthored with Austin Whisnant.
In late November, I presented the SEI Webinar Using Network Flow to Gain Cyber Situational Awareness.