An Analyst-Focused Approach to Network Traffic Analysis
Earlier this year, a team of researchers from the SEI CERT Division's Network Situational Awareness Team (CERT NetSA) released an update (version 3.17.0) to the System for Internet-Level Knowledge (SiLK) traffic analysis suite. SiLK supports the efficient collection, storage, and analysis of network flow data, enabling network security analysts to query large historical traffic data sets rapidly and at scale. As this post describes, our team also recently updated the Network Traffic Analysis with SiLK handbook to make it more analyst-focused, so that it teaches not only the toolset but also the tradecraft of using it.
The previous version of the guide, published in 2014, is organized around the individual tools in the SiLK tool suite. The new version is written from the perspective of the network traffic analyst, so the handbook is organized according to the workflow that we recommend analysts follow to investigate network activity and anomalies. I co-authored the new version with Paul Krystosek, Nancy Ott, and Timothy Shimeall.
The analytical thought processes outlined in the new version of our handbook apply to any type of general security analysis. The handbook offers insight into how to think through problems, address them, and apply the methodology to the analysis of network flow data or other data.
An Analyst-Focused Perspective
In recent years, CERT researchers have reached out to the SiLK user community, including representatives from the Departments of Defense and Homeland Security and other government agencies. These users said they needed a guide that presented the tools from an analyst's point of view, and that need became the focus of our efforts.
Accordingly, the updated handbook covers the following approaches to analyzing network flow data, at basic, intermediate, and advanced levels, with accompanying case studies for each level:
- Single-path analysis. In plain terms, single-path analysis is a start-to-finish approach that combines one or more analytical steps to characterize network behavior. As shown in the figure below, this basic, straightforward form of data analysis requires no conditional steps or integration and little refinement.
- Multi-path analysis. Some network behaviors cannot be seen within the single view of the network flow data provided by single-path analysis. Finding them requires investigating and integrating several different views of the data. This intermediate-level analysis is known as multi-path analysis.
While a single-path analysis may involve looking at summary data or just one part of a data set, multi-path analysis explores different aspects of the data set (ports, IP addresses, protocols, packet and byte volumes, flow types and volumes, etc.) to find trends, leftovers, and groupings that are not necessarily visible in a single view of the data. As the figure below illustrates, multi-path analysis builds upon single-path analysis; often, a single-path analysis is performed as just one phase of a multi-path analysis.
- Exploratory analysis. This advanced, exploratory approach is the most open-ended of the analysis workflows. As the name suggests, exploratory analysis involves asking questions about trends, processes, and dynamic behavior that may have no fixed or obvious answers. Often, the analysis leads to further questions.
Exploratory analysis uses single-path and multi-path analyses as building blocks to provide insight into network events. It typically considers more than one indicator of network behavior to provide a more complete understanding of what happened. Each building block represents a question (or part of a question) whose answer feeds into subsequent steps in the analysis. Analysts can assemble these building blocks to prototype an analysis or examine one-time phenomena, iterating if necessary.
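To make the distinction between the first two levels concrete, the following Python sketch models single-path analysis as one linear chain of steps (filter, count, rank) and multi-path analysis as several such chains whose results are then integrated; exploratory analysis would chain and iterate over building blocks like these. It is illustrative only: the `Flow` record and the `single_path` and `multi_path` functions are hypothetical and are not part of SiLK, where each step would typically be a command such as rwfilter or rwstats.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified flow record; not SiLK's on-disk record format."""
    sip: str
    dip: str
    dport: int


def single_path(flows, dport):
    """One linear chain of steps: filter -> count -> rank, with no branching."""
    matching = [f for f in flows if f.dport == dport]   # filter step
    counts = Counter(f.sip for f in matching)           # summarize step
    return counts.most_common()                         # rank step


def multi_path(flows, dports):
    """Run one single-path view per destination port, then integrate them.

    Returns the sources that appear in every view: a grouping that no
    single view exposes on its own.
    """
    views = [{src for src, _ in single_path(flows, p)} for p in dports]
    return set.intersection(*views) if views else set()


if __name__ == "__main__":
    sample = [
        Flow("10.0.0.1", "192.0.2.1", 22),
        Flow("10.0.0.1", "192.0.2.2", 3389),
        Flow("10.0.0.2", "192.0.2.1", 22),
        Flow("10.0.0.2", "192.0.2.1", 22),
    ]
    print(single_path(sample, 22))         # [('10.0.0.2', 2), ('10.0.0.1', 1)]
    print(multi_path(sample, [22, 3389]))  # {'10.0.0.1'}
```

In the sample, the port-22 view ranks 10.0.0.2 highest, but only by combining it with the port-3389 view does 10.0.0.1 stand out as the one source active on both services.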
Users who are interested in analyzing network flow records with tools other than SiLK are encouraged to read the overall description of the analysis approaches in the handbook and then use the command descriptions to find equivalents in the tool suite of their choice. Each level of analysis in the handbook includes one or more case studies that were developed from the publicly available FCCX-15 data set. The case studies guide analysts on how to use the SiLK tools on this data.
For instance, the case study "Building Inventories of Network Flow Sensors With IPsets" is included as an example of multi-path analysis:
Flow sensors commonly monitor strategic points in enterprise networks where different network environments meet. As network infrastructure evolves, this complexity affects both what each sensor collects and what analysts know about it. For example, multiple sensors may collect overlapping flows for failover purposes; depending on how the network routes traffic, analysts may need to determine which sensor is the primary flow collector.
To mitigate these issues, analysts can create and maintain inventories of network sensors, which makes the sensors easier to review and validate. These sensor inventories consist of SiLK IPsets containing the internal network addresses monitored by each flow sensor. They are generated by applying the following multi-path analysis workflow:
1. Path 1 associates network addresses with a single sensor.
2. Path 2 associates network addresses with each of the remaining sensors.
3. Path 3 associates network addresses that are shared by more than one sensor.
4. Finally, the results of each part of the multi-path analysis are merged to create a complete inventory of sensors.
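The following Python sketch mirrors the shape of this workflow, using plain Python sets of `ipaddress` objects in place of SiLK IPsets. It separates addresses observed by only one sensor (roughly, paths 1 and 2) from addresses observed by several sensors (path 3) and then merges the results. It is not the handbook's implementation: the input mapping and the functions `build_sensor_inventory` and `merge_inventory` are hypothetical, and a real inventory would be built from SiLK IPsets with tools such as rwset and rwsettool.

```python
import ipaddress
from collections import defaultdict


def build_sensor_inventory(observed):
    """Split per-sensor address observations into exclusive and shared sets.

    observed: hypothetical mapping of sensor name -> iterable of internal
    IP address strings for which that sensor recorded flows.
    Returns (exclusive, shared):
      exclusive maps each sensor to the addresses only it observed;
      shared maps each address seen by several sensors to those sensors.
    """
    seen_by = defaultdict(set)  # address -> set of sensors that observed it
    for sensor, addrs in observed.items():
        for a in addrs:
            seen_by[ipaddress.ip_address(a)].add(sensor)

    exclusive = {sensor: set() for sensor in observed}
    shared = {}
    for addr, sensors in seen_by.items():
        if len(sensors) == 1:
            # Attributable to exactly one sensor (paths 1 and 2).
            exclusive[next(iter(sensors))].add(addr)
        else:
            # Overlapping collection, e.g., failover (path 3).
            shared[addr] = sensors
    return exclusive, shared


def merge_inventory(exclusive, shared):
    """Final step: each sensor's inventory = its exclusive + shared addresses."""
    inventory = {sensor: set(addrs) for sensor, addrs in exclusive.items()}
    for addr, sensors in shared.items():
        for sensor in sensors:
            inventory.setdefault(sensor, set()).add(addr)
    return inventory
```

Keeping the shared addresses separate until the final merge is deliberate: those are the addresses to examine when deciding which sensor is the primary collector.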
Because SiLK is a Unix-based tool set, the handbook includes an appendix that describes Unix command-line utilities for parsing SiLK output. The appendices also introduce fundamental networking concepts and provide a summary of the SiLK commands referenced in the guide and a list of sources for additional information about the SiLK tool suite and network analysis.
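For readers who would rather script this parsing in Python than chain Unix utilities such as cut, sort, and uniq, here is a small sketch. It assumes rwcut's default text layout: one header line, columns separated by `|` and padded with spaces, and a trailing delimiter on each line (the parser also tolerates its absence). `parse_rwcut` is a hypothetical helper, not part of SiLK, and the sample text is illustrative rather than captured from a real run.

```python
from collections import Counter


def parse_rwcut(text, delimiter="|"):
    """Turn rwcut-style text into a list of dicts keyed by the header row.

    Assumes one header line followed by data lines whose columns are
    separated by `delimiter` and padded with spaces. A trailing delimiter
    (rwcut's default) is tolerated, as is its absence.
    """
    def split(line):
        fields = [f.strip() for f in line.split(delimiter)]
        if fields and fields[-1] == "":
            fields.pop()  # empty field created by the trailing delimiter
        return fields

    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = split(lines[0])
    return [dict(zip(header, split(ln))) for ln in lines[1:]]


# Illustrative text in rwcut's default layout (not output from a real run).
SAMPLE = """\
            sIP|            dIP|dPort|
       10.0.0.1|      192.0.2.1|   22|
       10.0.0.2|      192.0.2.9|  443|
       10.0.0.3|      192.0.2.1|   22|
"""

if __name__ == "__main__":
    rows = parse_rwcut(SAMPLE)
    # Roughly what `cut -d'|' -f3 | sort | uniq -c` would report.
    print(Counter(r["dPort"] for r in rows))
```

Values stay as strings after parsing; convert fields such as ports or byte counts with `int()` before doing arithmetic on them.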
A Focus on Big Data
Network traffic analysts must increasingly use big data tools to gain a complete picture of network situational awareness. As my colleague Tim Shimeall pointed out in his blog post describing two approaches for going beyond network flow data, Cisco expects that within the next two years annual global IP traffic will pass the zettabyte (ZB; 1,000 exabytes [EB]) threshold and reach 2.3 ZB, with smartphone traffic outpacing computer traffic. In the post, Shimeall notes that "operators of networks with even comparatively modest size struggle with building a full, comprehensive view of network activity." As a result, network traffic analysts are importing the data they acquire into big data platforms to build a ground-truth representation of what occurred on the network.
SiLK is optimized to handle large amounts of data, and the processes we recommend apply to data sets of any size. SiLK commands can also be integrated with Python programs to extend the tool suite's capabilities. For analysts contending with large amounts of data, the handbook includes recommendations such as limiting query size, using pipes to pass output directly to other SiLK commands instead of storing it in files, and avoiding writing files to a network disk.
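As a sketch of the pipe-instead-of-file recommendation, the Python below chains rwfilter into rwstats with the standard `subprocess` module, so the binary flow records never land on disk, and narrows the query with a time window and a protocol. This is one option among several (SiLK also provides PySiLK, a Python extension). The helper names `build_commands` and `run_pipeline`, the hypothetical one-hour window, and the particular switches chosen are illustrative assumptions; check the rwfilter and rwstats manual pages for your SiLK version.

```python
import shutil
import subprocess


def build_commands(start, end, sensor_type="in", proto=6, count=10):
    """Build argv lists for a narrowly scoped rwfilter | rwstats pipeline.

    The time window (start/end) and protocol limit the query size; the
    --pass=stdout output is piped into rwstats instead of written to a file.
    """
    rwfilter = [
        "rwfilter",
        f"--start-date={start}",
        f"--end-date={end}",
        f"--type={sensor_type}",
        f"--proto={proto}",
        "--pass=stdout",
    ]
    rwstats = ["rwstats", "--fields=dport", f"--count={count}"]
    return rwfilter, rwstats


def run_pipeline(rwfilter, rwstats):
    """Run rwfilter | rwstats entirely through pipes; return rwstats' text."""
    producer = subprocess.Popen(rwfilter, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        rwstats, stdin=producer.stdout, stdout=subprocess.PIPE, text=True
    )
    producer.stdout.close()  # lets rwfilter see SIGPIPE if rwstats exits early
    output, _ = consumer.communicate()
    producer.wait()
    return output


if __name__ == "__main__":
    # Hypothetical one-hour window; substitute dates from your own repository.
    filt, stats = build_commands("2015/06/02:13", "2015/06/02:14")
    if shutil.which("rwfilter") and shutil.which("rwstats"):
        print(run_pipeline(filt, stats))
    else:
        # SiLK not installed: show the equivalent shell pipeline instead.
        print(" ".join(filt), "|", " ".join(stats))
```

If an intermediate file is unavoidable, write it to local scratch storage rather than a network disk, in line with the handbook's advice.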
Wrapping Up and Looking Ahead
As we continue to improve the Network Traffic Analysis with SiLK handbook, we will also continue to actively engage the user community at FloCon and other venues and provide regular updates to the guide. We hope that our focus on using the SiLK tools within the context of an analysis framework will help analysts better understand the behavior of their networks and be more effective at finding anomalies.
On a separate but related front, we are also considering development of a "cookbook" of analytics involving the SiLK tool suite. We welcome your feedback in the comments section below.
The updated text and the decision to present the information from an analyst perspective were the result of user feedback at recent FloCon conferences and other venues. We will also be looking for feedback on the handbook at the 2019 FloCon conference. Users can send suggestions for updates anytime to email@example.com.
The latest Open Source version of SiLK and selected previous releases are available from http://tools.netsa.cert.org/silk/download.html.
Other tools developed by CERT's NetSA group include the following:
- Yet Another Flowmeter (YAF) processes packet data into bidirectional flow records that can be used as input to an IPFIX Collecting Process. YAF's output can be used with super_mediator, Analysis Pipeline 5, and the SiLK tools.
- Analysis Pipeline 5.8 is a streaming analysis tool that can process more than SiLK flows: whereas version 4.x handled only SiLK flow records, version 5.8 can also process YAF records and raw IPFIX records, and it can still perform all of the analyses available in version 4.x. A notable enhancement is expanded DNS record processing, including fast-flux detection and domain-name watchlisting.
- super_mediator is an IPFIX mediator for use with the YAF and SiLK tools. It collects and filters YAF output and exports it to various IPFIX collecting processes and/or CSV files. super_mediator can be configured to de-duplicate DNS resource records exported by YAF.
SiLK tools are also available on the CERT LiFTeR website, which provides them for Fedora 23 through 28, Red Hat Enterprise Linux, and CentOS releases 6 and 7.
Read Tim Shimeall's post, Traffic Analysis for Network Security: Two Approaches for Going Beyond Network Flow Data.
Read the SEI Blog Post Best Practices in Network Traffic Analysis: Three Perspectives by Angela Hornemann, Tim Shimeall, and Timur Snoke.