Network Profiling Using Flow

Knowing what assets are on a network, particularly which assets are visible to outsiders, is an important step in achieving network situational awareness. This awareness is particularly important for large, enterprise-class networks, such as those of telephone, mobile, and internet providers. These providers find it hard to track hosts, servers, data sets, and other vulnerable assets in the network.

Exposed vulnerable assets make a network a target of opportunity, or "low-hanging fruit," for attackers. According to the 2012 Data Breach Investigations Report, the 855 incidents of corporate data theft reported in 2012 compromised 174 million records. The report also found that 79 percent of victims were targets of opportunity because they had an easily exploitable weakness. This blog post highlights recent research on how a network administrator can use network flow data to create a profile of externally facing assets on mid- to large-sized networks.

Network flow data, which is an aggregation of the header information contained in datagrams (packets), can be used to create profiles of network traffic, detect malicious activity, and determine appropriate traffic prioritization settings. Network flow data includes information about communicating pairs of IP addresses, and the ports and protocols on which they communicate, as well as aggregated byte counts and flags used.
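
To make the idea of aggregation concrete, here is a minimal Python sketch that groups packet headers into flow records keyed by source address, destination address, ports, and protocol. The tuple layout, the `Flow` class, and the sample addresses are assumptions made for illustration; real flow collectors also split flows on timeouts, which this sketch omits.

```python
from collections import defaultdict
from dataclasses import dataclass

SYN, ACK = 0x02, 0x10  # TCP flag bits


@dataclass
class Flow:
    packets: int = 0
    bytes: int = 0
    flags: int = 0  # bitwise OR of every TCP flag seen in the flow


def aggregate(packets):
    """Group packet headers by 5-tuple and accumulate counts and flags."""
    flows = defaultdict(Flow)
    for sip, dip, sport, dport, proto, size, flags in packets:
        flow = flows[(sip, dip, sport, dport, proto)]
        flow.packets += 1
        flow.bytes += size
        flow.flags |= flags
    return dict(flows)


# Hypothetical packet headers: (sip, dip, sport, dport, proto, size, flags)
packets = [
    ("203.0.113.10", "198.51.100.7", 51000, 443, 6, 60, SYN),
    ("203.0.113.10", "198.51.100.7", 51000, 443, 6, 1500, ACK),
    ("203.0.113.10", "198.51.100.9", 53124, 53, 17, 80, 0),
]
flows = aggregate(packets)
for key, flow in flows.items():
    print(key, flow)
```

Three packet headers collapse into two flow records here, and no payload is stored at any point.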

Network administrators can use network profiling to consider how decisions about configuration changes will affect the rest of the assets on their network. Security administrators can evaluate the profiles to identify suspicious activity and assets that violate policy, while business administrators can use the profiles to help guide long-term decisions regarding network security.

The intent of this research by the CERT Network Situational Awareness Team was to create a step-by-step guide for using network flow data to inventory, or profile, a network. The guide includes thorough explanations of why certain steps were chosen so that administrators can understand the process and tailor the steps to their environments. We on the research team focused our analysis on creating a profile of externally facing assets on mid- to large-sized networks that serve thousands to hundreds of thousands of users. We used data from a medium-sized enterprise network that allowed us to access its typical data usage. Because network flow data is far smaller than a full packet capture of all traffic on a network, we had much less data to process. Focusing on headers also allowed us to avoid issues with confidentiality and privacy because we were not collecting payload information.

We then parsed the data we collected using the System for Internet-Level Knowledge (SiLK). SiLK, an open-source tool developed by the CERT Network Situational Awareness Team, provides an efficient network flow collection and storage infrastructure that accepts flow data from a variety of sensors.

To produce relevant results, the process we developed for network profiling must complete within a fixed amount of time. For networks with relatively stable assets, this process could take place over one or two months. For fast-changing networks, the process could take place in as little as one to two weeks. By following these steps, a network administrator profiling a network will obtain a list of public-facing assets, information about the ports through which each asset is communicating, and other pertinent information, such as the external IP addresses to which the asset is connecting.

Our approach can be broken down into the following steps:

Gather available network information. It is important to gather available information about a network prior to beginning the profile. This information helps to define the scope for the rest of the process. Baseline information can include address space, network maps, lists of servers and proxies, and policies governing network design. Even if this information is incomplete or out of date, it still provides a reference baseline.

Not everything about the network can be known. Consider conducting a quick assessment or penetration test to develop a network map and a list of exposed services on various machines. Doing so provides a background for the profile. Later, the network maps and lists of servers can be updated each time the profiling process is repeated.

Select an initial data set. This step shapes the entire analysis. A representative sample must be large enough to represent typical traffic, but small enough to support an iterative processing of queries. The general guidelines for selecting a sample data set include

• Duration. Start with at least an hour's worth of data. Add more data up to a day's worth, until the query time starts to slow.
• Timing. Select the busiest time of day to carve out the most representative network traffic.
• Direction. If the traffic is bidirectional, start by looking at outbound traffic.
• Sampling. Avoid starting with sampled data because it may mask important routine behaviors.
• Network size. Consider separating a large IP-bound network into a few independent profiles and merging them after analysis is complete.
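
As an illustration of the timing guideline, the following Python sketch finds the busiest hour in a set of flow records. Measuring "busiest" by total bytes is an assumption (flow or packet counts would work the same way), and the record layout and sample timestamps are hypothetical. The function expects at least one record.

```python
from collections import Counter
from datetime import datetime


def busiest_hour(flows):
    """Return (hour_start, total_bytes) for the hour carrying the most bytes."""
    volume = Counter()
    for start, nbytes in flows:
        # Truncate each flow's start time to the top of its hour.
        volume[start.replace(minute=0, second=0, microsecond=0)] += nbytes
    return volume.most_common(1)[0]


# Hypothetical flow records: (start time, byte count)
sample = [
    (datetime(2012, 8, 6, 9, 15), 1200),
    (datetime(2012, 8, 6, 14, 5), 4000),
    (datetime(2012, 8, 6, 14, 40), 3000),
    (datetime(2012, 8, 6, 22, 30), 500),
]
hour, total = busiest_hour(sample)
print(f"Busiest hour starts at {hour} with {total} bytes")
```

The hour it returns can serve as the starting window for the initial data set, which can then be widened until queries begin to slow.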

Identify the active address space. Issues involved in monitoring the address space include whether sensors cover private address space, what traffic is expected on failover circuits during normal operations, and whether a business unit has connected a system without administrator knowledge. The steps involved in identifying and monitoring the active address space include

1. Identify hosts that have active Transmission Control Protocol (TCP) connections and those that don't.
2. Identify hosts that generate a non-trivial amount of traffic on protocols other than TCP.
3. Aggregate individual hosts into populated network blocks.
4. Examine additional information gathered in step one to confirm the list of active IP address blocks.
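
The first three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure from the guide: a host counts as TCP-active if any of its outbound TCP flows carries the ACK flag, the byte threshold for "non-trivial" is a made-up value, and hosts are grouped into /24 blocks. The record layout and addresses are hypothetical.

```python
import ipaddress
from collections import defaultdict

TCP, ACK = 6, 0x10
MIN_NON_TCP_BYTES = 1000  # hypothetical threshold for "non-trivial" traffic


def active_hosts(outbound_flows):
    """Return (TCP-active hosts, hosts with non-trivial non-TCP traffic)."""
    tcp_hosts = set()
    other_bytes = defaultdict(int)
    for sip, proto, flags, nbytes in outbound_flows:
        if proto == TCP:
            if flags & ACK:  # assumption: ACK implies an established connection
                tcp_hosts.add(sip)
        else:
            other_bytes[sip] += nbytes
    non_tcp = {h for h, b in other_bytes.items() if b >= MIN_NON_TCP_BYTES}
    return tcp_hosts, non_tcp


def populated_blocks(hosts, prefix=24):
    """Count active hosts per network block of the given prefix length."""
    blocks = defaultdict(int)
    for host in hosts:
        blocks[ipaddress.ip_network(f"{host}/{prefix}", strict=False)] += 1
    return dict(blocks)


# Hypothetical outbound flow records: (sip, proto, flags, bytes)
outbound = [
    ("192.0.2.10", 6, 0x12, 5000),   # TCP with SYN+ACK
    ("192.0.2.11", 6, 0x02, 60),     # TCP with SYN only
    ("192.0.2.200", 17, 0, 4000),    # UDP, above the threshold
    ("198.51.100.5", 1, 0, 84),      # ICMP, below the threshold
    ("198.51.100.6", 6, 0x18, 900),  # TCP with PSH+ACK
]
tcp_hosts, non_tcp_hosts = active_hosts(outbound)
blocks = populated_blocks(tcp_hosts | non_tcp_hosts)
for block, count in blocks.items():
    print(block, count)
```

The resulting blocks would then be checked against the address space and network maps gathered earlier, as the fourth step describes.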

Catalog common services. After the active hosts on the network have been identified, inventory the services that comprise the majority of bandwidth use and business operations, such as web traffic and email. Once these protocols are inventoried, start working on other services likely to run on the network and visible to instrumentation, including Virtual Private Network (VPN), Domain Name System (DNS), and File Transfer Protocol (FTP).
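
One way to picture this step is a port-based tally of outbound traffic, sketched below in Python. It assumes that a server answering clients sends from its well-known service port, so the source port of an outbound flow hints at the service. The port table, record layout, and addresses are illustrative assumptions, and a port match only nominates a candidate that still needs validation.

```python
from collections import defaultdict

# Assumed mapping of (protocol, source port) to a service label
SERVICES = {
    (6, 80): "web",
    (6, 443): "web",
    (6, 25): "email",
    (17, 53): "dns",
    (6, 21): "ftp",
}


def catalog_services(outbound_flows):
    """Total the bytes each host sends from a known service port."""
    by_service = defaultdict(lambda: defaultdict(int))
    for sip, sport, proto, nbytes in outbound_flows:
        service = SERVICES.get((proto, sport))
        if service:
            by_service[service][sip] += nbytes
    return {service: dict(hosts) for service, hosts in by_service.items()}


# Hypothetical outbound flow records: (sip, sport, proto, bytes)
outbound = [
    ("192.0.2.10", 443, 6, 90000),
    ("192.0.2.10", 80, 6, 10000),
    ("192.0.2.25", 25, 6, 4000),
    ("192.0.2.53", 53, 17, 800),
    ("192.0.2.77", 51515, 6, 300),  # ephemeral source port, likely a client
]
catalog = catalog_services(outbound)
for service, hosts in catalog.items():
    print(service, hosts)
```

Sorting each service's hosts by byte count puts the heaviest candidates first, which is a reasonable order in which to verify them.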

Catalog remaining active assets. The assets profiled thus far, which were selected based on the most frequent services on the network, should account for almost all network traffic. Prior to profiling any remaining hosts, expand the time frame for the sample data set to see if there are other active hosts that were not represented in the smaller data set. The expanded data set should include at least one month's worth of data for the most accurate results. While profiling leftovers, note the findings in the profile, even if they are hard to verify or if they seem incorrect. This information can always be reviewed in more detail during further investigation.
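
Finding the leftovers amounts to a set difference between the hosts seen in the expanded data set and the hosts already in the profile. The Python sketch below shows one way to do this while keeping the protocol and port evidence for each leftover host. The profile layout (service name mapped to a set of hosts), the record layout, and the addresses are assumptions for illustration.

```python
from collections import defaultdict


def leftover_assets(expanded_flows, profile):
    """Map each unprofiled host to the (protocol, source port) pairs it used."""
    profiled = set().union(*profile.values())
    leftovers = defaultdict(set)
    for sip, sport, proto in expanded_flows:
        if sip not in profiled:
            leftovers[sip].add((proto, sport))
    return dict(leftovers)


# Hypothetical profile built so far: service -> hosts
profile = {"web": {"192.0.2.10"}, "email": {"192.0.2.25"}}

# Hypothetical outbound flows from the expanded data set: (sip, sport, proto)
expanded = [
    ("192.0.2.10", 443, 6),
    ("192.0.2.99", 8080, 6),
    ("192.0.2.99", 22, 6),
    ("192.0.2.140", 123, 17),
]
leftovers = leftover_assets(expanded, profile)
for host, evidence in leftovers.items():
    print(host, sorted(evidence))
```

Keeping the raw evidence alongside each host makes it possible to record a finding now and revisit it during later investigation.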

Maintain the profile. Six months after you create a profile, it may no longer be accurate. A majority of these steps can be automated to address this problem. For example, network flow analysis software may allow for scheduling filters to run weekly or monthly during this process. Automated tools can only do so much, so you need to consistently validate potential assets for a particular service before adding them to the profile. To update the profile at least once a month, either run through the profiling process again or examine trends over time.
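
A simple aid for this maintenance step is a comparison of two profile snapshots that lists which assets appeared or disappeared for each service. The Python sketch below assumes each snapshot maps a service name to a set of hosts; that layout and the sample data are hypothetical. The output is a list of candidates for manual validation, not an automatic update.

```python
def profile_changes(old, new):
    """Report hosts added to or removed from each service between snapshots."""
    changes = {}
    for service in old.keys() | new.keys():
        before = old.get(service, set())
        after = new.get(service, set())
        added, removed = after - before, before - after
        if added or removed:
            changes[service] = {"added": added, "removed": removed}
    return changes


# Hypothetical snapshots taken a month apart: service -> hosts
last_month = {"web": {"192.0.2.10", "192.0.2.11"}, "email": {"192.0.2.25"}}
this_month = {
    "web": {"192.0.2.10"},
    "email": {"192.0.2.25"},
    "dns": {"192.0.2.53"},
}
changes = profile_changes(last_month, this_month)
for service, delta in sorted(changes.items()):
    print(service, delta)
```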

Report on findings. To increase the impact of the profile, consider adding more data points that pertain to security, including

• machine administrator
• update schedule
• intended purpose

Challenges in this Approach

Relying only on network flow data to create a network profile is inherently less accurate than using full packet capture. As long as administrators are aware of the limitations of using flow data (as explained in the report documenting the guide), they can produce useful results by following the steps in the guide.

Future Research

Our team is currently examining how administrators customize and use the results of network profiles. We are investigating how to automate these steps and implement a continual update process. We are also considering development of a new step-by-step guide using this same approach with a different tool. For example, ARGUS is another network flow analysis tool with a slightly different approach than SiLK. Please feel free to suggest other tools for investigation in the comments section below.

Additional Resources

The technical report describing this research, Network Profiling Using Flow, may be downloaded at
https://resources.sei.cmu.edu/library/asset-view.cfm?assetID=28115

Our approach is based on System for Internet-Level Knowledge (SiLK), an open-source tool developed by the CERT Network Situational Awareness Team. You can download SiLK at
https://tools.netsa.cert.org/silk/.

The Network Situational Awareness (NetSA) group in the CERT Program at the SEI developed and maintains a suite of open-source tools for monitoring large-scale networks using flow data. These tools have grown out of our work on the AirCERT project, the SiLK project, and the effort to integrate this work into a unified, standards-compliant flow collection and analysis platform. You can view that suite of tools at
https://tools.netsa.cert.org/index.html

Software Engineering Institute

SEI Blog