Forensic and Measurement NetFlow—Bridging a Gap We Didn't Notice

Presentation
Michael Collins of the University of Southern California Information Sciences Institute presented this session at FloCon 2024.
Publisher

Software Engineering Institute

Abstract

In this talk, we will focus on techniques and solutions for forensic NetFlow summarization that USC-ISI is developing. Since its original development, NetFlow analysis has split into two broad use cases: traffic measurement and forensics. These two cases are divided by their tolerance of sampling. The traffic measurement community, which relies on technologies such as sketches, focuses on real-time, on-sensor data summarization and will tolerate sampling in exchange for rapid data collection and insights. The forensic community, in contrast, focuses on finding significant but rare phenomena and cannot afford to lose them through sampling.

The forensic community consequently faces operational challenges along several axes, all driven by the need to losslessly express traffic information to overloaded analysts. In this talk, we will discuss multiple strategies USC-ISI is exploring for this purpose. These strategies fit into three broad categories:

  • New flow representations. NetFlow was originally developed for an internet dominated by single-client single-server interactions. We have been experimenting with a class of flow representations we call superflows, which describe multiple-server single-client interactions.
  • Optimized sensor placement. Complete data collection necessitates redundant data collection; we are experimenting with techniques to optimally place sensors and configure them to selectively collect flows to reduce the data footprint and redundancy.
  • Additional flow data to support cross-correlation. The strength of the flow representation is that, in the absence of other information, it provides the biggest 'bang for the buck' for data storage. Adding the correct (security-relevant) information to flow logs enables cross-referencing with local hosts.
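
To make the first category concrete, the following is a minimal sketch of how flows from one client to several servers might be grouped into a single record. It is an illustration only, not the superflow format USC-ISI uses: the `Flow` and `Superflow` schemas, the `build_superflows` function, and the `gap` threshold are all hypothetical names and choices introduced here. The sketch also keeps only per-server byte totals, so it does not by itself satisfy the lossless requirement described above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """A simplified flow record (hypothetical schema, for illustration)."""
    client: str
    server: str
    server_port: int
    start: float  # seconds
    end: float    # seconds
    nbytes: int


@dataclass
class Superflow:
    """One client's activity toward many servers in one burst (hypothetical)."""
    client: str
    start: float
    end: float
    # Maps (server, port) -> total bytes exchanged with that server.
    servers: dict = field(default_factory=dict)
    flow_count: int = 0


def build_superflows(flows, gap=5.0):
    """Group each client's flows into superflows.

    A flow joins the client's current superflow if it starts no more than
    `gap` seconds after that superflow's latest end time; otherwise it
    opens a new superflow. The threshold is an assumed parameter.
    """
    by_client = defaultdict(list)
    for f in flows:
        by_client[f.client].append(f)

    result = []
    for client, items in by_client.items():
        items.sort(key=lambda f: f.start)
        current = None
        for f in items:
            if current is None or f.start - current.end > gap:
                current = Superflow(client, f.start, f.end)
                result.append(current)
            key = (f.server, f.server_port)
            current.servers[key] = current.servers.get(key, 0) + f.nbytes
            current.end = max(current.end, f.end)
            current.flow_count += 1
    return result
```

For example, a client that contacts two servers within a few seconds and a third server a minute later would yield two superflows under this sketch: one covering the first two servers and one covering the third.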

In our talk, we will discuss these changes in the context of prototypes implemented with USC-ISI's Merge Flow Collector, a NetFlow collection tool developed for running experiments on Merge testbeds. We will discuss the strategies we have explored, the general metric of footprint reduction to test efficiency, and plans for future work in this area.
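
The abstract does not define the footprint reduction metric. One plausible reading, assumed here purely for illustration, is the fraction of storage saved relative to a baseline collection. The function name and its byte-count parameters are hypothetical.

```python
def footprint_reduction(baseline_bytes, reduced_bytes):
    """Return the fraction of storage saved relative to the baseline.

    Assumed definition: 1 - reduced / baseline. A value of 0.75 means
    the reduced representation needs a quarter of the baseline storage.
    """
    if baseline_bytes <= 0:
        raise ValueError("baseline_bytes must be positive")
    return 1.0 - reduced_bytes / baseline_bytes
```

Under this definition, a collection that shrinks from 1000 bytes to 250 bytes has a footprint reduction of 0.75.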

Attendees Will Learn

  • Distinctions between measurement and forensic NetFlow, as well as activities going on in both categories
  • Potential mechanisms for modifying NetFlow to reduce the data footprint and improve cross-correlation with other data sources
  • How to identify ways to reduce the data footprint and, by doing so, improve response time