NetFlow Data Cleaning and Feature Engineering

Presentation
Clarence Worrell and Tim Shimeall of the SEI presented this workshop at FloCon 2024.
Publisher

Software Engineering Institute

Abstract

In this workshop, we will explore how to process network flow data to meet the requirements of AI/ML modeling methods and other analysis approaches. We will demonstrate, analyze, and explain the basic concepts of NetFlow data cleaning and feature engineering, provide hands-on practice with the SiLK tool suite, and conclude with a demonstration of using machine learning to categorize network traffic by type.

NetFlow modeling and analysis require that the data first be clean and usable. Data cleaning includes steps such as sorting, de-duplication, and treating missing or corrupted values. Feature engineering then makes the data useful for analysis through steps such as aggregating records into relevant groups, transforming and encoding features into numerical values suitable for modeling, and selecting the features most relevant to the analysis while dropping the least relevant.
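
The cleaning and feature engineering steps named above can be illustrated with a minimal Python sketch. It is not the workshop's material: the record layout, the field names (`sip`, `dip`, `proto`, `bytes`, `stime`), the sample values, and the helper functions `clean` and `engineer_features` are all assumptions made for illustration, and dropping invalid records is only one of several ways to treat missing or corrupted values.

```python
from collections import defaultdict

# Hypothetical field names and sample records, for illustration only.
FIELDS = ("sip", "dip", "proto", "bytes", "stime")

RAW_FLOWS = [
    {"sip": "10.0.0.2", "dip": "10.0.0.9", "proto": 6, "bytes": 1200, "stime": 30},
    {"sip": "10.0.0.1", "dip": "10.0.0.9", "proto": 17, "bytes": 300, "stime": 10},
    {"sip": "10.0.0.1", "dip": "10.0.0.9", "proto": 17, "bytes": 300, "stime": 10},  # duplicate
    {"sip": "10.0.0.1", "dip": "10.0.0.8", "proto": 6, "bytes": None, "stime": 20},  # missing value
    {"sip": "10.0.0.2", "dip": "10.0.0.7", "proto": 6, "bytes": -5, "stime": 40},  # corrupted value
    {"sip": "10.0.0.1", "dip": "10.0.0.7", "proto": 6, "bytes": 500, "stime": 50},
]


def clean(flows):
    """Sort by start time, drop exact duplicates, drop missing or negative byte counts."""
    seen = set()
    cleaned = []
    for flow in sorted(flows, key=lambda f: f["stime"]):
        key = tuple(flow[name] for name in FIELDS)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        if flow["bytes"] is None or flow["bytes"] < 0:
            continue  # assumption: invalid records are dropped, not imputed
        cleaned.append(flow)
    return cleaned


def engineer_features(flows):
    """Aggregate flows per source address into numeric features."""
    groups = defaultdict(list)
    for flow in flows:
        groups[flow["sip"]].append(flow)

    features = {}
    for sip, group in groups.items():
        features[sip] = {
            "flow_count": len(group),
            "total_bytes": sum(f["bytes"] for f in group),
            "distinct_dips": len({f["dip"] for f in group}),
            # Encode the protocol as a number: share of TCP (IP protocol 6) flows.
            "tcp_fraction": sum(f["proto"] == 6 for f in group) / len(group),
        }
    return features


cleaned = clean(RAW_FLOWS)
features = engineer_features(cleaned)
```

Here `features` maps each source address to a small numeric vector that a model could consume. A real analysis would choose the grouping key and the retained features to suit the question being asked.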

Intended Audience

All conference attendees with an interest in NetFlow data modeling and analysis are welcome to participate. Experience with analyzing network traffic data and with network security will be helpful. The hands-on exercises will be presented using SiLK and Python, and some basic familiarity with these tools will be helpful but not required.