NetFlow Data Cleaning and Feature Engineering
• Presentation
Publisher
Software Engineering Institute
Abstract
In this workshop, we will explore how to process network flow data to meet the requirements of modeling methods used in AI/ML and other analysis approaches. We will demonstrate, analyze, and explain the basic concepts of NetFlow data cleaning and feature engineering, provide hands-on practice with the SiLK tool suite, and conclude with a demonstration of using machine learning to categorize network traffic by type.
NetFlow modeling and analysis require that the data first be clean and in a usable condition. Data cleaning involves steps such as sorting, de-duplication, and treating missing or corrupted values. Feature engineering makes the data useful for analysis through steps such as aggregating records into relevant groups, transforming and encoding features into numerical values suitable for modeling, and selecting the features most relevant to the analysis while dropping those that are least relevant.
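The cleaning and feature engineering steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the workshop's actual material: the record layout, the field names (`sip`, `dip`, `protocol`, `packets`, `bytes`, `stime`), the sample values, and the helper functions `clean` and `engineer_features` are all assumptions made for the example, and missing values are simply dropped rather than imputed.

```python
from collections import defaultdict

# Hypothetical flow records. Field names and values are illustrative only
# and are not taken from the workshop or from any SiLK output format.
FIELDS = ("sip", "dip", "protocol", "packets", "bytes", "stime")
RAW_FLOWS = [
    {"sip": "10.0.0.1", "dip": "10.0.0.9", "protocol": 6, "packets": 10, "bytes": 5000, "stime": 30},
    {"sip": "10.0.0.2", "dip": "10.0.0.9", "protocol": 17, "packets": 2, "bytes": 200, "stime": 10},
    # Exact duplicate of the first record.
    {"sip": "10.0.0.1", "dip": "10.0.0.9", "protocol": 6, "packets": 10, "bytes": 5000, "stime": 30},
    # Record with a missing byte count.
    {"sip": "10.0.0.1", "dip": "10.0.0.8", "protocol": 17, "packets": 5, "bytes": None, "stime": 20},
    {"sip": "10.0.0.1", "dip": "10.0.0.7", "protocol": 17, "packets": 4, "bytes": 400, "stime": 5},
]


def clean(flows):
    """Drop records with missing values, de-duplicate, and sort by start time."""
    seen = set()
    kept = []
    for flow in flows:
        key = tuple(flow.get(name) for name in FIELDS)
        if any(value is None for value in key):
            continue  # treat missing values by dropping the record
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        kept.append(dict(zip(FIELDS, key)))
    return sorted(kept, key=lambda flow: flow["stime"])


def engineer_features(flows):
    """Aggregate flows per source address and encode them as numeric vectors.

    Each vector is [flow count, bytes per packet, TCP share, UDP share].
    """
    totals = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0, "tcp": 0, "udp": 0})
    for flow in flows:
        agg = totals[flow["sip"]]
        agg["flows"] += 1
        agg["packets"] += flow["packets"]
        agg["bytes"] += flow["bytes"]
        if flow["protocol"] == 6:     # IP protocol number 6 is TCP
            agg["tcp"] += 1
        elif flow["protocol"] == 17:  # IP protocol number 17 is UDP
            agg["udp"] += 1

    features = {}
    for sip, agg in totals.items():
        bytes_per_packet = agg["bytes"] / agg["packets"] if agg["packets"] else 0.0
        features[sip] = [
            agg["flows"],
            bytes_per_packet,
            agg["tcp"] / agg["flows"],
            agg["udp"] / agg["flows"],
        ]
    return features


if __name__ == "__main__":
    for sip, vector in engineer_features(clean(RAW_FLOWS)).items():
        print(sip, vector)
```

Here the raw destination address and start time are used only for cleaning and are dropped from the final vectors, which is one simple form of feature selection. A real analysis would choose the grouping key and the retained features to suit the question being asked.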
Intended Audience
All conference attendees with an interest in NetFlow data modeling and analysis are welcome to participate. Experience with analyzing network traffic data and with network security will be helpful. The hands-on exercises will be presented using SiLK and Python; basic familiarity with these tools is useful but not required.