Bayes at 10 Gbps: Identifying Malicious and Vulnerable Processes from Passive Traffic Fingerprinting

Video
Watch David McGrew deliver the "Bayes at 10 Gbps" presentation at FloCon 2020.
Publisher

Software Engineering Institute

Abstract

This presentation describes an inferencing system and its implementation, results in applying it to real-world traffic, and open issues in this technology area.

As network monitoring techniques have evolved in response to the rise of encrypted traffic, protocol fingerprinting has become an essential component of network defense. While exact-match fingerprinting of TLS clients is now widespread, it is too imprecise to use for process identification, because many different processes can share the same fingerprint. To more reliably determine the process associated with a session, we applied inferencing based on naïve Bayes to fingerprints and destination information, using equivalence classes of destinations derived from auxiliary data. Our implementation of the packet capture and inferencing uses Linux TPACKET_V3 and can identify processes on 10+ Gbps enterprise internet connections. This system detects many interesting categories of processes, including malware, evasive applications, scanners, and obsolete and vulnerable software. Because it is based on an interpretable machine learning model, its findings are readily understandable, and it can adapt to different prior probabilities. In this presentation, we describe our inferencing system and its implementation, our results in applying it to real-world traffic, and open issues in this technology area. We also review the data and open source software that we published to demonstrate this capability.
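To make the inference step in the abstract concrete, the following is a minimal sketch of a naïve Bayes classifier that combines a TLS fingerprint with a destination equivalence class and accepts alternative prior probabilities. It is not the presenters' implementation. The class name, feature names, fingerprint labels, hostnames, destination classes, training counts, and the use of add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical auxiliary data mapping destinations to equivalence classes.
DEST_CLASS = {"cdn.example.net": "cdn", "x7f3.example.org": "rare_host"}


class NaiveBayesProcessClassifier:
    """Illustrative naive Bayes over categorical session features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # additive smoothing constant
        self.process_counts = Counter()             # sessions seen per process
        self.feature_counts = defaultdict(Counter)  # (process, feature) -> value counts
        self.feature_values = defaultdict(set)      # feature -> set of observed values

    def observe(self, process, features):
        """Record one labeled session."""
        self.process_counts[process] += 1
        for name, value in features.items():
            self.feature_counts[(process, name)][value] += 1
            self.feature_values[name].add(value)

    def posterior(self, features, priors=None):
        """Return P(process | features); `priors` (all > 0) overrides learned priors."""
        total = sum(self.process_counts.values())
        log_scores = {}
        for process, count in self.process_counts.items():
            prior = priors[process] if priors else count / total
            log_p = math.log(prior)
            for name, value in features.items():
                seen = self.feature_counts[(process, name)][value]
                vocab = len(self.feature_values[name])
                # Smoothed estimate of P(value | process).
                log_p += math.log((seen + self.alpha) / (count + self.alpha * vocab))
            log_scores[process] = log_p
        # Normalize in log space for numerical stability.
        peak = max(log_scores.values())
        weights = {p: math.exp(s - peak) for p, s in log_scores.items()}
        norm = sum(weights.values())
        return {p: w / norm for p, w in weights.items()}


# Invented training data: fingerprint "fp_A" is shared by a browser and by malware.
clf = NaiveBayesProcessClassifier()
for _ in range(8):
    clf.observe("browser", {"fingerprint": "fp_A", "dest_class": "cdn"})
for _ in range(2):
    clf.observe("browser", {"fingerprint": "fp_A", "dest_class": "rare_host"})
for _ in range(2):
    clf.observe("malware", {"fingerprint": "fp_A", "dest_class": "rare_host"})
for _ in range(3):
    clf.observe("updater", {"fingerprint": "fp_B", "dest_class": "cdn"})

session = {"fingerprint": "fp_A", "dest_class": DEST_CLASS["x7f3.example.org"]}
learned = clf.posterior(session)
# The same evidence under a prior that assumes a higher-risk environment.
shifted = clf.posterior(session, priors={"browser": 0.3, "malware": 0.6, "updater": 0.1})
print(max(learned, key=learned.get), max(shifted, key=shifted.get))
```

In this toy data set the fingerprint alone cannot separate the browser from the malware, which is the imprecision of exact-match fingerprinting that the abstract describes. Adding the destination class and changing the priors moves the most probable process, and each factor in the product can be inspected individually, which is what makes a model of this kind interpretable.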