
SEI Launches Mothra for Big-Data Network Flow Analysis


March 14, 2022—The SEI’s CERT Division has released Mothra, a collection of open source libraries for working with network flow data in the Apache Spark large-scale data analytics engine. Mothra version 1.5.1, released in February, follows the collection’s initial release in early January.

Mothra bridges the previously stand-alone tools of the CERT Network Situational Awareness (NetSA) Security Suite and Spark. Other security solutions, such as antivirus applications or intrusion detection and prevention systems, can also export data to Spark. Mothra enables analysts to access network flow data alongside these other sources, all within a common big-data analysis environment. With all these data sources available for analysis, organizations with very large networks can get much fuller network situational awareness.

NetSA’s long-standing SiLK (System for Internet-Level Knowledge) tool also analyzes network flow records, specifically those produced by the NetSA tool YAF (Yet Another Flowmeter). Organizations have used SiLK to spot security threats such as malware by collecting and analyzing metadata from large, distributed enterprise networks.

In the 18 years since SiLK’s release, network traffic has increased by more than a factor of 100. Even when compressed, large datasets of flow records can be too big to analyze efficiently in SiLK. “When you're looking at billions of records, that adds up and begins to be a problem,” said Katherine Prevost, a senior developer on the NetSA team.

Flow records have also grown in file size. Detecting today’s most concerning attacks, such as phishing, drive-by downloads, and ransomware, requires deep packet inspection (DPI). The process extracts more information on a flow’s security-critical components, but it also generates a record at least five times bigger than a non-DPI flow record. YAF can collect DPI information, but SiLK was not designed to analyze it or the volume of flow data generated by organizations at the scale of Internet service providers.

It was a big-data problem, and in 2017, a government sponsor asked the NetSA team to make YAF work with a big-data analysis tool. Rather than design their own tool, the NetSA team created Mothra to transform YAF output into a format readable by Apache Spark.

Mothra directly processes the binary IPFIX (Internet Protocol Flow Information Export) format, a standard of the Internet Engineering Task Force (IETF). “Data is laid out in a specific way,” explained Prevost, “so you can efficiently pull out just the pieces you want,” much like the sections of a physical library of books. Analysts can then use the Spark analysis engine on the IPFIX data. “Mothra lets you just drop the data right in without having to think ahead about how to transform it,” said Prevost.

These transformations change the collected data as little as possible, preserving it for future analysis.
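
To make the point about IPFIX's fixed layout concrete, here is a minimal sketch in plain Python of reading the 16-byte IPFIX message header defined in RFC 7011. It is not Mothra code and does not use Mothra's API; the names `IpfixHeader` and `parse_ipfix_header` and the sample bytes are hypothetical, chosen only for illustration.

```python
import struct
from typing import NamedTuple

# IPFIX message header (RFC 7011): five big-endian fields, 16 bytes total.
# version (16 bits), length (16), export time (32),
# sequence number (32), observation domain ID (32).
_HEADER = struct.Struct("!HHIII")


class IpfixHeader(NamedTuple):
    version: int
    length: int
    export_time: int
    sequence_number: int
    observation_domain_id: int


def parse_ipfix_header(message: bytes) -> IpfixHeader:
    """Read only the fixed-position header fields, ignoring the rest."""
    if len(message) < _HEADER.size:
        raise ValueError("message is shorter than the 16-byte IPFIX header")
    header = IpfixHeader(*_HEADER.unpack_from(message))
    if header.version != 10:  # IPFIX messages carry version number 10
        raise ValueError(f"not an IPFIX message: version {header.version}")
    return header


# Hypothetical sample: a header-only message built for this illustration.
sample = _HEADER.pack(10, 16, 1647216000, 42, 7)
header = parse_ipfix_header(sample)
```

Because every field sits at a known offset, a reader can pull out just the sequence number or export time without decoding anything else, which is the property Prevost describes.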

Analysts can now bring the programming power of Spark to bear on network flow data from the NetSA Security Suite. SiLK’s filters allow limited queries on pure flow datasets. Mothra and Spark enable deeper, more flexible queries over DPI-enriched flow data to find more data of interest. “You’re free to do any kind of data pull you can express as a program,” said Tim Shimeall, a senior analyst on the NetSA team. “You can do iterative pulls, where what you pull changes across the iterations. You can pull data that consists of packets bigger than the average number of packets within the matching set of criteria. Something that would take you a lot of scripting in SiLK can now be condensed down to a half page of code.”
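
The kind of pull Shimeall describes, selecting flows whose packet count exceeds the average of the matching set, can be sketched as follows. This is an illustration in plain Python rather than Spark or Mothra code; the `Flow` record, its field names, and the sample values are assumptions made for the example, not Mothra's schema.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified flow record (not Mothra's schema)."""
    src: str
    dst_port: int
    packets: int


def above_average_packets(
    flows: Iterable[Flow], matches: Callable[[Flow], bool]
) -> List[Flow]:
    """Return matching flows whose packet count exceeds the matching set's mean."""
    matching = [f for f in flows if matches(f)]
    if not matching:
        return []
    average = mean(f.packets for f in matching)
    return [f for f in matching if f.packets > average]


# Made-up sample data for illustration only.
flows = [
    Flow("10.0.0.1", 443, 10),
    Flow("10.0.0.2", 443, 30),
    Flow("10.0.0.3", 443, 80),
    Flow("10.0.0.4", 53, 500),
]

# The average is computed over port-443 flows only, so the large
# port-53 flow does not influence the threshold.
heavy = above_average_packets(flows, lambda f: f.dst_port == 443)
```

The threshold here depends on the data the filter itself selects, so the query needs two passes over the matching set: one to compute the average and one to apply it. That dependency is simple to express as a program but awkward to express as a single static filter.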

Analysis of all that flow data requires plenty of storage as well as programming expertise. Organizations that have the infrastructure and personnel to support Apache Spark can use Mothra to apply DPI analytics to network flow data. This insight can help them evaluate their current defenses and discover security gaps, especially on infrastructure-level enterprise networks.

The release of Mothra 1.5.1 is the result of five years of building, testing, and tweaking by the NetSA team, and the tool continues to evolve. Mothra is currently compatible with Apache Spark version 2, and the team expects to have it support version 3 by summer. The CERT Division is currently writing developer-oriented documentation and analyst-oriented documentation with starting points for analytics. The team anticipates releasing task-oriented analyst training this fall.

Network flow expertise in both the development and analysis domains is rare, Shimeall noted. At FloCon, the SEI’s annual conference on using data to defend networks, presentations combining analysis and tool development are often produced by groups of collaborators across multiple organizations. “There aren’t that many organizations with the expertise to say, What's a reasonable traffic analysis environment? How do you build and architect it?” said Shimeall. “The SEI has skilled developers and analysts who work together on the security suite. The release of Mothra should also strengthen our vibrant connections to the developer and analyst communities.”

Download Mothra, other tools in the NetSA Security Suite, and documentation from the CERT NetSA Security Suite site. For more information on these tools, contact the NetSA team at info@sei.cmu.edu.