
A Subversive Use of SiLK

Hi, this is Leigh Metcalf. In this blog post I talk about a subversive use of SiLK, the open-source tool suite designed by the CERT/CC team at the SEI, available on the CERT website. This post is a technical walkthrough of how to use the SiLK tools to support analysis in interesting ways you may not have thought of.

The System for Internet-Level Knowledge (SiLK) tool is an efficient network flow collection and storage infrastructure that accepts flow data from a variety of sensors. SiLK also provides a suite of efficient command-line tools for analysis. Recently we published a conference paper about some important uses and algorithms in SiLK, but today I'm discussing some more, shall we say, creative uses involving IP sets.

SiLK contains functionality for IP sets, which conceptually are exactly what you'd expect: sets of IP addresses. A SiLK IP set is a binary file that contains IPv4 addresses and/or IPv6 addresses in a special, optimized format. IP sets are very useful (and fast), not just within SiLK, but independently. The tools used to work with IP sets are available independent of the rest of the SiLK tools if you don't want to install the whole tool suite.

Of course, IP addresses are not the only large sets of information we have to deal with. In network security, we often run across large sets of domain names, autonomous systems, and hashes of most any feature of a (malicious) file. We've proposed a notation to help track analytic interrelationships, but automating these interrelationships and working with such large sets is a technical challenge. But we can abuse SiLK IP sets to help!

An MD5 hash is very similar to an IPv6 address. Both are really just 128-bit numbers, usually written as 32 hexadecimal digits. The only difference is that the common "human-readable" representation of an IPv6 address has a ':' after every four digits.

A simple sed command will add these colons to the MD5 hashes:

sed 's/..../&:/g;s/:$//'
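To see what the command does, here is a minimal sketch that runs it on a single sample value. The hash is the well-known MD5 digest of empty input, chosen only for illustration, and the variable names are my own.

```shell
# Sample data only: the MD5 digest of empty input.
md5=d41d8cd98f00b204e9800998ecf8427e

# Insert a colon after every four characters, then strip the trailing colon.
ipv6_form=$(printf '%s\n' "$md5" | sed 's/..../&:/g;s/:$//')

printf '%s\n' "$ipv6_form"
# prints d41d:8cd9:8f00:b204:e980:0998:ecf8:427e
```

The first sed expression leaves a colon after the eighth group as well, which is why the second expression removes the one at the end of the line.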

Thus, you can build an IP set named "output.set" from a file of MD5 hash values ($filename) using this command:

sed 's/..../&:/g;s/:$//' $filename | rwsetbuild stdin output.set
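If the input file might contain blank lines, headers, or other strays, it may help to filter it before handing it to rwsetbuild. The following is a sketch under that assumption, not part of SiLK: the function names md5_lines_to_ipv6 and build_md5_set are hypothetical, and only the rwsetbuild invocation comes from the command above.

```shell
# Hypothetical helper: read candidate hashes on stdin, keep only lines that
# are exactly 32 hexadecimal digits, lowercase them, and print each one in
# the colon-separated form.
md5_lines_to_ipv6() {
    grep -E '^[0-9A-Fa-f]{32}$' \
        | tr 'A-F' 'a-f' \
        | sed 's/..../&:/g;s/:$//'
}

# Hypothetical wrapper: build an IP set from a file of MD5 hashes.
# Usage: build_md5_set hashes.txt output.set
build_md5_set() {
    md5_lines_to_ipv6 < "$1" | rwsetbuild stdin "$2"
}
```

Lines that do not look like an MD5 hash are silently dropped here; whether that or a loud failure is the better behavior depends on where your hashes come from.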

It's that simple. Now you can do any set math, such as checking membership, taking unions, intersections, or differences, or gathering statistics on the set, using the super-fast SiLK tools. The SiLK tools rwsettool, rwsetmember, and rwsetcat are your friends. To use rwsetmember, you need to add the colons to the MD5 hash string before you pass it in. You can do that with a simple command-line trick:

rwsetmember `echo $md5 | sed 's/..../&:/g;s/:$//'` $setname.set

Here, $md5 is the hash value you want to check, and $setname.set is the set in which you want to check membership. (If you're retyping rather than copying and pasting, note that the back-ticks [`] are different from the single quotes ['].)
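If you check membership often, the trick above can be wrapped in a pair of small shell functions. This is only a sketch: md5_to_ipv6 and md5_in_set are hypothetical names of my own, and the $(...) form is used in place of back-ticks purely because it is easier to nest and to read.

```shell
# Convert one MD5 string to the colon-separated form that the IP set tools
# accept as an IPv6 address.
md5_to_ipv6() {
    printf '%s\n' "$1" | sed 's/..../&:/g;s/:$//'
}

# Hypothetical wrapper around rwsetmember.
# Usage: md5_in_set <md5> <set-file>
md5_in_set() {
    rwsetmember "$(md5_to_ipv6 "$1")" "$2"
}
```

A call such as md5_in_set "$md5" "$setname.set" then behaves like the one-liner above, with rwsetmember printing the name of each set file that contains the value.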

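For rwsetcat, you can use the following command to view the output as MD5 values, provided the addresses are printed fully zero-padded (canonical IPv6 notation drops leading zeros and collapses runs of zero groups into "::", which would shorten the hash):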

rwsetcat $setname.set | sed 's/://g'
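Stripping the colons is enough only when every group is printed with all four digits. As a hedged alternative, the sketch below expands canonical IPv6 text (leading zeros dropped, "::" standing for a run of zero groups) back into 32 hexadecimal digits. The function names are hypothetical, the awk logic is my own rather than anything shipped with SiLK, and it does not handle addresses printed in a mixed IPv4 notation. Recent SiLK releases also document a zero-padded output format for rwsetcat (I believe the switch is --ip-format=zero-padded, but check rwsetcat --help for your version), which would make the expansion unnecessary.

```shell
# Hypothetical helper: turn IPv6 text on stdin into 32-digit hex strings.
ipv6_to_md5() {
    awk '
    {
        nl = 0; nr = 0
        compressed = (index($0, "::") > 0)
        if (compressed) {
            # Split around the "::" that stands for a run of all-zero groups.
            split($0, halves, "::")
            if (halves[1] != "") nl = split(halves[1], left, ":")
            if (halves[2] != "") nr = split(halves[2], right, ":")
        } else {
            nl = split($0, left, ":")
        }
        out = ""
        for (i = 1; i <= nl; i++) out = out sprintf("%4s", left[i])
        if (compressed)
            for (i = nl + nr; i < 8; i++) out = out "0000"
        for (i = 1; i <= nr; i++) out = out sprintf("%4s", right[i])
        gsub(/ /, "0", out)   # left-pad short groups with zeros
        print out
    }'
}

# Hypothetical wrapper: list a set as MD5 values.
# Usage: list_md5_set <set-file>
list_md5_set() {
    rwsetcat "$1" | ipv6_to_md5
}

# Hypothetical wrapper: print the MD5 values present in both sets, assuming
# rwsettool writes to standard output and rwsetcat reads standard input
# when no file is named.
# Usage: common_md5 <first.set> <second.set>
common_md5() {
    rwsettool --intersect "$1" "$2" | rwsetcat | ipv6_to_md5
}
```

The output is always lowercase, so compare it against lowercase hashes.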

SiLK sets are very useful as a tool by themselves, and I hope this blog post introduced you to a creative way to use them. You should now be able to save a good deal of computing time on set operations that use large numbers of MD5 hash values!

Have more subversive uses of the SiLK tools that you'd like to share? Send me a message using our contact form.
