
Blacklist Ecosystem Analysis

Hi all. Leigh Metcalf and I have been continuing our study of the cybersecurity ecosystem. Last year we published a long white paper telling you everything you wanted to know about blacklists. Turns out, that did not save the Internet on its own. This year we're extending that work with further analysis of the blacklist ecosystem.

Our results this year are largely consistent with those from last year. This past August, Ryan Trost made a Black Hat presentation that partly, and independently, confirms our past and current results. We find that most lists of malicious indicators, whether domains or IP addresses, are distinct from one another. Most, if not all, lists provide unique value to computer network defense (CND) because they contribute indicators that no other list contains.

Our initial results followed a simple but naïve definition of unique: only one blacklist contained the indicator during the 12 months we investigated. Our goal was not just to confirm these findings by repeating the same process, but also to extend it. Including the new report, we have analyzed 30 months of data, and they confirm that, most of the time, indicators are unique. The new report covers March 16, 2013, to June 30, 2014. During that period, 82.46% of the over 121 million IP addresses listed and 96.16% of the over 30 million domain names listed appeared on only one blacklist. These results suggest that the IP blacklist ecosystem is becoming even more unique: in CY2012, 66% of over 55 million IP addresses were unique to one list. Domain uniqueness is holding steady; in CY2012, 96.6% of over 14 million domains were unique to one list.
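To make the naïve definition concrete, here is a minimal Python sketch that computes the share of distinct indicators appearing on exactly one list. It assumes each list has already been reduced to the set of indicators it published during the study window; the function name `naive_uniqueness` and the toy data are illustrative, not taken from the report.

```python
from collections import Counter
from typing import Dict, Set


def naive_uniqueness(lists: Dict[str, Set[str]]) -> float:
    """Fraction of distinct indicators that appear on exactly one list.

    `lists` maps a (hypothetical) list name to the set of indicators
    (IP addresses or domain names) it published during the study window.
    """
    # Count, for each indicator, how many lists contain it.
    counts: Counter = Counter()
    for indicators in lists.values():
        counts.update(indicators)  # sets, so each list counts an indicator at most once
    if not counts:
        return 0.0
    unique = sum(1 for n in counts.values() if n == 1)
    return unique / len(counts)


if __name__ == "__main__":
    # Toy data, not drawn from the study.
    example = {
        "list_a": {"192.0.2.1", "192.0.2.2", "198.51.100.7"},
        "list_b": {"192.0.2.2", "203.0.113.9"},
        "list_c": {"203.0.113.50"},
    }
    print(f"{naive_uniqueness(example):.2%}")  # 4 of 5 distinct indicators -> 80.00%
```

Because each list is a set, an indicator counts once per list no matter how often that list republished it during the window.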

However, as we note in the reports, these figures very likely overestimate the intersection of lists because of IP address reassignment and double counting caused by Internet features such as NAT, DHCP, BGP, and changes in IP address stewardship or assignment by regional Internet registries. We tried to account for this overestimate by assessing when one list is following another and when the intersection of lists is happenstance. See the paper for how we did this. Our motivation was that if an intersection is happenstance, it is likely due to an Internet feature, and the information from each list is then also likely to be valuable to CND at the time. If one list is just copying another, the copy is not valuable.
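The paper describes the actual "following" test. As a purely hypothetical stand-in, one simple heuristic says list B follows list A when B publishes most of the indicators the two share shortly after A does. This Python sketch assumes per-list first-seen dates are available; the function name `looks_like_follower` and its lag window, threshold, and minimum-overlap parameters are arbitrary choices for illustration, not values from the report.

```python
from datetime import date, timedelta
from typing import Dict


def looks_like_follower(
    leader: Dict[str, date],
    follower: Dict[str, date],
    max_lag: timedelta = timedelta(days=7),  # hypothetical lag window
    threshold: float = 0.9,                  # hypothetical cutoff
    min_shared: int = 10,                    # hypothetical minimum overlap
) -> bool:
    """Heuristic: does `follower` tend to list indicators shortly after `leader`?

    Each argument maps an indicator to the date that list first published it.
    Illustrative stand-in only; not the method used in the report.
    """
    shared = leader.keys() & follower.keys()
    if len(shared) < min_shared:
        return False  # too little overlap to distinguish copying from chance
    # Count shared indicators that `follower` listed on or within `max_lag` after `leader`.
    lagged = sum(
        1
        for ind in shared
        if timedelta(0) <= follower[ind] - leader[ind] <= max_lag
    )
    return lagged / len(shared) >= threshold
```

Same-day listings count as following here; a stricter variant could require a positive lag, at the cost of missing lists that synchronize daily. Overlaps that fail the test would be treated as happenstance.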

We expect that our "following" analysis underestimates the genuine intersection of lists, and so overestimates the number of uniquely useful indicators. Since the naïve uniqueness measure overestimates the intersection and our "following" analysis underestimates it, we can now supply a range for what we believe to be the genuine uniqueness of lists. Domain-name-based indicators are unique to one list between 96.16% and 97.37% of the time. IP-address-based indicators are unique to one list between 82.46% and 95.24% of the time.
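One hypothetical way to turn "following" judgments into a second uniqueness figure: for each indicator, discount copies held by a list that follows another list also holding it, then count the indicator as unique if exactly one independent source remains. This Python sketch assumes the (follower, leader) pairs are already known and contain no cycles; `follower_adjusted_uniqueness` is an illustrative name, and this is not the report's exact procedure. Discounting copies can only raise the count relative to the naïve measure, which is why the two figures bracket an estimate.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def follower_adjusted_uniqueness(
    lists: Dict[str, Set[str]],
    follows: Set[Tuple[str, str]],
) -> float:
    """Fraction of indicators with exactly one independent source list.

    `follows` holds hypothetical (follower, leader) pairs. A follower's copy
    is discounted whenever its leader also lists the indicator.
    Assumes the follows relation has no cycles.
    """
    # Map each indicator to the set of lists that hold it.
    holders: Dict[str, Set[str]] = defaultdict(set)
    for name, indicators in lists.items():
        for ind in indicators:
            holders[ind].add(name)
    if not holders:
        return 0.0
    unique = 0
    for names in holders.values():
        # Keep only lists that do not follow another list holding this indicator.
        sources = {
            n for n in names
            if not any((n, other) in follows for other in names if other != n)
        }
        if len(sources) == 1:
            unique += 1
    return unique / len(holders)
```

For example, with lists `a = {x, y}`, `b = {y, z}`, `c = {z}`, the naïve share is 1/3; if `b` is judged to follow `a`, the copy of `y` on `b` is discounted and the share rises to 2/3, while the happenstance overlap on `z` still counts as shared.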

This result is not encouraging for the CND status quo. It implies that most lists are a one-of-a-kind authority on a particular type of activity. However, each list maker may not know exactly how to describe the type of activity in which his/her list specializes. Further, if the list maker attempts to categorize a list as following a particular malicious activity, he/she will run into terminology and communication issues. The result also implies that existing blacklists should be used with caution when examining new threats. Investigations certainly cannot rely only on blacklists to detect ongoing activity.

Academics cannot use an arbitrary blacklist as a baseline to test the relevance of their new CND methods. For practical CND, no single list (or set of ten lists) can provide a comprehensive description of all malicious indicators. Every list the defender can obtain and use will probably continue to provide new, non-overlapping defense to the network.
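One way to see why a handful of lists is not comprehensive is a greedy coverage ordering: repeatedly pick the list that adds the most not-yet-covered indicators and track the cumulative share of the union of all lists. This is a standard greedy max-coverage sketch in Python with an illustrative function name, not an analysis from the report. When most indicators appear on only one list, each additional list keeps adding nearly all of its own indicators, so coverage does not saturate after the first few lists.

```python
from typing import Dict, List, Set, Tuple


def greedy_coverage(lists: Dict[str, Set[str]]) -> List[Tuple[str, int, float]]:
    """Order lists greedily by how many not-yet-covered indicators each adds.

    Returns (list name, newly added indicators, cumulative share of the union).
    """
    universe: Set[str] = set().union(*lists.values())
    covered: Set[str] = set()
    remaining = dict(lists)
    steps: List[Tuple[str, int, float]] = []
    while remaining and universe:
        # Pick the list contributing the most indicators not yet covered;
        # ties go to the first such list in dictionary order.
        name, new = max(
            ((n, s - covered) for n, s in remaining.items()),
            key=lambda item: len(item[1]),
        )
        covered |= new
        del remaining[name]
        steps.append((name, len(new), len(covered) / len(universe)))
    return steps
```

On real list data, a long tail of steps that each add many new indicators would be consistent with the observation that every obtainable list contributes something no other list does.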

A CND analyst or architect can also conclude that blacklists are insufficient for adequate network defense. Blocking that depends on such fragmented lists is fragile and too easy for attackers to evade. Other established CND methods, such as gray lists, behavioral analysis, web proxy content analysis, and white lists, should be prioritized and put into production as appropriate.

If you have thoughts on how we should continue to investigate the blacklist ecosystem, please let us know. We'll both be at FloCon next week (Leigh has yet more new analyses, and I'm just the emcee), so you can also come talk to us about it there.
