Blacklist Ecosystem Analysis Update: 2014

January 7, 2015 • White Paper

By

Leigh B. Metcalf and Jonathan Spring

This white paper compares the contents of 85 different Internet blacklists to discover patterns in shared entries.

Publisher

Software Engineering Institute

Topic or Tag

Situational Awareness

Abstract

This white paper compares the contents of 85 different Internet blacklists, also known as threat intelligence feeds or threat data feeds, to discover patterns in shared entries. It is an update to a 2013 report that compared 25 such Internet blacklists.

The methods and motivations of this report are similar to those employed in the earlier report. However, this update provides an expanded scope by increasing the number of lists and the duration of the investigation by another year. This report does not contain the same depth of detail as the 2013 report, especially where details have not changed. Lists are compared directly and indirectly based on data type. Direct intersection comparison is straightforward; the list contents are compared temporally to determine if any list consistently published shared indicators before another list. Indirect comparison analyzes, for example, whether the existing intersection is random or has a pattern. These multiple methods indicate a range for how often a list provides an indicator with unique information and value to computer network defense (CND).
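As an illustration only, the direct comparison described above can be sketched as a pairwise set intersection followed by a check of which list published each shared indicator first. The feed names, indicators, dates, and the `compare_pair` helper are hypothetical and are not taken from the report; the report's actual tooling and data model are not described in this abstract.

```python
from datetime import date
from itertools import combinations

# Hypothetical data: list name -> {indicator: first date seen on that list}.
feeds = {
    "list_a": {"bad.example": date(2014, 1, 3), "evil.example": date(2014, 2, 1)},
    "list_b": {"bad.example": date(2014, 1, 5), "other.example": date(2014, 3, 9)},
    "list_c": {"lonely.example": date(2014, 4, 2)},
}


def compare_pair(first_seen_x, first_seen_y):
    """Count shared indicators and which list published each one first."""
    shared = first_seen_x.keys() & first_seen_y.keys()
    x_first = sum(1 for i in shared if first_seen_x[i] < first_seen_y[i])
    y_first = sum(1 for i in shared if first_seen_y[i] < first_seen_x[i])
    return {
        "shared": len(shared),
        "x_first": x_first,
        "y_first": y_first,
        "same_day": len(shared) - x_first - y_first,
    }


if __name__ == "__main__":
    # Compare every unordered pair of lists once.
    for (name_x, x), (name_y, y) in combinations(feeds.items(), 2):
        print(name_x, name_y, compare_pair(x, y))
```

A list that "consistently published shared indicators before another list" would show a large `x_first` count relative to `y_first` for that pair; an empty or balanced result corresponds to the no-intersection and no-pattern cases.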

Domain-name-based indicators are unique to one list between 96.16% and 97.37% of the time. IP-address-based indicators are unique to one list between 82.46% and 95.24% of the time. These 2014 results are generally consistent with our 2013 results and support the conclusions drawn from them. Namely, there is surprisingly little overlap between any two blacklists. Though there are exceptions to this pattern, the intersection between the lists remains low, even after expanding each list to a larger neighborhood of related indicators. Few lists consistently provide content before certain other lists, but more often there is no intersection at all. When there is an intersection, many times there is no pattern to which list came first.
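To make the "unique to one list" figures concrete, the following minimal sketch computes the fraction of distinct indicators that appear on exactly one list. It covers only the simplest direct notion of uniqueness; the ranges reported above come from combining several direct and indirect methods, which this sketch does not reproduce. The function name and sample data are assumptions for illustration.

```python
from collections import Counter


def fraction_unique(lists):
    """Return the share of distinct indicators found on exactly one list.

    `lists` maps a list name to an iterable of indicators (hypothetical layout).
    """
    # Deduplicate within each list so repeats on one list count only once.
    counts = Counter(ind for entries in lists.values() for ind in set(entries))
    if not counts:
        return 0.0
    unique = sum(1 for n in counts.values() if n == 1)
    return unique / len(counts)


if __name__ == "__main__":
    sample = {
        "list_a": ["x.example", "y.example"],
        "list_b": ["y.example", "z.example"],
        "list_c": ["w.example"],
    }
    print(f"{fraction_unique(sample):.2%}")
```

In the sample data, three of the four distinct indicators appear on a single list, so the function returns 0.75.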

These results suggest that each blacklist describes a distinct sort of malicious activity. The lists do not appear to converge on one version of all the malicious indicators for the Internet. Network defenders should be advised, therefore, to obtain and evaluate as many lists as practical, since it does not appear that any new list can be rejected out-of-hand as redundant. The results also indicate that there is no global ground truth to be acquired, no matter how many lists are merged. Therefore, the study supports the assertion that blacklisting is not a sufficient defense; an organization needs other defensive measures to add depth, such as gray listing, behavior analysis, criminal penalties, speed bumps, and organization-specific white lists.

This analysis provides a collective view of the whole ecosystem of blocking network touch points and blacklists. Many practitioners lament the fatigue of playing “whack-a-mole” against very resilient adversary resources. This tacit knowledge must be formalized before a better collective strategy can be enacted. The blacklist ecosystem analysis supports this tacit knowledge and formalizes a part of it: since lists are largely distinct, “whack-a-mole” is inevitable and impossible to win. Without convergence, practitioners are left to do the best they can with the extensive but fragmentary blacklist data that is available.

Blacklist ecosystem analysis is one aspect of a larger body of work to quantify strategic cybersecurity issues. The blacklist ecosystem is intimately related to the low cost of domains and infrastructure to adversaries, the poor state of repair of consumer devices connected to the Internet that permits abuse, the challenges of modeling the interaction between the user and the adversary, and the challenges of designing effective and instructive observations in information security.
