
When Threat Hunting Fails: Identifying Malvertising Domains Using Lexical Clustering

In this presentation, the authors discuss the current malvertising threat landscape: ad networks, exchanges, exploits, and popular infection points.

Cisco Systems, Inc.


In this presentation, the authors introduce a real-time streaming pipeline built on Kafka that aims to stem the initial attack while it is still observable in DNS logs. The pipeline applies a scalable clustering technique, locality-sensitive hashing (LSH), to hostnames in order to identify permutations of words and characters drawn from “software”, “update”, “tech”, “support”, and similar terms. The authors then discuss a novel belief propagation algorithm over a client-hostname bipartite graph that surfaces the related file hosts that lie behind malicious advertisements. Finally, they disclose the anatomy of a malicious advertising campaign and show how file hosts are often reused across malvertising campaigns.
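
To make the lexical clustering idea concrete, the following is a minimal sketch of one common LSH scheme: MinHash signatures over character trigrams, split into bands. The abstract does not say which LSH family, shingle size, or parameters the authors use, so the trigram choice, the constants `NUM_HASHES` and `BANDS`, and all function names here are assumptions made for illustration only, and the sketch leaves out the Kafka streaming layer entirely.

```python
import hashlib
from collections import defaultdict

# Illustrative parameters (assumed, not taken from the presentation).
NUM_HASHES = 32            # length of each MinHash signature
BANDS = 16                 # number of LSH bands
ROWS = NUM_HASHES // BANDS # signature rows per band


def shingles(hostname, n=3):
    """Return the set of character n-grams in a lowercased hostname."""
    s = hostname.lower()
    # Fall back to the whole string when it is shorter than n characters.
    return {s[i:i + n] for i in range(len(s) - n + 1)} or {s}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(hostname):
    """MinHash signature: the minimum hash of the shingle set per seed."""
    grams = shingles(hostname)
    return tuple(min(_hash(seed, g) for g in grams) for seed in range(NUM_HASHES))


def candidate_clusters(hostnames):
    """Group hostnames that share at least one LSH band bucket."""
    buckets = defaultdict(set)
    for host in hostnames:
        sig = minhash(host)
        for b in range(BANDS):
            band = sig[b * ROWS:(b + 1) * ROWS]
            buckets[(b, band)].add(host)
    # Only buckets holding more than one hostname are interesting candidates.
    return [hosts for hosts in buckets.values() if len(hosts) > 1]
```

Hostnames that share many trigrams, such as rearrangements of the same few words, are more likely to collide in at least one band and so appear together in a candidate set, while unrelated hostnames rarely do. In a streaming setting the bucket table would presumably be kept as state and updated per DNS record, but that design is not described in the abstract.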