When Threat Hunting Fails: Identifying Malvertising Domains Using Lexical Clustering
Cisco Systems, Inc.
In this presentation, we introduce a real-time streaming pipeline built on Kafka that aims to stem the initial stage of the attack, which is observable in DNS logs. The pipeline applies a scalable clustering technique, locality-sensitive hashing (LSH), to hostnames in order to identify permutations of words and characters such as “software”, “update”, “tech”, and “support”. We then discuss a novel belief propagation algorithm over a client-hostname bipartite graph that surfaces the related file hosts that lie behind malicious advertisements. Finally, we disclose the anatomy of a malicious advertising campaign and show how file hosts are often reused across malvertising campaigns.
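
To make the hostname clustering idea concrete, here is a minimal sketch of MinHash-based LSH over character n-grams of hostnames. It is an illustration under stated assumptions, not the pipeline presented in the talk: the abstract does not specify the shingle size, the hash family, or the banding parameters, so the function names, the 3-character shingles, the 32 hashes, and the 16 bands below are all hypothetical choices, and the hostnames are invented examples under the reserved .example domain.

```python
import hashlib
from collections import defaultdict


def shingles(hostname, n=3):
    """Return the set of character n-grams in a hostname (n=3 is an assumption)."""
    s = hostname.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def minhash_signature(items, num_hashes=32):
    """MinHash: for each seeded hash function, keep the minimum hash over the set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{x}".encode()).digest()[:8], "big")
            for x in items
        ))
    return signature


def lsh_candidate_pairs(hostnames, num_hashes=32, bands=16):
    """Split each signature into bands; hostnames that share any band become candidates."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for host in hostnames:
        grams = shingles(host)
        if not grams:
            continue  # hostname shorter than the shingle size
        sig = minhash_signature(grams, num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(host)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical hostnames for illustration only.
    hosts = [
        "software-update-tech.example",
        "tech-software-update.example",
        "update-support-tech.example",
        "weather.example",
    ]
    for pair in sorted(lsh_candidate_pairs(hosts)):
        print(pair)
```

Because banding only proposes candidate pairs, a real deployment would typically confirm each pair with an exact similarity measure (for example, Jaccard similarity over the shingle sets) before grouping hostnames into clusters. More bands with fewer rows per band raise recall at the cost of more false candidates.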