Assessing Disclosure Risk in Anonymized Datasets

January 7, 2008 • White Paper

By
Alexi Kounine (EPFL) and Michele Bezzi (ATL)

In this paper, the authors propose a framework for estimating disclosure risk using conditional entropy between the original and the anonymized datasets.

Publisher

Software Engineering Institute

Abstract

Sharing log data is a valuable step toward improving network security. However, logs often contain sensitive information, and organizations are hesitant to share them. Anonymization methods increase protection by lowering the disclosure risk to a level considered safe. Accordingly, a metric for anonymity is necessary to quantitatively assess the risk before releasing log data. In this paper, we propose a general framework for estimating disclosure risk using conditional entropy between the original and the anonymized datasets. We demonstrate our approach using network log files.
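
To make the idea of conditional entropy between an original and an anonymized dataset concrete, the following is a minimal sketch, not the authors' framework. It computes the empirical conditional entropy H(X | Y) in bits, where X is a column of original values and Y is the corresponding anonymized column. The function name `conditional_entropy`, the sample IP addresses, and the two anonymization schemes are hypothetical and chosen only for illustration. The sketch also assumes the evaluator knows the joint distribution of original and anonymized values, which is a strong assumption about an attacker.

```python
import math
from collections import Counter


def conditional_entropy(original, anonymized):
    """Empirical H(X | Y) in bits, with X = original and Y = anonymized values.

    Illustrative helper (hypothetical name): a higher value means more
    remaining uncertainty about the original value once the anonymized
    value has been observed.
    """
    if len(original) != len(anonymized):
        raise ValueError("both columns must have the same length")
    n = len(original)
    if n == 0:
        return 0.0

    joint = Counter(zip(original, anonymized))  # counts of (x, y) pairs
    marginal = Counter(anonymized)              # counts of y alone

    h = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2(p(y) / p(x, y)), expressed with raw counts
        h += (count / n) * math.log2(marginal[y] / count)
    return h


# Hypothetical log column: four source IP addresses.
original = ["10.0.0.1", "10.0.0.2", "10.0.1.1", "10.0.1.2"]

# Assumed scheme 1: zero the last octet, so two originals share each output.
truncated = ["10.0.0.0", "10.0.0.0", "10.0.1.0", "10.0.1.0"]

# Assumed scheme 2: one-to-one pseudonyms, so each output maps to one original.
pseudonymized = ["a", "b", "c", "d"]

h_truncated = conditional_entropy(original, truncated)
h_pseudo = conditional_entropy(original, pseudonymized)
print(h_truncated, h_pseudo)
```

In this toy case the truncated column leaves one bit of uncertainty per record, because each anonymized value is shared by two equally likely originals, while the one-to-one pseudonyms leave none when the mapping is known. Lower conditional entropy corresponds to higher disclosure risk under these assumptions.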