Abstract :
[en] Data publication is a simple and cost-effective approach for data sharing across organizations. Data anonymization is a central technique in privacy preserving data publications. Many methods have been proposed to anonymize individual datasets and multiple datasets of the same data publisher. In real life, a dataset is rarely isolated and two datasets published by two organizations may contain the records of the same individuals. For example, patients might have visited two hospitals for follow-up or specialized treatment regarding a disease, and their records are independently anonymized and published. Although each published dataset poses a small privacy risk, the intersection of two datasets may severely compromise the privacy of the individuals. The attack using the intersection of datasets published by different organizations is called a composition attack. Some research work has been done to study methods for anonymizing data to prevent a composition attack for independent data releases where one data publisher has no knowledge of records of another data publisher. In this chapter, we discuss two exemplar methods, a randomization based and a generalization based approaches, to mitigate risks of composition attacks. In the randomization method, noise is added to the original values to make it difficult for an adversary to pinpoint an individual’s record in a published dataset. In the generalization method, a group of records according to potentially identifiable attributes are generalized to the same so that individuals are indistinguishable. We discuss and experimentally demonstrate the strengths and weaknesses of both types of methods. We also present a mixed data publication framework where a small proportion of the records are managed and published centrally and other records are managed and published locally in different organizations to reduce the risk of the composition attack and improve the overall utility of the data.
Scopus citations®
without self-citations
4