Privacy and Anonymity in a World of Interconnected Data

Arvind Narayanan


The new Web economy relies on the collection of personal data on an ever-increasing scale. Data is collected about our tastes, purchases, searches, browsing history, friendships and relationships, health history, genetics and so forth. The aggregated datasets are not stationary: they are shared with advertisers, marketers and researchers for business reasons. Nor does each such dataset exist in isolation: it contains implicit or explicit references to other datasets. Unsurprisingly, this has led to a host of privacy issues.

In this talk, I survey the different types of data that are being collected and shared, and propose theoretical models for analyzing privacy in such datasets. Next, I discuss the subtle relationship between anonymity and privacy and present a few related techniques for de-anonymizing large datasets, accompanied by the results of experiments. Finally, I touch upon the broader threats to privacy arising from these techniques and discuss possible solutions, which, of necessity, will have a non-technological dimension.

Time and Place

14 October 2008 (Tuesday) at 1630 hrs
Gates 4B (opposite 490)