The Differential Privacy Frontier

Cynthia Dwork, Microsoft Research

Abstract

How can a trusted curator of personal information reveal accurate statistics while simultaneously protecting individual privacy, even in the presence of arbitrary side information? Answering this question requires a compelling formalization of "privacy," sufficiently powerful to provide real protection and sufficiently flexible to be useful.

Differential privacy is a strong privacy guarantee for an individual's input to a randomized function. Informally, the guarantee says that every output of the function is essentially equally likely to occur, independent of whether any given individual opts into, or opts out of, the data set. Designed for statistical analysis, for example of health or census data, the definition protects the privacy of individuals, and of small groups of individuals, while still permitting very different outcomes on very different data sets.
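Formally (a standard statement of the guarantee; the mechanism K, the data sets D and D', and the privacy parameter ε are notation not introduced in this abstract), a randomized function K gives ε-differential privacy if, for all data sets D and D' differing on at most one element, and for all subsets S of the range of K,

\[
  \Pr[K(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[K(D') \in S].
\]

Smaller ε means stronger privacy: the presence or absence of any one individual changes the probability of any outcome by a factor of at most e^ε.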

Despite the strength of the guarantee, differentially private algorithms with excellent utility have been developed for a host of statistical and data-mining problems. At the same time, fruitful interplay with other fields has yielded new techniques for achieving differential privacy and has broadened the scope of problems on which differential privacy can be brought to bear.
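To make the basic approach concrete, the sketch below illustrates the canonical Laplace mechanism: perturb the true answer to a numeric query with Laplace noise whose scale is calibrated to the query's sensitivity, the most the answer can change when one individual opts in or out. This is a minimal illustrative sketch in Python, not code from the paper; the function name and example query are ours, and it assumes numpy is available.

    import numpy as np

    def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
        # Calibrate the noise to the query's global sensitivity: the
        # maximum change in the true answer when one individual's record
        # is added to or removed from the data set.
        scale = sensitivity / epsilon
        return true_answer + np.random.laplace(loc=0.0, scale=scale)

    # A counting query ("how many records satisfy some predicate?") has
    # sensitivity 1: one person opting in or out changes the count by at
    # most 1. With epsilon = 0.1 the reported count is off by about 10
    # on average, yet satisfies 0.1-differential privacy.
    noisy_count = laplace_mechanism(true_answer=1234.0, sensitivity=1.0, epsilon=0.1)

The choice of scale sensitivity/ε is what ties the mechanism to the definition above: it bounds the ratio of the output densities on any two neighboring data sets by e^ε.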

We begin by recalling some differential privacy basics. The frontier of a vibrant area is always in flux, but we give a taste of the state of the art by surveying a handful of very recent advances in the field.