Anonymity-Preserving Data Collection

Citation: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2005.
Authors: Zhiqiang Yang, Sheng Zhong, Rebecca N. Wright


Protection of privacy has become an important problem in data mining. In particular, individuals have become increasingly unwilling to share their data, frequently resulting in individuals either refusing to share their data or providing incorrect data. In turn, such problems in data collection can affect the success of data mining, which relies on a sufficient amount of accurate data in order to produce meaningful results. Random perturbation and randomized response techniques can provide some level of privacy in data collection, but they have an associated cost in accuracy. Cryptographic privacy-preserving data mining methods provide good privacy and accuracy properties. However, in order to be efficient, those solutions must be tailored to specific mining tasks, thereby losing generality.

In this paper, we propose efficient cryptographic techniques for online data collection in which data from a large number of respondents is collected anonymously, without the help of a trusted third party. That is, our solution allows the miner to collect the original data from each respondent, but in such a way that the miner cannot link a respondent's data to the respondent. An advantage of such a solution is that, because it does not change the actual data, its success does not depend on the underlying data mining problem. We provide proofs of the correctness and privacy of our solution, as well as experimental data that demonstrate its efficiency. We also extend our solution to tolerate certain kinds of malicious behavior of the participants.
