Citation | Proceedings of the 11th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining
(KDD), 2005.
|
Authors | Zhiqiang Yang, Sheng Zhong, Rebecca N. Wright |
Protection of privacy has become an important problem in data
mining. In particular, individuals have become increasingly unwilling
to share their data, frequently resulting in individuals either
refusing to share their data or providing incorrect data. In turn,
such problems in data collection can affect the success of data
mining, which relies on a sufficient amount of accurate data in order to
produce meaningful results. Random perturbation and randomized
response techniques can provide some level of privacy in data
collection, but they have an associated cost in accuracy. Cryptographic
privacy-preserving data mining methods provide good privacy and
accuracy properties. However, in order to be efficient, those
solutions must be tailored to specific mining tasks, thereby losing
generality.
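
To make the accuracy cost of randomized response concrete, the following is a minimal sketch of the classic binary randomized response scheme, not the technique proposed in this paper. The function names, the truth-telling probability p = 0.75, and the simulated population are all assumptions chosen for illustration. Each respondent reports the true bit with probability p and the flipped bit otherwise; the miner can then only estimate the population proportion, with extra variance introduced by the noise.

```python
import random

def randomized_response(true_bit, p, rng):
    """Report the true bit with probability p, otherwise the flipped bit."""
    return true_bit if rng.random() < p else 1 - true_bit

def estimate_proportion(reports, p):
    """Unbiased estimate of the true proportion of 1s (requires p != 0.5).

    The expected observed fraction is pi * p + (1 - pi) * (1 - p),
    which is solved here for pi.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

# Hypothetical population: 30% of 10,000 respondents hold a sensitive bit of 1.
rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000
reports = [randomized_response(b, 0.75, rng) for b in true_bits]
estimate = estimate_proportion(reports, 0.75)
```

In this sketch no individual report reveals the respondent's true bit with certainty, but the estimate only approximates the true proportion of 0.3, and the approximation degrades as p approaches 0.5. That is the privacy and accuracy trade-off referred to above.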
In this paper, we propose efficient cryptographic
techniques for online data collection in which data from a large
number of respondents is collected anonymously, without the help of a
trusted third party. That is, our solution allows the miner to collect
the original data from each respondent, but in such a way that the
miner cannot link a respondent's data to the respondent. An advantage
of such a solution is that, because it does not change the actual
data, its success does not depend on the underlying data mining
problem. We provide proofs of the correctness and privacy of our
solution, as well as experimental data that demonstrates its
efficiency. We also extend our solution to tolerate certain kinds of
malicious behavior of the participants.