Citation: Proceedings of the 11th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD), 2005.
Authors: Geetha Jagannathan, Rebecca N. Wright
Advances in computer networking and database technologies have enabled
the collection and storage of vast quantities of data. Data mining can
extract valuable knowledge from this data, and organizations have
realized that they can often obtain better results by pooling their
data together. However, the collected data may contain sensitive or
private information about the organizations or their customers, and
privacy concerns are exacerbated if data is shared between multiple
organizations.
Distributed data mining is concerned with the
computation of models from data that is distributed among multiple
participants. Privacy-preserving distributed data mining seeks to
allow for the cooperative computation of such models without the
cooperating parties revealing any of their individual data items. Our
paper makes two contributions in privacy-preserving data
mining. First, we introduce the concept of arbitrarily partitioned
data, which is a generalization of both horizontally and vertically
partitioned data. Second, we provide an efficient privacy-preserving
protocol for k-means clustering in the setting of arbitrarily
partitioned data.
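
To make the notion of arbitrarily partitioned data concrete, the following is a minimal sketch, not the protocol from the paper. It assumes two parties, labeled "A" and "B" here purely for illustration, and a hypothetical ownership mask in which entry (i, j) names the party holding attribute j of record i. The helper names `is_horizontal` and `is_vertical` are likewise assumptions introduced for this example. The sketch involves no cryptography and only shows how horizontal and vertical partitioning arise as special cases of an arbitrary ownership pattern.

```python
from typing import List

# Hypothetical ownership mask: owner[i][j] is "A" or "B", naming the party
# that holds attribute j of record i. This representation is an assumption
# made for illustration only.
Mask = List[List[str]]


def is_horizontal(owner: Mask) -> bool:
    """True if every record (row) is held entirely by a single party."""
    return all(len(set(row)) == 1 for row in owner)


def is_vertical(owner: Mask) -> bool:
    """True if every attribute (column) is held entirely by a single party."""
    return all(len(set(col)) == 1 for col in zip(*owner))


# Horizontally partitioned: each party holds complete records.
horizontal = [["A", "A", "A"],
              ["B", "B", "B"]]

# Vertically partitioned: each party holds complete attributes.
vertical = [["A", "B", "B"],
            ["A", "B", "B"]]

# Arbitrarily partitioned: ownership can differ entry by entry.
arbitrary = [["A", "B", "A"],
             ["B", "B", "A"]]

for name, mask in [("horizontal", horizontal),
                   ("vertical", vertical),
                   ("arbitrary", arbitrary)]:
    print(name, is_horizontal(mask), is_vertical(mask))
```

In this sketch the third mask is neither horizontal nor vertical, which is the case the more general setting is meant to cover.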