Privacy-Preserving Distributed k-Means Clustering over Arbitrarily Partitioned Data

Citation: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2005.
Authors: Geetha Jagannathan, Rebecca N. Wright


Advances in computer networking and database technologies have enabled the collection and storage of vast quantities of data. Data mining can extract valuable knowledge from this data, and organizations have realized that they can often obtain better results by pooling their data. However, the collected data may contain sensitive or private information about the organizations or their customers, and privacy concerns are exacerbated when data is shared between multiple organizations.

Distributed data mining is concerned with the computation of models from data that is distributed among multiple participants. Privacy-preserving distributed data mining seeks to allow the cooperative computation of such models without the cooperating parties revealing any of their individual data items. Our paper makes two contributions in privacy-preserving data mining. First, we introduce the concept of arbitrarily partitioned data, which is a generalization of both horizontally and vertically partitioned data. Second, we provide an efficient privacy-preserving protocol for k-means clustering in the setting of arbitrarily partitioned data.
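
One way to picture the three partitioning models is as an ownership table over a shared database. The sketch below is an illustration only, not the protocol from the paper. It assumes two parties labeled "A" and "B" and assumes that ownership is assigned per attribute value. The names owner, is_horizontal, and is_vertical are hypothetical.

```python
# Hypothetical illustration (not the paper's protocol).
# owner[i][j] names the party ("A" or "B") assumed to hold
# attribute j of record i in a shared, virtual database.

def is_horizontal(owner):
    """True if every record (row) is held entirely by one party."""
    return all(len(set(row)) == 1 for row in owner)

def is_vertical(owner):
    """True if every attribute (column) is held entirely by one party."""
    return all(len(set(col)) == 1 for col in zip(*owner))

# Horizontally partitioned: each party holds complete records.
horizontal = [["A", "A", "A"],
              ["B", "B", "B"]]

# Vertically partitioned: each party holds complete attributes.
vertical = [["A", "B", "B"],
            ["A", "B", "B"]]

# Arbitrarily partitioned: ownership may vary value by value,
# so neither special case needs to hold.
arbitrary = [["A", "B", "A"],
             ["B", "B", "A"]]
```

Under these assumptions, the horizontal and vertical tables are special cases of the arbitrary one: each simply restricts ownership to be constant along rows or along columns.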
