Privacy-Preserving Classification of Customer Data without Loss of Accuracy

Full textClick to download.
CitationIn Proc. of the 2005 SIAM International Conference on Data Mining (SDM)2005
AuthorsZhiqiang Yang
Sheng Zhong
Rebecca N. Wright


Privacy has become an increasingly important issue in data mining. In this paper, we consider a scenario in which a data miner surveys a large number of customers to learn classification rules on their data, while the sensitive attributes of these customers need to be protected. Solutions have been proposed to address this problem using randomization techniques. Such solutions exhibit a tradeoff of accuracy and privacy: the more each customer's private information is protected, the less accurate result the miner obtains; conversely, the more accurate the result, the less privacy for the customers. In this paper, we propose a simple cryptographic approach that is efficient even in a many-customer setting, provides strong privacy for each customer, and does not lose any accuracy as the cost of privacy. Our key technical contribution is a privacy-preserving method that allows a data miner to compute frequencies of values or tuples of values in the customers' data, without revealing the privacy-sensitive part of the data. Unlike general-purpose cryptographic protocols, this method requires no interaction between customers, and each customer only needs to send a single flow of communication to the data miner. However we are still able to ensure that nothing about the sensitive data beyond the desired frequencies is revealed to the data miner. To illustrate the power of our approach, we use our frequency mining computation to obtain a privacy-preserving naive Bayes classifier learning algorithm. Initial experimental results demonstrate the practical efficiency of our solution. We also suggest some other applications of privacy-preserving frequency mining.

Back to publications
Back to previous page