Distributed Protocols for Data Privacy

Citation: PhD thesis, Stevens Institute of Technology, 2007
Author: Zhiqiang Yang


With the rapid development of the Internet and computer technology, a growing share of our activities is carried out online. Consequently, more and more data about individuals are collected and used by different parties, which are generally distributed over a wide variety of sites. The protection of data privacy in such distributed settings is therefore drawing more attention than ever.

In this thesis, we present five techniques that use cryptographic tools to protect data privacy in different distributed settings. For each technique, we give a formal definition of data privacy appropriate to its setting and a thorough analysis showing that privacy is properly protected in that setting. We first consider the distributed setting in which a data miner or collector wants to collect data or learn models from a (potentially) large number of online respondents, and we design three privacy-preserving techniques for it. The first enables the data miner to learn certain classification models from respondents' data without ever seeing those data. The second, which assumes that respondents' data contain no identifiable information, lets the data miner collect the data efficiently without being able to link any data item to the respondent who submitted it. The third covers the case in which respondents' data do contain some identifiable information and enables the data miner to collect those data in a privacy-preserving manner.

After data have been collected by different parties, those parties may want to share their data in ways that benefit each other, e.g., by learning models from the combination of their data. To protect data privacy in such settings, we design our fourth privacy-preserving technique for a particular data mining task: learning a Bayesian network from a database vertically partitioned between two parties. In this setting, two parties who own confidential databases wish to learn a Bayesian network from the combination of their databases without revealing anything else about their data to each other. Using cryptographic techniques, we present efficient, privacy-preserving protocols for constructing a Bayesian network on the parties' joint data.
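To make the vertically partitioned setting concrete, the sketch below illustrates one generic building block and is not the protocol developed in the thesis. It assumes that the quantity of interest is a joint count over two attributes held by different parties, that this count is computed as the dot product of two 0/1 indicator vectors, and that a trusted dealer supplies correlated randomness. The names `dealer`, `shared_count`, the modulus `P`, and the toy columns are all hypothetical, and both parties are simulated in a single process.

```python
import secrets

P = 2**61 - 1  # modulus for additive shares; an illustrative choice


def indicator(column, value):
    """Return a 0/1 vector marking the records whose attribute equals value."""
    return [1 if v == value else 0 for v in column]


def dot(u, v):
    """Dot product modulo P."""
    return sum(a * b for a, b in zip(u, v)) % P


def dealer(n):
    """Hypothetical trusted dealer: hands out correlated randomness, sees no data."""
    ra = [secrets.randbelow(P) for _ in range(n)]
    rb = [secrets.randbelow(P) for _ in range(n)]
    sa = secrets.randbelow(P)
    sb = (dot(ra, rb) - sa) % P  # sa + sb = ra . rb (mod P)
    return (ra, sa), (rb, sb)


def shared_count(x, y):
    """Return additive shares (u, v) with u + v = x . y (mod P).

    x is held by party A and y by party B. Both parties are simulated
    here, so this shows the arithmetic only, not a networked protocol.
    """
    (ra, sa), (rb, sb) = dealer(len(x))
    x_masked = [(xi + r) % P for xi, r in zip(x, ra)]  # sent A -> B
    y_masked = [(yi + r) % P for yi, r in zip(y, rb)]  # sent B -> A
    v = secrets.randbelow(P)                           # B's output share
    t = (dot(x_masked, y) + sb - v) % P                # sent B -> A
    u = (t - dot(ra, y_masked) + sa) % P               # A's output share
    return u, v


# Toy vertically partitioned table: the same five records, different attributes.
smoker = ["yes", "no", "yes", "yes", "no"]    # held by party A (made-up data)
disease = ["yes", "no", "no", "yes", "yes"]   # held by party B (made-up data)

u, v = shared_count(indicator(smoker, "yes"), indicator(disease, "yes"))
joint_count = (u + v) % P  # reconstructed here only to check the arithmetic
```

Each share on its own is a uniformly random value modulo `P`, so neither party learns the count from its share alone. The shares are combined at the end of this sketch purely to confirm that they add up to the true joint count.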

Once data have been collected by different parties, they are generally stored and maintained in databases. Encryption is a powerful tool for protecting such data, but performing queries on encrypted data is more challenging. To address this problem, we study efficient methods for querying encrypted data. Specifically, we show that even if an intruder breaks into the database and observes some interactions between the database and its users, the intruder learns very little about the data stored in the database or the queries performed on them. In addition to formally proving security guarantees, we provide empirical data for performance evaluation.

Overall, we provide five techniques that use cryptographic tools to protect data privacy in distributed settings. Our techniques show that data privacy can be properly protected in such settings, and our experimental results further demonstrate that the techniques are very efficient and can be deployed at large scale.
