Citation | PhD thesis, Stevens Institute of Technology, 2007
Author | Zhiqiang Yang
With the rapid development of the Internet and computer technology,
more and more of our activities are carried out on the Internet.
Consequently, more and more data related to individuals are
collected and used by different parties who are generally
distributed over a wide variety of sites. Therefore, the protection
of data privacy in such distributed settings is drawing more
attention than ever.
In this thesis, we present five techniques for protecting data
privacy in different distributed settings by using cryptographic
tools. For each of our techniques, we give a formal and appropriate
definition of data privacy. We also give a thorough analysis to show
that data privacy is properly protected under certain settings.
We first consider the distributed setting where a data miner or
collector wants to collect data or learn models from a (potentially)
large number of online respondents, and we design three
privacy-preserving techniques in this setting. Our first technique
enables the data miner to learn certain classification models from
respondents' data without ever seeing those data. Assuming there is
no identifiable information in respondents' data, our second
technique provides an efficient method for data collection such
that the data miner can collect respondents' data without being
able to link the data to individual respondents. For the case where
respondents' data do contain identifiable information, we design
our third technique to enable the data miner to collect respondents'
data in a privacy-preserving manner.
After data are collected by different parties, those parties may
want to share their data in certain ways to benefit each other,
e.g., by learning models from the combination of their data. To
protect data privacy in such settings, we then design our fourth
privacy-preserving technique for a particular data mining task:
learning a Bayesian network from a database vertically partitioned
among two parties. In this setting, two parties owning confidential
databases wish to learn a Bayesian network from the combination of
their databases without revealing anything else about their data to
each other. By using cryptographic techniques, we present efficient
and privacy-preserving protocols to construct a Bayesian network on
the parties' joint data.
Once data are collected by different parties, they are generally
stored and maintained in databases. Encryption is a
powerful tool to protect data. However, when data are encrypted,
performing queries becomes more challenging. To solve this problem,
we study efficient methods for queries on encrypted data.
Specifically, we show that even if an intruder breaks into the
database and observes some interactions between the database and its
users, he learns very little about the data stored in the database
and the queries performed on the data. In addition to proving
security guarantees formally, we provide empirical data for
performance evaluations.
Overall, we provide five techniques that use cryptographic tools to
protect data privacy in distributed settings. Our analysis shows
that data privacy can be properly protected in these settings, and
our experimental results further demonstrate that the techniques
are efficient and can be deployed at large scale.