Citation | Ph.D. Dissertation, Stanford University, 2007.
Author | Dilys Thomas
The explosive growth in networking, storage, and processor technologies
has resulted in an unprecedented volume of digital data. With this increase
in digital data, concerns about privacy of personal information have
emerged. The ease with which data can be collected, stored in databases
and queried efficiently over the internet has worsened the privacy situation,
and has raised numerous ethical and legal concerns. Privacy enforcement
today is handled primarily through legislation. We aim to provide
technological solutions to achieve a tradeoff between data privacy and
data utility. In this thesis, we focus on three problems in the area of
data privacy.
The first problem is that of data sanitization before publication. Publishing
health and financial information for research purposes requires that the data
be anonymized so that the privacy of individuals in the database is
protected. This anonymized information can be (1) used as is or (2) combined
with another (anonymized) dataset that shares columns or rows
with the original anonymized dataset. We explore both of these sub-problems
in this thesis. Another reason for sanitization is to give the data to an
outsourced software developer without the developer learning information
about its client. We briefly describe a tool for this purpose in this thesis.
The second part of the thesis studies auditing query logs for privacy.
Given certain forbidden views of a database that must be kept confidential,
a batch of SQL queries that were posed over this database, and a definition
of suspiciousness, we study the problem of determining whether the batch
of queries is suspicious with respect to the forbidden views.
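
As an illustration only, the shape of this auditing problem can be sketched in
a few lines of Python. The sketch rests on assumptions that are not taken from
the thesis: each query and each forbidden view is reduced to the set of column
names it touches (no SQL is parsed), and a batch is called suspicious when the
columns it reads jointly cover every column of some forbidden view. The
function name `is_suspicious` and the column names are hypothetical, and the
thesis may define suspiciousness quite differently.

```python
from typing import Iterable, Set


def is_suspicious(batch: Iterable[Set[str]],
                  forbidden_views: Iterable[Set[str]]) -> bool:
    """Toy audit check (an assumed definition, not the thesis's).

    Each query in `batch` and each view in `forbidden_views` is modeled
    as the set of column names it reads. The batch is flagged when the
    union of its columns contains all columns of at least one view.
    """
    accessed: Set[str] = set()
    for query_columns in batch:
        accessed |= set(query_columns)
    return any(set(view) <= accessed for view in forbidden_views)


# Hypothetical example: the pairing of name and diagnosis is confidential.
forbidden = [{"patient.name", "patient.diagnosis"}]

# Two queries that together read both confidential columns.
batch_a = [{"patient.name", "patient.zip"},
           {"patient.zip", "patient.diagnosis"}]

# A single query that reads only one of them.
batch_b = [{"patient.name", "patient.zip"}]

print(is_suspicious(batch_a, forbidden))  # True
print(is_suspicious(batch_b, forbidden))  # False
```

The point of the sketch is that suspiciousness is a property of the whole
batch: neither query in `batch_a` covers the forbidden view on its own, but
the two together do.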
The third part of the thesis deals with distributed architectures for
data privacy. The advent of databases as an outsourced service has raised
privacy concerns on the part of clients storing data with third-party
database service providers. Previous approaches to enabling such
a service have been based on data encryption, which causes a large overhead
in query processing. In this thesis we provide a distributed architecture
for secure database services. We develop algorithms for distributing
data and executing queries over this distributed data.