Algorithms and Architectures for Data Privacy

Citation: Ph.D. Dissertation, Stanford University, 2007.
Author: Dilys Thomas


The explosive growth in networking, storage, and processor technologies has resulted in an unprecedented volume of digital data. With this increase in digital data, concerns about privacy of personal information have emerged. The ease with which data can be collected, stored in databases and queried efficiently over the internet has worsened the privacy situation, and has raised numerous ethical and legal concerns. Privacy enforcement today is handled primarily through legislation. We aim to provide technological solutions to achieve a tradeoff between data privacy and data utility. We focus on three problems in the area of data privacy in this thesis.

The first problem is that of data sanitization before publication. Publishing health and financial information for research purposes requires that the data be anonymized so that the privacy of individuals in the database is protected. This anonymized information can be (1) used as is or (2) combined with another (anonymized) dataset that shares columns or rows with the original anonymized dataset. We explore both of these sub-problems in this thesis. Another reason for sanitization is to give the data to an outsourced software developer without that developer learning information about its client. We briefly describe a tool for this purpose in this thesis.

The second part of the thesis studies auditing query logs for privacy. Given certain forbidden views of a database that must be kept confidential, a batch of SQL queries that were posed over this database, and a definition of suspiciousness, we study the problem of determining whether the batch of queries is suspicious with respect to the forbidden views.
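To make the inputs of the auditing problem concrete, the following is a minimal sketch in Python. It is not the thesis's algorithm: the notion of suspiciousness used here (a query is flagged when it projects at least one column of the forbidden view and selects at least one row that the view also selects) is a deliberately simple assumption chosen for illustration, and the names `is_suspicious`, `audit_batch`, and the sample table are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]
# Toy stand-in for a SQL query or view: projected columns plus a row filter.
Selection = Tuple[Set[str], Predicate]


def is_suspicious(query: Selection, view: Selection, table: List[Row]) -> bool:
    """Assumed toy definition: the query shares a column with the forbidden
    view AND selects at least one row that the view also selects."""
    q_cols, q_pred = query
    v_cols, v_pred = view
    if not (q_cols & v_cols):
        return False
    return any(q_pred(row) and v_pred(row) for row in table)


def audit_batch(queries: List[Selection], view: Selection,
                table: List[Row]) -> List[int]:
    """Return the positions of the queries in the batch that are flagged."""
    return [i for i, q in enumerate(queries) if is_suspicious(q, view, table)]


# Hypothetical data: the forbidden view is "Alice's diagnosis".
patients = [
    {"name": "Alice", "zip": "94305", "diagnosis": "flu"},
    {"name": "Bob", "zip": "10001", "diagnosis": "asthma"},
]
forbidden = ({"diagnosis"}, lambda r: r["name"] == "Alice")
batch = [
    ({"zip"}, lambda r: r["zip"] == "94305"),        # no forbidden column
    ({"diagnosis"}, lambda r: r["zip"] == "94305"),  # touches Alice's row
    ({"diagnosis"}, lambda r: r["zip"] == "10001"),  # touches only Bob's row
]
flagged = audit_batch(batch, forbidden, patients)
```

Under this assumed definition only the second query is flagged. Other definitions of suspiciousness, including ones that consider what several queries reveal in combination, would need a different check.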

The third part of the thesis deals with distributed architectures for data privacy. The advent of databases as an outsourced service has raised privacy concerns for clients that store data with third-party database service providers. Previous approaches to enabling such a service have been based on data encryption, which causes a large overhead in query processing. In this thesis we provide a distributed architecture for secure database services. We develop algorithms for distributing data and executing queries over this distributed data.
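One way to picture distributing data for privacy is sketched below in Python. The abstract does not specify the architecture, so everything here is an assumption made for illustration: two non-colluding providers, a vertical (column-wise) split of the table, privacy constraints expressed as sets of columns that must never be stored together at a single provider, and a client that rejoins the fragments on a shared tuple identifier. The names `violates`, `split_table`, and `client_join` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Row = Dict[str, object]
Fragment = Dict[int, Row]  # tuple id -> partial row held by one provider


def violates(columns: Set[str], constraints: List[FrozenSet[str]]) -> bool:
    """True if one provider would hold every column of some constraint."""
    return any(constraint <= columns for constraint in constraints)


def split_table(table: List[Row], cols_a: Set[str], cols_b: Set[str],
                constraints: List[FrozenSet[str]]) -> Tuple[Fragment, Fragment]:
    """Vertically split the table across two providers, keyed by tuple id."""
    if violates(cols_a, constraints) or violates(cols_b, constraints):
        raise ValueError("a provider would see a forbidden column combination")
    part_a = {tid: {c: row[c] for c in cols_a} for tid, row in enumerate(table)}
    part_b = {tid: {c: row[c] for c in cols_b} for tid, row in enumerate(table)}
    return part_a, part_b


def client_join(part_a: Fragment, part_b: Fragment) -> List[Row]:
    """Client-side reconstruction: join the two fragments on tuple id."""
    return [{**part_a[tid], **part_b[tid]} for tid in sorted(part_a)]


# Hypothetical data and constraint: name and diagnosis must not be co-located.
records = [
    {"name": "Alice", "zip": "94305", "diagnosis": "flu"},
    {"name": "Bob", "zip": "10001", "diagnosis": "asthma"},
]
rules = [frozenset({"name", "diagnosis"})]
frag_a, frag_b = split_table(records, {"name", "zip"}, {"diagnosis"}, rules)
rebuilt = client_join(frag_a, frag_b)
```

In this sketch neither provider can link a name to a diagnosis on its own, while the client can still answer queries that need both by joining the fragments locally. No encryption is involved, which is the overhead such a design would aim to avoid.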
