Full text | Click to download. |
Citation | PhD Thesis, Stanford University, 2005.
|
Author | Gagan Aggarwal
|
The last couple of decades have witnessed a phenomenal growth in the
networking infrastructure connecting computers all over the world.
The Internet has now become an ubiquitous channel for information
sharing and dissemination. This has created a whole new set of
research challenges, while giving a new spin on some existing ones.
In this thesis, we address problems from two areas: (a) protection of
data privacy, and (b) sale of advertisement space on Internet web
sites.
The first part of the thesis deals with privacy issues involved in the
exchange of information between non-trusting entities. The scenarios
of interest include the Census Bureau publishing population
statistics, hospitals sharing patient data with medical researchers,
federal agencies sharing intelligence information with each other, a
group of people wishing to find their common interests, and several
universities trying to compute combined statistics about faculty
salaries, among others. In most cases, exchange of a relevant
synopisis or aggregate would be sufficient; however, in the absence of
knowledge about what synopsis would be most relevant, the data tends
to be disseminated in the raw. This threatens personal privacy and
creates an opportunity for dishonest entities to misuse the
information to further their own selfish agenda. We focus our
attention on two abstract problems derived from this setting. We
first consider the problem of anonymizing databases before
dissemination, so as to safeguard the privacy of the individuals
described by the databases. A solution to this problem can be used by
the Census Bureau as well as hospitals to provide data containing
personal information to interested parties without sacrificing
privacy. We adopt the privacy framework of k-anonymity proposed by
Samarati and Sweeney, and present approximation algorithms for
anonymizing databases. The second problem we study is that of
computing a function over data split between two or more non-trusting
entities. The goal is to enable the entities to compute the value of
the function without either entity having to reveal any unnecessary
information to the other entity. In particular, we study the problem
of computing statistics over data shared between multiple entities.
We present efficient protocols for computing a fundamental statistical
function, namely the k-th ranked element of a dataset split between
multiple entities. These protocols for a practical solution to the
problem of compiling faculty salary statistics covering several
universities.
The pervasiveness of the Internet has fueled a rapid growth in
advertising on Internet websites. In the second part of the thesis,
we consider the problem of selling advertisement space on the
Internet. Given the dynamic nature of the online advertising market -
advertisers join and leave, the popularity of the web site changes
over time, the value of an Internet user clicking on an advertisement
link varies over time - auctions are the logical choice for selling
advertisement space on web sites, and indeed, major search engines
like Google and Yahoo! are using auctions to sell advertisement slots
on their search result pages. However, a lack of understanding of
good bidding strategies has kept marketers from fully embracing online
advertising channels. One solution is to use selling mechanisms where
the best strategy for advertisers is simple and well-understood. The
class of truthful mechanisms has the property that the best strategy
for any advertister is to bid an amount equal to her true valuation of
the object she is bidding for. Since the use of truthful auction
mechanisms considerably simplifies the task of bidding, we propose
using truthful auctions for selling web advertisements. We study two
different problem formulations in this setting. We first consider the
problem of selling a single slot on a web page that gets a known
number of hits per day, assuming that the number of visitors desired
by each advertiser is known in advance. We present a truthful auction
that is competitive with respect to an optimal omniscient pricing
scheme that obeys a natural monotonicity property. The second problem
we study is that of selling multiple advertisement slots on a web
page. This problem is more closely aligned with the problem faced by
search engines. We present an auction that is truthful when the
advertisers are non budget-constrained. Moreover, under some
reasonable assumptions, we show that its revenue is equivalent to the
non-truthful auctions currently being used by search engines.