Privacy Protection and Advertising in a Networked World.

Full textClick to download.
CitationPhD Thesis, Stanford University, 2005.
AuthorGagan Aggarwal


The last couple of decades have witnessed a phenomenal growth in the networking infrastructure connecting computers all over the world. The Internet has now become an ubiquitous channel for information sharing and dissemination. This has created a whole new set of research challenges, while giving a new spin on some existing ones. In this thesis, we address problems from two areas: (a) protection of data privacy, and (b) sale of advertisement space on Internet web sites.

The first part of the thesis deals with privacy issues involved in the exchange of information between non-trusting entities. The scenarios of interest include the Census Bureau publishing population statistics, hospitals sharing patient data with medical researchers, federal agencies sharing intelligence information with each other, a group of people wishing to find their common interests, and several universities trying to compute combined statistics about faculty salaries, among others. In most cases, exchange of a relevant synopisis or aggregate would be sufficient; however, in the absence of knowledge about what synopsis would be most relevant, the data tends to be disseminated in the raw. This threatens personal privacy and creates an opportunity for dishonest entities to misuse the information to further their own selfish agenda. We focus our attention on two abstract problems derived from this setting. We first consider the problem of anonymizing databases before dissemination, so as to safeguard the privacy of the individuals described by the databases. A solution to this problem can be used by the Census Bureau as well as hospitals to provide data containing personal information to interested parties without sacrificing privacy. We adopt the privacy framework of k-anonymity proposed by Samarati and Sweeney, and present approximation algorithms for anonymizing databases. The second problem we study is that of computing a function over data split between two or more non-trusting entities. The goal is to enable the entities to compute the value of the function without either entity having to reveal any unnecessary information to the other entity. In particular, we study the problem of computing statistics over data shared between multiple entities. We present efficient protocols for computing a fundamental statistical function, namely the k-th ranked element of a dataset split between multiple entities. These protocols for a practical solution to the problem of compiling faculty salary statistics covering several universities.

The pervasiveness of the Internet has fueled a rapid growth in advertising on Internet websites. In the second part of the thesis, we consider the problem of selling advertisement space on the Internet. Given the dynamic nature of the online advertising market - advertisers join and leave, the popularity of the web site changes over time, the value of an Internet user clicking on an advertisement link varies over time - auctions are the logical choice for selling advertisement space on web sites, and indeed, major search engines like Google and Yahoo! are using auctions to sell advertisement slots on their search result pages. However, a lack of understanding of good bidding strategies has kept marketers from fully embracing online advertising channels. One solution is to use selling mechanisms where the best strategy for advertisers is simple and well-understood. The class of truthful mechanisms has the property that the best strategy for any advertister is to bid an amount equal to her true valuation of the object she is bidding for. Since the use of truthful auction mechanisms considerably simplifies the task of bidding, we propose using truthful auctions for selling web advertisements. We study two different problem formulations in this setting. We first consider the problem of selling a single slot on a web page that gets a known number of hits per day, assuming that the number of visitors desired by each advertiser is known in advance. We present a truthful auction that is competitive with respect to an optimal omniscient pricing scheme that obeys a natural monotonicity property. The second problem we study is that of selling multiple advertisement slots on a web page. This problem is more closely aligned with the problem faced by search engines. We present an auction that is truthful when the advertisers are non budget-constrained. Moreover, under some reasonable assumptions, we show that its revenue is equivalent to the non-truthful auctions currently being used by search engines.

Back to publications
Back to previous page