Privacy-Preserving Systems

Functionality and user privacy are often in tension with each other, especially when it comes to modern data-driven and cloud-based applications. Much of my research is on leveraging cryptographic tools and techniques to provide a balance between the need for privacy and the need for functionality. Notable examples include designing efficient protocols for private information retrieval (PIR) and systems for privacy-preserving machine learning.

YPIR: High-Throughput Single-Server PIR with Silent Preprocessing

Samir Jordan Menon and David J. Wu

Abstract:

We introduce YPIR, a single-server private information retrieval (PIR) protocol that achieves high throughput (up to 75% of the memory bandwidth of the machine) without any offline communication. For retrieving a 1-bit (or 1-byte) record from a 32-GB database, YPIR achieves 10.9 GB/s/core server throughput and requires 2.5 MB of total communication. On the same setup, the state-of-the-art SimplePIR protocol achieves a 12.6 GB/s/core server throughput, requires 1.5 MB total communication, but additionally requires downloading a 724 MB hint in an offline phase. YPIR leverages a new lightweight technique to remove the hint from high-throughput single-server PIR schemes with small overhead. We also show how to reduce the server preprocessing time in the SimplePIR family of protocols by a factor of \( 10 \)–\( 15\times \).

By removing the need for offline communication, YPIR significantly reduces the server-side costs for private auditing of Certificate Transparency logs. Compared to the best previous PIR-based approach, YPIR reduces the server-side costs by a factor of \( 5.6\times \). Note that to reduce communication costs, the previous approach assumed that updates to the Certificate Transparency log servers occurred in weekly batches. Since there is no offline communication in YPIR, our approach allows clients to always audit the most recent Certificate Transparency logs (e.g., updating once a day). Supporting daily updates using the prior scheme would cost \( 30\times \) more than YPIR (based on current AWS compute costs).

Resources:

BibTeX:
@misc{MW24,
  author = {Samir Jordan Menon and David J. Wu},
  title  = {{YPIR}: High-Throughput Single-Server {PIR} with Silent Preprocessing},
  misc   = {Full version available at \url{https://eprint.iacr.org/2024/270}},
  year   = {2024}
}

Spiral: Fast, High-Rate Single-Server PIR via FHE Composition

Samir Jordan Menon and David J. Wu

Abstract:

We introduce the Spiral family of single-server private information retrieval (PIR) protocols. Spiral relies on a composition of two lattice-based homomorphic encryption schemes: the Regev encryption scheme and the Gentry-Sahai-Waters encryption scheme. We introduce new ciphertext translation techniques to convert between these two schemes and in doing so, enable new trade-offs in communication and computation. Across a broad range of database configurations, the basic version of Spiral simultaneously achieves at least a 4.5x reduction in query size, 1.5x reduction in response size, and 2x increase in server throughput compared to previous systems. A variant of our scheme, SpiralStreamPack, is optimized for the streaming setting and achieves a server throughput of 1.9 GB/s for databases with over a million records (compared to 200 MB/s for previous protocols) and a rate of 0.81 (compared to 0.24 for previous protocols). For streaming large records (e.g., a private video stream), we estimate the monetary cost of SpiralStreamPack to be only 1.9x greater than that of the no-privacy baseline where the client directly downloads the desired record.

Resources:

BibTeX:
@inproceedings{MW22,
  author     = {Samir Jordan Menon and David J. Wu},
  title      = {\textsc{Spiral}: Fast, High-Rate Single-Server {PIR} via {FHE} Composition},
  booktitle  = {{IEEE} {S\&P}},
  year       = {2022}
}

Authenticated Private Information Retrieval

Simone Colombo, Kirill Nikitin, Henry Corrigan-Gibbs, David J. Wu, and Bryan Ford

Abstract:

This paper introduces protocols for authenticated private information retrieval. These schemes enable a client to fetch a record from a remote database server such that (a) the server does not learn which record the client reads, and (b) the client either obtains the “authentic” record or detects server misbehavior and safely aborts. Both properties are crucial for many applications. Standard private-information-retrieval schemes either do not ensure this form of output authenticity, or they require multiple database replicas with an honest majority. In contrast, we offer multi-server schemes that protect security as long as at least one server is honest. Moreover, if the client can obtain a short digest of the database out of band, then our schemes require only a single server. Performing an authenticated private PGP-public-key lookup on an OpenPGP key server's database of 3.5 million keys (3 GiB), using two non-colluding servers, takes under 1.2 core-seconds of computation, essentially matching the time taken by unauthenticated private information retrieval. Our authenticated single-server schemes are 30-100\( \times \) more costly than state-of-the-art unauthenticated single-server schemes, though they achieve incomparably stronger integrity properties.

Resources:

BibTeX:
@inproceedings{CNCWF23,
  author    = {Simone Colombo and Kirill Nikitin and Henry Corrigan-Gibbs and David J. Wu and Bryan Ford},
  title     = {Authenticated Private Information Retrieval},
  booktitle = {{USENIX} Security Symposium},
  year      = {2023}
}

CryptGPU: Fast Privacy-Preserving Machine Learning on the GPU

Sijun Tan, Brian Knott, Yuan Tian, and David J. Wu

Abstract:

We introduce CryptGPU, a system for privacy-preserving machine learning that implements all operations on the GPU (graphics processing unit). Just as GPUs played a pivotal role in the success of modern deep learning, they are also essential for realizing scalable privacy-preserving deep learning. In this work, we start by introducing a new interface to losslessly embed cryptographic operations over secret-shared values (in a discrete domain) into floating-point operations that can be processed by highly-optimized CUDA kernels for linear algebra. We then identify a sequence of “GPU-friendly” cryptographic protocols to enable privacy-preserving evaluation of both linear and non-linear operations on the GPU. Our microbenchmarks indicate that our private GPU-based convolution protocol is over 150x faster than the analogous CPU-based protocol; for non-linear operations like the ReLU activation function, our GPU-based protocol is around 10x faster than its CPU analog.

With CryptGPU, we support private inference and private training on convolutional neural networks with over 60 million parameters as well as handle large datasets like ImageNet. Compared to the previous state-of-the-art, when considering large models and datasets, our protocols achieve a 2x to 8x improvement in private inference and a 6x to 36x improvement for private training. Our work not only showcases the viability of performing secure multiparty computation (MPC) entirely on the GPU to enable fast privacy-preserving machine learning, but also highlights the importance of designing new MPC primitives that can take full advantage of the GPU's computing capabilities.

Resources:

BibTeX:
@inproceedings{TKTW21,
  author     = {Sijun Tan and Brian Knott and Yuan Tian and David J. Wu},
  title      = {\textsc{CryptGPU}: Fast Privacy-Preserving Machine Learning on the {GPU}},
  booktitle  = {{IEEE} {S\&P}},
  year       = {2021}
}

Privacy, Discovery, and Authentication for the Internet of Things

David J. Wu, Ankur Taly, Asim Shankar, and Dan Boneh

Abstract:

Automatic service discovery is essential to realizing the full potential of the Internet of Things (IoT). While discovery protocols like Multicast DNS, Apple AirDrop, and Bluetooth Low Energy have gained widespread adoption across both IoT and mobile devices, most of these protocols do not offer any form of privacy control for the service, and often leak sensitive information such as service type, device hostname, device owner's identity, and more in the clear.

To address the need for better privacy in both the IoT and the mobile landscape, we develop two protocols for private service discovery and private mutual authentication. Our protocols provide private and authentic service advertisements, zero round-trip (0-RTT) mutual authentication, and are provably secure in the Canetti-Krawczyk key-exchange model. In contrast to alternatives, our protocols are lightweight and require minimal modification to existing key-exchange protocols. We integrate our protocols into an existing open-source distributed applications framework, and provide benchmarks on multiple hardware platforms: Intel Edisons, Raspberry Pis, smartphones, laptops, and desktops. Finally, we discuss some privacy limitations of the Apple AirDrop protocol (a peer-to-peer file sharing mechanism) and show how to improve the privacy of Apple AirDrop using our private mutual authentication protocol.

Resources:

BibTeX:
@inproceedings{WTSB16,
  author    = {David J. Wu and Ankur Taly and Asim Shankar and Dan Boneh},
  title     = {Privacy, Discovery, and Authentication for the Internet of Things},
  booktitle = {European Symposium on Research in Computer Security ({ESORICS})},
  year      = {2016}
}

Privacy-Preserving Shortest Path Computation

David J. Wu, Joe Zimmerman, Jérémy Planul, and John C. Mitchell

Abstract:

Navigation is one of the most popular cloud computing services. But in virtually all cloud-based navigation systems, the client must reveal her location and destination to the cloud service provider in order to learn the fastest route. In this work, we present a cryptographic protocol for navigation on city streets that provides privacy for both the client's location and the service provider's routing data. Our key ingredient is a novel method for compressing the next-hop routing matrices in networks such as city street maps. Applying our compression method to the map of Los Angeles, for example, we achieve over tenfold reduction in the representation size. In conjunction with other cryptographic techniques, this compressed representation results in an efficient protocol suitable for fully-private real-time navigation on city streets. We demonstrate the practicality of our protocol by benchmarking it on real street map data for major cities such as San Francisco and Washington, D.C.

Resources:

BibTeX:
@inproceedings{WZPM16,
  author    = {David J. Wu and Joe Zimmerman and J{\'{e}}r{\'{e}}my Planul and
               John C. Mitchell},
  title     = {Privacy-Preserving Shortest Path Computation},
  booktitle = {Network and Distributed System Security Symposium ({NDSS})},
  year      = {2016}
}

Privately Evaluating Decision Trees and Random Forests

David J. Wu, Tony Feng, Michael Naehrig, and Kristin Lauter

Abstract:

Decision trees and random forests are common classifiers with widespread use. In this paper, we develop two protocols for privately evaluating decision trees and random forests. We operate in the standard two-party setting where the server holds a model (either a tree or a forest), and the client holds an input (a feature vector). At the conclusion of the protocol, the client learns only the model's output on its input and a few generic parameters concerning the model; the server learns nothing. The first protocol we develop provides security against semi-honest adversaries. We then give an extension of the semi-honest protocol that is robust against malicious adversaries. We implement both protocols and show that both variants are able to process trees with several hundred decision nodes in just a few seconds and a modest amount of bandwidth. Compared to previous semi-honest protocols for private decision tree evaluation, we demonstrate a tenfold improvement in computation and bandwidth.

Resources:

BibTeX:
@article{WFNL15,
  author  = {David J. Wu and Tony Feng and Michael Naehrig and Kristin Lauter},
  title   = {Privately Evaluating Decision Trees and Random Forests},
  journal = {Proceedings on Privacy Enhancing Technologies ({PoPETs})},
  volume  = {2016},
  number  = {4},
  pages   = {335--355},
  year    = {2016}
}

Private Database Queries using Somewhat Homomorphic Encryption

Dan Boneh, Craig Gentry, Shai Halevi, Frank Wang, and David J. Wu

Abstract:

In a private database query system, a client issues queries to a database and obtains the results without learning anything else about the database and without the server learning the query. While previous work has yielded systems that can efficiently support disjunction queries, performing conjunction queries privately remains an open problem. In this work, we show that using a polynomial encoding of the database enables efficient implementations of conjunction queries using somewhat homomorphic encryption. We describe a three-party protocol that supports efficient evaluation of conjunction queries. Then, we present two implementations of our protocol using Paillier's additively homomorphic system as well as Brakerski's somewhat homomorphic cryptosystem. Finally, we show that the additional homomorphic properties of the Brakerski cryptosystem allow us to handle queries involving several thousand elements over a million-record database in just a few minutes, far outperforming the implementation using the additively homomorphic system.

Resources:

BibTeX:
@inproceedings{BGHWW13,
  author    = {Dan Boneh and Craig Gentry and Shai Halevi and Frank Wang and
               David J. Wu},
  title     = {Private Database Queries Using Somewhat Homomorphic Encryption},
  booktitle = {Applied Cryptography and Network Security ({ACNS})},
  year      = {2013}
}