Genome Privacy

Patient genomes are typically interpretable only in the context of other genomes. However, genome sharing opens individuals up to possible discrimination and identification. Some of my research has focused on developing cryptographic methods to protect the privacy of a patient's genome while still enabling useful computations across multiple genomes.

Deriving Genomic Diagnoses Without Revealing Patient Genomes

Karthik A. Jagadeesh, David J. Wu, Johannes A. Birgmeier, Dan Boneh, and Gill Bejerano

Abstract:

Patient genomes are interpretable only in the context of other genomes; however, genome sharing enables discrimination. Thousands of monogenic diseases have yielded definitive genomic diagnoses and potential gene therapy targets. Here we show how to provide such diagnoses while preserving participant privacy through the use of secure multiparty computation. In multiple real scenarios (small patient cohorts, trio analysis, two-hospital collaboration), we used our methods to identify the causal variant and discover previously unrecognized disease genes and variants while keeping up to 99.7% of all participants’ most sensitive genomic information private.
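To give a flavor of how a computation can run across several patients' data without any party seeing an individual genome, here is a minimal sketch based on additive secret sharing, a standard building block of secure multiparty computation. It is a toy illustration, not the protocol from the paper: the modulus, the function names, the three-party setup, and the 0/1 "rare variant per gene" encoding are all assumptions made for this example. It also reveals every per-gene total, which is more than a carefully designed diagnostic protocol would need to reveal.

```python
import secrets

# Hypothetical modulus for this sketch; it only needs to exceed the cohort size.
MODULUS = 2**61 - 1

def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_gene_counts(patient_vectors, n_parties=3):
    """Count, per gene, how many patients carry a rare variant.

    Each patient secret-shares a 0/1 vector. Each computing party only
    receives random-looking shares, and only the per-gene totals are
    reconstructed at the end.
    """
    n_genes = len(patient_vectors[0])
    # party_totals[p][g] accumulates the shares that party p holds for gene g.
    party_totals = [[0] * n_genes for _ in range(n_parties)]
    for vector in patient_vectors:
        for g, bit in enumerate(vector):
            for p, s in enumerate(share(bit, n_parties)):
                party_totals[p][g] = (party_totals[p][g] + s) % MODULUS
    # Reconstruction: parties publish only their aggregated shares.
    return [
        sum(party_totals[p][g] for p in range(n_parties)) % MODULUS
        for g in range(n_genes)
    ]

# Three hypothetical patients, four hypothetical genes.
patients = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
counts = secure_gene_counts(patients)
print(counts)  # [3, 1, 1, 0]
```

In this sketch the first gene is hit in all three patients, which is the kind of signal a small-cohort analysis looks for, yet no single party ever holds a patient's unshared vector.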

BibTeX:
@article{JWBBB17,
  author     = {Karthik A. Jagadeesh and David J. Wu and Johannes A. Birgmeier and Dan Boneh and Gill Bejerano},
  title      = {Deriving Genomic Diagnoses Without Revealing Patient Genomes},
  journal    = {Science},
  volume     = {357},
  number     = {6352},
  pages      = {692--695},
  year       = {2017}
}

Secure Genome-Wide Association Analysis using Multiparty Computation

Hyunghoon Cho, David J. Wu, and Bonnie Berger

Abstract:

Most sequenced genomes are currently stored in strict access-controlled repositories. Free access to these data could improve the power of genome-wide association studies (GWAS) to identify disease-causing genetic variants and aid the discovery of new drug targets. However, concerns over genetic data privacy may deter individuals from contributing their genomes to scientific studies and could prevent researchers from sharing data with the scientific community. Although cryptographic techniques for secure data analysis exist, none scales to computationally intensive analyses, such as GWAS. Here we describe a protocol for large-scale genome-wide analysis that facilitates quality control and population stratification correction in 9K, 13K, and 23K individuals while maintaining the confidentiality of underlying genotypes and phenotypes. We show the protocol could feasibly scale to a million individuals. This approach may help to make currently restricted data available to the scientific community and could potentially enable secure genome crowdsourcing, allowing individuals to contribute their genomes to a study without compromising their privacy.

BibTeX:
@article{CWB18,
  author     = {Hyunghoon Cho and David J. Wu and Bonnie Berger},
  title      = {Secure Genome-Wide Association Analysis using Multiparty Computation},
  journal    = {Nature Biotechnology},
  volume     = {36},
  number     = {6},
  pages      = {547--551},
  year       = {2018}
}

Avoiding Genetic Racial Profiling in Criminal DNA Profile Databases

Jacob A. Blindenbach, Karthik A. Jagadeesh, Gill Bejerano, and David J. Wu

Abstract:

DNA profiling has become an essential tool for crime solving and prevention, and CODIS (Combined DNA Index System) criminal investigation databases have flourished at the national, state and even local level. However, reports suggest that the DNA profiles of all suspects searched in these databases are often retained, which could result in racial profiling. Here, we devise an approach to both enable broad DNA profile searches and preserve exonerated citizens’ privacy through a real-time privacy-preserving procedure to query CODIS databases. Using our approach, an agent can privately and efficiently query a suspect’s DNA profile in the field, learning only whether the profile matches against any database profile. More importantly, the central database learns nothing about the queried profile, and thus cannot retain it. Our approach paves the way to implement privacy-preserving DNA profile searching in CODIS databases and any CODIS-like system.
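The query pattern described above (the agent learns only match or no match, and the database learns nothing about the queried profile) can be illustrated with a toy private membership test built on Diffie-Hellman-style blinded exponentiation. This is a swapped-in textbook technique for exact matches only, not the procedure from the paper. The group parameter, the `Database` class, the function names, and the string encoding of profiles are all hypothetical, and the parameters are far too weak for real use.

```python
import hashlib
import math
import secrets

# Toy group: integers modulo the Mersenne prime 2^127 - 1.
# This is NOT a secure parameter choice; it only keeps the sketch short.
P = 2**127 - 1

def hash_to_group(profile):
    """Map a profile string to a nonzero element modulo P."""
    digest = hashlib.sha256(profile.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def random_exponent():
    """Pick a secret exponent coprime to P - 1, so blinding is a bijection."""
    while True:
        e = secrets.randbelow(P - 1)
        if e > 1 and math.gcd(e, P - 1) == 1:
            return e

class Database:
    """Holds profiles and only ever publishes them blinded under its own key."""

    def __init__(self, profiles):
        self._key = random_exponent()
        self._blinded = {pow(hash_to_group(p), self._key, P) for p in profiles}

    def answer(self, blinded_query):
        # The database sees only the blinded query, never the profile itself.
        return pow(blinded_query, self._key, P), self._blinded

def private_match(profile, database):
    """Agent side: returns True if the profile is in the database."""
    a = random_exponent()
    blinded_query = pow(hash_to_group(profile), a, P)
    double_blinded, db_blinded = database.answer(blinded_query)
    # Raising the database's values to `a` makes both sides H(.)^(a * key).
    return double_blinded in {pow(v, a, P) for v in db_blinded}

# Hypothetical profile encoding: locus name followed by two allele counts.
db = Database(["D8S1179:13,14|D21S11:29,30", "D8S1179:10,12|D21S11:28,31"])
print(private_match("D8S1179:13,14|D21S11:29,30", db))  # True
print(private_match("D8S1179:11,11|D21S11:30,30", db))  # False
```

Because the agent picks a fresh exponent for every query, the database has nothing stable to store and link across searches. A realistic design would also need secure group parameters, support for partial matches, and a way to avoid sending the whole blinded database to the agent.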

BibTeX:
@article{BJBW21,
  author  = {Jacob A. Blindenbach and Karthik A. Jagadeesh and Gill Bejerano and David J. Wu},
  title   = {Avoiding Genetic Racial Profiling in Criminal {DNA} Profile Databases},
  journal = {Nature Computational Science},
  volume  = {1},
  number  = {4},
  pages   = {272--279},
  year    = {2021}
}

Keeping Patient Phenotypes and Genotypes Private while Seeking Disease Diagnoses

Karthik A. Jagadeesh, David J. Wu, Johannes A. Birgmeier, Dan Boneh, and Gill Bejerano

Abstract:

In an age where commercial entities are allowed to collect and directly profit from large amounts of private information, an age where large data breaches of such organizations are discovered every month, science must strive to offer society viable ways to preserve privacy while benefitting from the power of data sharing. Patient phenotypes and genotypes are critical for building groups of phenotypically similar patients, identifying the gene that best explains their common phenotypes, and ultimately, diagnosing a patient with a Mendelian disease. Direct computation over these quantities requires highly sensitive patient data to be shared openly, compromising patient privacy and opening patients up to discrimination. Existing protocols focus on secure computation over genotype data and only address the final steps of the disease-diagnosis pipeline, where phenotypically similar patients have already been identified. However, identifying such patients in a secure and private manner remains an open problem. In this work, we develop secure protocols to maintain patient privacy while computing meaningful operations over both genotypic and phenotypic data for two real scenarios: COHORT DISCOVERY and GENE PRIORITIZATION. Our protocols newly enable a complete and secure end-to-end disease diagnosis pipeline that protects sensitive patient phenotypic and genotypic data.

BibTeX:
@article{JWBBB19,
  author  = {Karthik A. Jagadeesh and David J. Wu and Johannes A. Birgmeier and Dan Boneh and Gill Bejerano},
  title   = {Keeping Patient Phenotypes and Genotypes Private
             while Seeking Disease Diagnoses},
  note    = {Full version available at
             \url{https://biorxiv.org/content/biorxiv/early/2019/08/24/746230}},
  journal = {bioRxiv},
  year    = {2019}
}