Protection and the Control of Information Leakage in the Sage Machine Learning Platform

Roxana Geambasu


Machine learning (ML) models trained over sensitive user data can leak substantial information about their training sets. As companies disseminate such models to untrusted domains -- such as end-user devices and wide-access model stores -- there is a growing need to control how much of the data leaks through these models. I present Sage, an ML platform that enforces a global differential privacy (DP) guarantee across all models produced from a sensitive data stream. Sage extends Google's TensorFlow Extended (TFX) ML platform with novel mechanisms and DP theory to address operational challenges that arise from incorporating DP into ML training. First, to avoid the typical problem of DP systems "running out of privacy budget" after a pre-established number of queries, we developed block composition, a new DP composition theory that leverages the time-bounded structure of training processes so that models can be trained indefinitely on new data from a sensitive stream while event-level DP is enforced on the stream. Second, to control the quality of ML models produced by Sage, we developed privacy-adaptive training, a process that trains a model on increasing amounts of data from a stream until, with high probability, the model meets developer-configured quality criteria. With these methods and theory, Sage lets platform administrators control both the leakage of user data through disseminated models and the quality of these models.
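To make the idea of per-block privacy accounting concrete, the following is a minimal sketch and not Sage's actual implementation. It assumes the stream is split into time-based blocks, that each block carries its own copy of a global epsilon budget, and that a training run charges its epsilon to every block it reads (simple additive composition is an assumption made for brevity). The class name `BlockAccountant` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BlockAccountant:
    """Hypothetical per-block privacy ledger (illustrative only)."""
    epsilon_global: float                      # budget available to each block
    spent: dict = field(default_factory=dict)  # block id -> epsilon consumed

    def add_block(self, block_id):
        # A newly arrived block starts with nothing spent.
        self.spent.setdefault(block_id, 0.0)

    def remaining(self, block_id):
        return self.epsilon_global - self.spent[block_id]

    def try_consume(self, block_ids, epsilon):
        # Refuse the whole request if any requested block lacks budget.
        if any(self.remaining(b) < epsilon for b in block_ids):
            return False
        # Assumption: simple additive composition within each block.
        for b in block_ids:
            self.spent[b] += epsilon
        return True


acct = BlockAccountant(epsilon_global=1.0)
acct.add_block(0)
acct.add_block(1)
first = acct.try_consume([0, 1], 0.6)   # fits within both blocks' budgets
second = acct.try_consume([0, 1], 0.6)  # would exceed the budget, so refused
acct.add_block(2)                       # new data arrives with a fresh budget
third = acct.try_consume([2], 0.6)      # training continues on the new block
```

In this sketch, old blocks eventually stop being usable while newly arriving blocks bring fresh budget, which is the intuition behind continuing to train on a stream without exhausting a single global budget.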


Roxana Geambasu is an Associate Professor of Computer Science at Columbia University and a member of Columbia's Data Sciences Institute. She joined Columbia in Fall 2011 after finishing her Ph.D. at the University of Washington. For her work in cloud and mobile data privacy, she has received an Alfred P. Sloan Faculty Fellowship, an NSF CAREER award, a Microsoft Research Faculty Fellowship, several Google Faculty awards, a "Brilliant 10" Popular Science nomination, the Honorable Mention for the 2013 inaugural Dennis M. Ritchie Doctoral Dissertation Award, a William Chan Dissertation Award, two best paper awards at top systems conferences, and the first Google Ph.D. Fellowship in Cloud Computing.

Time and Place

Thursday, June 27, 4:15pm
Gates 358