Data Integrity Based Attacks in the New Era of Adversarial Data Science and Engineering

Eric Rozier


The Trustworthy Data Engineering Laboratory (TRUST Lab) has been working with the World Bank, the Federal Bureau of Investigation (FBI), the Environmental Protection Agency (EPA), and the City of Cincinnati on a problem common to many organizations involved in data-driven investigations: companies and other entities that try to disguise malicious activities by attacking the integrity of the available data. In this talk we will explore the challenge of assuring data integrity in heterogeneous data systems that must cope with the velocity, variety, and volume that accompany Big Data. We will examine real case studies of debarment and corruption in international procurement with the World Bank, violations of the Resource Conservation and Recovery Act with the EPA, and human rights abuses of low-income citizens by corporate slumlords in the city of Cincinnati. In each case we will show how malicious actors manipulated the data collection and analytics process to avoid regulatory oversight, sanctions, and investigations launched by national and multinational authorities. They did so through misinformation, abuse of regional corporate legal structures, collusion with state actors, or knowledge of the underlying predictive analytics algorithms. These tactics damaged the integrity either of the data used by machine learning and predictive analytics processes or of the outcomes derived from those processes. This new type of attack is becoming increasingly common, and we will motivate and encourage further research on countermeasures and safeguards in information systems.

We will present the TRUST Lab's work on building new systems for the hierarchical classification of entities in common applications of these systems. We will also show how semantic and syntactic information can help us detect both malicious and non-malicious violations of data integrity within a single approach. We will discuss strategies for combating these attacks and methods for hardening data collection and predictive analytics against future attacks without compromising the open and unrestricted sharing of scientific methods or government transparency. We will review our work with OpenCorporates to expand the collection and coverage of information on corporate entities, with the aim of building an open system for defending against common attacks on data integrity. The talk will include a demonstration of our new tool, CERCIS, for entity resolution and data integrity preservation, followed by a discussion of our future work and of the broader open questions in this new and exciting frontier of cyberoperations and security.


Dr. Eric Rozier is an Assistant Professor of Electrical Engineering and Computing Systems and head of the Trustworthy Data Engineering Laboratory at the University of Cincinnati in Cincinnati, Ohio. He is a longtime member of the IEEE and ACM and a member of the AIAA Intelligent Systems Technical Committee. He has been named a Frontiers of Engineering Education Faculty Member by the National Academy of Engineering, and he has been a two-time Eric and Wendy Schmidt Data Science for Social Good Faculty Fellow at the University of Chicago and an IBM Research Fellow. Dr. Rozier's research interests include secure and dependable computing, with a focus on critical infrastructure. Before joining the University of Cincinnati, Dr. Rozier was the founding director of the Fortinet Cybersecurity Laboratory at the University of Miami, where he worked to develop and commercialize new homomorphic encryption technologies for cloud-based systems. He earned his Ph.D. at the University of Illinois at Urbana-Champaign, where he worked on fault-tolerance and security applications with the National Center for Supercomputing Applications and the Information Trust Institute.

Time and Place

Thursday, January 28, 4:15pm
Gates 498