Adversarial Examples in Machine Learning

Nicolas Papernot

Abstract:

Machine learning models, including deep neural networks, have been shown to be vulnerable to adversarial examples--subtly (and often imperceptibly) modified malicious inputs crafted to compromise the integrity of their outputs. Adversarial examples thus enable adversaries to manipulate system behavior. Potential attacks include attempts to control the behavior of vehicles, to have spam identified as legitimate content, or to have malware classified as legitimate software. In fact, the feasibility of misclassification attacks based on adversarial examples has been demonstrated for image, text, and malware classifiers.
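To make the notion of a subtle, integrity-compromising perturbation concrete, the sketch below shows one well-known crafting method, the fast gradient sign method (FGSM), applied to a hypothetical differentiable classifier. This is only an illustrative example under assumed names (a PyTorch module "model", inputs "x" in [0, 1], labels "y"), not the specific algorithms covered in the talk.

    import torch
    import torch.nn.functional as F

    def fgsm_example(model, x, y, epsilon=0.03):
        """Craft adversarial examples with the fast gradient sign method.

        model:   a differentiable classifier (a torch.nn.Module); assumed name
        x:       a batch of inputs with values in [0, 1]; assumed name
        y:       the true labels for x; assumed name
        epsilon: maximum per-feature perturbation (L-infinity budget)
        """
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # Step in the direction that increases the loss, then clip back
        # to the valid input range so the perturbation stays small.
        x_adv = x + epsilon * x.grad.sign()
        return x_adv.clamp(0.0, 1.0).detach()

The resulting inputs differ from the originals by at most epsilon in each feature, yet are often misclassified by the model.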

Furthermore, adversarial examples that affect one model often affect another model, even if the two models have different architectures (based on a neural network, support vector machine, or nearest neighbor, for instance) or were trained on different training sets, so long as both models were trained to perform the same task. An attacker may therefore train their own substitute model, craft adversarial examples against the substitute, and transfer them to a victim model while knowing very little about the victim: adversarial examples crafted using the substitute are also largely misclassified by the victim. The attacker need not even collect a training set to mount the attack, as a recently introduced technique shows how adversaries may use the victim model as an oracle to label a synthetic training set for the substitute. This effectively enables attackers to target remotely hosted victim classifiers with minimal knowledge of the victim.
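The sketch below outlines the oracle-labeling step of such a black-box attack under stated assumptions: "victim_query" is a hypothetical black-box function returning the victim's predicted labels, "substitute" is an attacker-chosen PyTorch model, and "synthetic_inputs" are attacker-generated inputs. The published substitute-training technique also augments the synthetic data iteratively, which this simplified sketch omits.

    import torch
    import torch.nn.functional as F

    def train_substitute(victim_query, substitute, synthetic_inputs,
                         epochs=10, lr=1e-3):
        """Train a local substitute by labeling synthetic inputs with the victim.

        victim_query:     hypothetical black-box function mapping inputs to labels
        substitute:       attacker-chosen torch.nn.Module (assumed)
        synthetic_inputs: attacker-generated inputs; no real training data needed
        """
        # Use the remote victim model purely as a labeling oracle.
        labels = victim_query(synthetic_inputs)
        optimizer = torch.optim.Adam(substitute.parameters(), lr=lr)
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = F.cross_entropy(substitute(synthetic_inputs), labels)
            loss.backward()
            optimizer.step()
        return substitute

    # Adversarial examples crafted against the trained substitute (for example
    # with the fgsm_example sketch above) are then submitted to the victim,
    # relying on transferability to cause misclassification.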

This talk covers algorithms for crafting adversarial examples under varying threat models and across application domains, as well as defenses proposed to mitigate such attacks.

Time and Place

Thursday, October 13, 4:15pm
Gates 463