Adversarial Examples and Adversarial Training
Many machine learning models are vulnerable to "adversarial examples"---inputs that are intentionally designed to cause the model to produce the wrong output. The perturbations are often so subtle that a human observer cannot tell that anything has been altered. Because adversarial examples that fool one machine learning model often fool another, an attacker can construct them without access to the target model. Explicitly training models to resist adversarial attacks is a partially effective defense strategy, and can also improve the model's performance on naturally occurring data.
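To make the idea concrete, here is a minimal sketch of one common way to construct an adversarial example: take the gradient of the loss with respect to the input and step in the direction of its sign (the fast gradient sign method). The toy logistic-regression model, its weights, and the epsilon value below are all illustrative assumptions, not taken from any particular system.

```python
import numpy as np

# Toy "model": logistic regression with fixed, randomly chosen weights.
rng = np.random.default_rng(0)
w = rng.normal(size=100)   # hypothetical model weights
b = 0.0
x = rng.normal(size=100)   # a clean input
y = 1.0                    # assumed true label for x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x):
    # Cross-entropy loss for the true label y = 1.
    p = sigmoid(w @ x + b)
    return -np.log(p)

# For logistic regression, the gradient of the loss with respect to
# the input x is (p - y) * w.
p = sigmoid(w @ x + b)
grad_x = (p - y) * w

# Fast gradient sign method: perturb each input dimension by +/- eps
# in the direction that increases the loss.
eps = 0.25
x_adv = x + eps * np.sign(grad_x)

print(loss(x), loss(x_adv))  # the adversarial input has higher loss
```

Adversarial training, mentioned above, simply generates such perturbed inputs during training and includes them (with the correct labels) in the loss, so the model learns to classify them correctly.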