Classify but Verify: Adversarial Stylometry and Machine Learning in an Open World
In this talk, I will discuss my lab's work in the emerging field of adversarial stylometry and machine learning. Machine learning algorithms are increasingly being used in security and privacy domains, in areas that go beyond intrusion or spam detection. For example, in digital forensics, questions often arise about the authors of documents: their identity, demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. While stylometric techniques can identify authors with high accuracy in non-adversarial scenarios, their accuracy drops to that of random guessing when faced with authors who intentionally obfuscate their writing style or imitate that of another author.
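As a minimal sketch of the kind of pipeline stylometry relies on, the example below attributes a text to the candidate author whose writing profile it most resembles. Everything here is a toy assumption for illustration: real systems use hundreds of lexical, syntactic, and character-level features, not the eight function-word frequencies and nearest-centroid rule used here.

```python
from collections import Counter
import math

# Hypothetical feature set: relative frequencies of a few English
# function words, a classic (if tiny) stylometric feature family.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a", "is", "that"]

def features(text):
    """Map a text to a vector of function-word relative frequencies."""
    words = text.lower().split()
    total = max(len(words), 1)
    counts = Counter(words)
    return [counts[w] / total for w in FUNCTION_WORDS]

def centroid(vectors):
    """Average several feature vectors into one author profile."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(FUNCTION_WORDS))]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(samples_by_author):
    """Build one profile (centroid) per candidate author."""
    return {author: centroid([features(t) for t in texts])
            for author, texts in samples_by_author.items()}

def attribute(model, text):
    """Closed-set attribution: return the nearest-profile author."""
    v = features(text)
    return min(model, key=lambda author: distance(model[author], v))
```

The closed-set assumption built into `attribute` (the true author is always among the candidates) is exactly what the adversarial and open-world settings discussed in the talk break.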
We have applied stylometry to difficult domains such as underground hacker forums, open source projects (source code), and tweets. We have developed a tool, Anonymouth, that helps users understand their vulnerability to stylometric analysis and change their writing style. I will also discuss the open-world problem, common to many security settings, in which the set of classes (suspects, in the stylometry case) is not fully known, and how we can adapt our methods to handle this case.
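One way to make a closed-set classifier open-world aware, in the spirit of the talk's "classify but verify" title, is to follow classification with a verification step that can answer "none of the above." The self-contained sketch below is a hypothetical illustration, not the method presented in the talk: it uses a toy function-word feature vector and a simple distance threshold as the verifier.

```python
from collections import Counter
import math

# Toy feature set for illustration; real stylometric systems use far richer features.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a", "is", "that"]

def features(text):
    """Function-word relative frequencies of a text."""
    words = text.lower().split()
    total = max(len(words), 1)
    counts = Counter(words)
    return [counts[w] / total for w in FUNCTION_WORDS]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_but_verify(profiles, text, threshold):
    """Classify against known suspect profiles, then verify the match.

    In the open world the true author may lie outside the suspect set,
    so an attribution whose distance exceeds the threshold is rejected
    and None ("none of the above") is returned instead.
    """
    v = features(text)
    best = min(profiles, key=lambda author: dist(profiles[author], v))
    return best if dist(profiles[best], v) <= threshold else None
```

The threshold trades off two errors: set it too low and genuine suspects are rejected; too high and texts by unknown authors are wrongly pinned on a suspect.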