Machine Learning Attacks Against the Asirra CAPTCHA

Authors: Philippe Golle

Abstract:
The Asirra CAPTCHA [EDHS2007], proposed at ACM CCS 2007, relies on the problem of distinguishing images of cats and dogs (a task that humans are very good at). The security of Asirra is based on the presumed difficulty of classifying these images automatically.

In this paper, we describe a classifier which is 82.7% accurate in telling apart the images of cats and dogs used in Asirra. This classifier is a combination of support-vector machine classifiers trained on color and texture features extracted from images. Our classifier allows us to solve a 12-image Asirra challenge automatically with probability 10.3%. This probability of success is significantly higher than the estimate of 0.2% given in [EDHS2007] for machine vision attacks. Our results suggest caution against deploying Asirra without safeguards.

We also investigate the impact of our attacks on the partial credit and token bucket algorithms proposed in [EDHS2007]. The partial credit algorithm weakens Asirra considerably and we recommend against its use. The token bucket algorithm helps mitigate the impact of our attacks and allows Asirra to be deployed in a way that maintains an appealing balance between usability and security. One contribution of our work is to inform the choice of safeguard parameters in Asirra deployments.