Monday, April 14, 2008

Some thoughts on CAPTCHAs...

First, in case your cave doesn't get Internet, let me define CAPTCHA from Wikipedia...

A CAPTCHA (IPA: /ˈkæptʃə/) is a type of challenge-response test used in computing to determine that the user is not run by a computer. The process involves one computer (a server) asking a user to complete a simple test which the computer is able to generate and grade. Because other computers are unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human. A common type of CAPTCHA requires that the user type the letters of a distorted image, sometimes with the addition of an obscured sequence of letters or digits that appears on the screen.

The term "CAPTCHA" was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper (all of Carnegie Mellon University), and John Langford (then of IBM). It is a contrived acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart", trademarked by Carnegie Mellon University.

Now, let's think about this from a security perspective. I'll start by stating the obvious - no one in modern computing (anyone who understands computing power as of Jan 1, 2008) will advocate using a CAPTCHA as a method of authentication... ever. There have been a series of discussions, blog posts and articles about how CAPTCHA is dead, and shouldn't be used, and even some pay-for CAPTCHA hacking services!... see references here, here and here. With all that going against CAPTCHA, let's dissect some of the lunacy, and make some sane assessments of this anti-bot technology.

Let me state that I firmly believe that the delta between humans and artificial intelligence (the ability for a computer to "think") is quickly closing with the exponential increase in processing power, network bandwidth and ingenuity. That being said, I'm not entirely convinced that CAPTCHAs are being solved by super-smart software which amounts to tweaked OCR code. Once you've looked at sites like CAPTCHAsolver, and talked to some people who rely on broken CAPTCHAs for their daily bread, you'll come to the conclusion I did - it's just cheaper to pay someone else to solve the CAPTCHA for you, or reward them somehow, than it is to try and build super-leet OCR software. There are a number of schemes out there to break large-scale CAPTCHA implementations, mostly involving college kids, "work-from-home" schemes, or access to porn sites, and the reason for their existence is simple - money.

Consider some of the math that powers the spamming world - for every million emails you send out you get back something like $100 in revenue, which means one thing... you must send out millions and millions of emails. The problem with that is that email servers and accounts quickly get black-listed, so the spammers have turned to some of the free email account providers out there. To counteract this, these providers put up CAPTCHAs on their sign-up pages, the idea being that in order to sign up for a free webmail account you had to be a "human", or at least be able to type back the characters in the scrambled image displayed to you. As evolution would dictate, the providers built a better mouse-trap and the spammers built a better mouse... so now you have this perpetual war being waged between providers and spammers over CAPTCHAs while the security world ponders their purpose.
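To put rough numbers on the spam math above, here is a minimal back-of-the-envelope sketch in Python. The only figure taken from this post is the "something like $100 per million emails" estimate; the function names, the revenue target and the emails-per-account figure are hypothetical, made up purely for illustration.

```python
import math

# Rough figure from the post: about $100 of revenue per million emails sent.
REVENUE_PER_MILLION_EMAILS = 100.0  # dollars (an estimate, not a measured value)


def emails_needed(target_revenue_dollars: float) -> int:
    """Emails a spammer must send to reach a revenue target at the rate above."""
    millions = target_revenue_dollars / REVENUE_PER_MILLION_EMAILS
    return math.ceil(millions * 1_000_000)


def break_even_cost_per_account(emails_per_account: int) -> float:
    """Most a spammer could pay to get one webmail account past a CAPTCHA
    and still break even, if that account sends `emails_per_account`
    messages before it gets black-listed (a hypothetical parameter)."""
    return emails_per_account * REVENUE_PER_MILLION_EMAILS / 1_000_000


if __name__ == "__main__":
    # Hypothetical target: $1,000 of revenue.
    print(emails_needed(1_000))                 # 10000000
    # Hypothetical: each account sends 10,000 emails before being blocked.
    print(break_even_cost_per_account(10_000))  # 1.0
```

Under those made-up inputs, an account is worth about a dollar to the spammer, so paying a human a fraction of that to type in the characters is still profitable. That is the whole argument in two divisions.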

Let me make it simple - this is a race that the "good guys" will never win. After all, they have a finite amount of time and effort they can put into this, while the "bad guys" are money-driven, can spend all day/week/month breaking these, and will eventually succeed. So the point then is - don't use CAPTCHAs as anything more than a small test to weed out the obvious computer-driven spammers. Know that you will be defeated and cannot possibly win. And overall... understand that a CAPTCHA is *not* a security measure... it's just a (flawed) way to determine the difference between a human and a computer.

Am I out in left field here?

Oh! I almost forgot, if you really want to read more about this and the statistics of breaking CAPTCHAs, give this site a read; it's a wonderful resource! More at PWNtcha - captcha decoder website.


Jane said...

Thanks for the great post!

I'm sort of anal over accessibility issues, and knowing a fair number of blind people, have to deal with captchas too often. Just on the basis of inaccessibility alone they should not be used... unfortunately they're everywhere, and people don't think about what happens when someone can't read a captcha for whatever reason... or they believe that it can keep out spammers. Given how much has been in the media over these various issues I'm surprised they're still being used, but I suppose it's one of the easiest ways to keep out automated spam on sites like blogs and whatever - the ones not worth the effort (as opposed to, say, what you mention about free email services).

Ironically I see that a captcha is required to post a comment ;)

CheetahTech said...

What do you think about JavaScript CAPTCHAs? Since crawlers don't use JavaScript, it works. But more importantly, I guess it wouldn't be internet security.

Rafal said...

[author comment]
To address JavaScript CAPTCHAs... no, traditional crawlers don't generally support JS processing, but again, if we build a better mouse-trap, the bad guys will build a better mouse - so we're trapped in an endless one-up game which is no-win for us.

Jane said...

@cheetahtech, what happens to people not using javascript? :p