Google's Audio CAPTCHA Falls To Automated Attack

SkiifGeek writes "Early in March, Wintercore Labs published proof of a generic approach to defeating audio CAPTCHAs, using Google's as the case study for their demonstration. With claims of over 90% success rate and expectations that this can be significantly improved with the right mix of filtering algorithms, the in-house tool remains unreleased. But it shouldn't take long for other developers to create their own tools and start targeting not only Google, but other sites that use audio CAPTCHAs for the vision-impaired. It isn't the first time that major sites (significantly major webmail providers) have had their CAPTCHAs broken, but it is the first reporting of defeating an audio CAPTCHA using a generic software approach. News about the discovery is slowly starting to spread."

  • How long before they start saying the word over a background of static, jungle noises and beeping so that even the best trained of ears require three or four listens to decipher it?
    • Re: (Score:2, Interesting)

      by carlvlad ( 942493 )
      I hardly ever failed CAPTCHAs before, but ever since RapidShare implemented their new CAPTCHAs, I have realized how many people suffer through the annoyance of this. Kind of ironic, though: it was supposed to weed out non-humans. Reminds me of the Dilbert strip where the PHB is considered the first human to fail the Turing Test.
    • by fbjon ( 692006 )
      If you listen to Google's captcha, you'll see that it is filled with nonsense voices as well as the real voice. You can still make out the real voice, but it's not entirely trivial. A great improvement, like TFA suggests, would be to use complete words rather than numbers, which turns it into a full voice-recognition problem for an attacker. Also, some manner of distortion in both time and frequency domain should thwart this attack. The only problem is that distorting in the frequency domain isn't all that
      • It's getting to the point where the spammers are solving real, previously unsolved problems with their spamming code. Perhaps this can be harnessed for the good "solve the following protein folding problem", "write a transcript for the following bit of audio" then we'll let you send 100 spam emails.
        • They don't have to do audio captchas where you type in directly what is said. They could require simple calculations or something like that to make it very hard for a computer to crack without sophisticated natural language processing.

          Enter the first letter of each word: Light Apples Meddle Blindly. (User enters: LAMB) Enter every other word: big white ben light. (User enters: "big ben" or "white light"). What is 14 plus 9? (User enters: 23)

          Add static and nonsense voices and these are all difficult t
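
The three challenge types in this comment can be sketched as follows. This is only a minimal illustration: the helper names (`first_letters`, `every_other_word`, `addition`) are made up here, and the code just computes the answers a server would accept; it does not generate any audio, static or nonsense voices.

```python
def first_letters(words):
    """Accepted answer for 'enter the first letter of each word'."""
    return "".join(word[0].upper() for word in words)


def every_other_word(words):
    """Both accepted answers for 'enter every other word'."""
    return {" ".join(words[0::2]), " ".join(words[1::2])}


def addition(a, b):
    """Accepted answer for 'what is a plus b?'."""
    return str(a + b)


print(first_letters(["Light", "Apples", "Meddle", "Blindly"]))
print(sorted(every_other_word(["big", "white", "ben", "light"])))
print(addition(14, 9))
```

Note that the answers are trivial to compute once the spoken words are known, so any difficulty for a bot would have to come from recognizing the speech, not from the puzzle itself.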

        • by spud603 ( 832173 )

          It's getting to the point where the spammers are solving real, previously unsolved problems with their spamming code. Perhaps this can be harnessed for the good "solve the following protein folding problem", "write a transcript for the following bit of audio" then we'll let you send 100 spam emails.

          I think you're on to something. "factor this huge number and get a free spamming account for a week"
          only problem is you have to make the captchas that grandpa can solve be harder than the problems you give to the spammers.

      • I say, make it cognitive.

        What is the number that comes between 41 and 43?

        what do you get when you multiply 5x1?

        How many eggs are in a dozen?

        How much wood could a wood chuck chuck if Chuck the woodchuck could chuck wood?
        and if they don't type out exactly:
        Chuck would chuck as much wood as Chuck could if Chuck could chuck wood!
        Then the FBI automatically raids their house.
  • by Anonymous Coward
    It's easier to detect a bot that uses the audio captcha, because a high number of simultaneous vision-impaired users from a single IP is much less likely than with the regular captcha.
    • Re: (Score:3, Insightful)

      by liquidpele ( 663430 )
      As if 400 tries in an hour with a 50% failure rate from one IP wouldn't throw flags with any type of captcha.... I really can't understand how these services can *not* see bots doing this, unless the bots are doing it at slow random intervals...
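
A minimal sketch of the kind of per-IP flagging this comment has in mind, assuming a one-hour sliding window and made-up thresholds. The class name `AttemptMonitor` and its parameters are hypothetical and do not describe any real service.

```python
from collections import defaultdict, deque


class AttemptMonitor:
    """Tracks CAPTCHA attempts per IP inside a sliding time window."""

    def __init__(self, window_seconds=3600, max_attempts=100, max_failure_rate=0.4):
        self.window_seconds = window_seconds
        self.max_attempts = max_attempts          # assumed threshold
        self.max_failure_rate = max_failure_rate  # assumed threshold
        self._attempts = defaultdict(deque)       # ip -> deque of (timestamp, passed)

    def record(self, ip, timestamp, passed):
        """Store one attempt and drop attempts that fell out of the window."""
        attempts = self._attempts[ip]
        attempts.append((timestamp, passed))
        while attempts and attempts[0][0] <= timestamp - self.window_seconds:
            attempts.popleft()

    def is_suspicious(self, ip):
        """Flag an IP with too many attempts or too high a failure rate."""
        attempts = self._attempts[ip]
        if not attempts:
            return False
        failures = sum(1 for _, passed in attempts if not passed)
        too_many = len(attempts) > self.max_attempts
        # Only judge the failure rate once there is a reasonable sample.
        too_wrong = len(attempts) >= 10 and failures / len(attempts) > self.max_failure_rate
        return too_many or too_wrong


# 400 tries in one hour, half of them failing, all from one address.
monitor = AttemptMonitor()
for i in range(400):
    monitor.record("203.0.113.7", timestamp=i * 9, passed=(i % 2 == 0))
print(monitor.is_suspicious("203.0.113.7"))
```

A check like this only sees one address at a time, so traffic spread thinly over many IPs would stay under both thresholds.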
      • Re: (Score:2, Insightful)

        by Gavagai80 ( 1275204 )
        In the case of a high profile target like gmail, they're doing it from thousands of IPs in a botnet.
    • by Keichann ( 888574 ) on Friday May 02, 2008 @12:40PM (#23276296)
      If only somebody could distribute their bots into a kind of network? Then you'd get traffic arriving from all over the place, that would be significantly more difficult to detect!

      Quick, mod this post down, in case a ne'er-do-well were to get any ideas.
  • by revlayle ( 964221 ) on Friday May 02, 2008 @11:13AM (#23275046) Homepage
    Some of the advanced IVR solutions (Interactive Voice Response, e.g. for customer support or paying bills on the phone) can pick out numbers and words pretty well, even under some noise conditions, so I am not totally surprised that this cracked the audio CAPTCHA.
    • I'd think it's easier to differentiate between known responses than pick out an arbitrary word though. What I mean is, in those IVR situations the software is usually just trying to differentiate between yes/no, accounts/support etc. The most advanced I've seen it is one where you could speak your credit card number, which is still just differentiating between a larger set (0-9).

      That was -going- to be my response as I assumed the audio CAPTCHA just played a recording of the word displayed in the normal CAPTCHA, but I just went and tried out google's and it does exactly what my credit card example describes except even shorter (6 digit number with background noise). So yeah... not that surprising.

      • If it was an arbitrary word, I could see additional difficulties then, of course... you would have to have speech-to-text technology that can distinguish words out of noise.
        • But then the human would also need to be able to spell.
            • True. Has anyone tried the voice recognition integrated into Windows Vista? I tried it and was really impressed. Speech-to-text has existed for almost ten years now, so I'm not impressed by this news whatsoever...
            • I wonder how far advanced voice recognition for Mandarin Chinese is. My guess is that it is far behind what is available for English. This would mean that Chinese web sites are at an advantage with respect to word-based audio CAPTCHAs.
    • Re: (Score:3, Insightful)

      by Qzukk ( 229616 )
      IVR works as well as it does because it only has to understand numbers when it's expecting numbers and words when it's expecting words (and then only the words it expects to hear, try yelling "banana" at one). Also try calling your credit card company and telling it your card number is four quadrillion three hundred fifty-two trillion one hundred twelve billion five hundred forty-two million six hundred ninety-five thousand and one.

      If your audio captcha reads each letter one at a time, then your "IVR" only
  • by Half-pint HAL ( 718102 ) on Friday May 02, 2008 @11:16AM (#23275092)

    Right from the start it was clear that audio captchas were theoretically easier to break than visual ones.

    An image captcha is designed to require a mixture of perception and thought, but an audio one has to rely on pure perception, because it's temporary. You hear it then it's gone: you can't analyse it. This makes it infinitely less complicated than a visual one.

    It's only because of low uptake that it's taken so long for a true proof-of-concept attack.


    • Re: (Score:2, Interesting)

      by firewrought ( 36952 )

      An image captcha is designed to require a mixture of perception and thought, but an audio one has to rely on pure perception, because it's temporary.

      I think your explanation is missing something, but I can't quite put my finger on what it is. Maybe it would be more accurate to say that audio captchas are simpler to process because (1) researchers can't pump as much information thru the ears as they can thru the eyes [sensory bandwidth is different] and (2) there's not a whole lot we can do to obfuscate a

      • I think your explanation is missing something, but I can't quite put my finger on what it is.

        OK, I'll be more brief:

        Audio captchas require on-line real-time processing by the human brain.

        Picture captchas can be processed off-line.

        Audio captchas therefore are harder to process, so effectively have to have a lower information bandwidth.

        The lower the information content, the less computer processing required to process it.

        Questions can never be culture-neutral, and any ability to cherry-pick questions

  • by snarfies ( 115214 ) on Friday May 02, 2008 @11:19AM (#23275140) Homepage
    "News about the discovery is slowly starting to spread."

    And, thanks to Slashdot, news about the discovery is now RAPIDLY spreading.
  • by Anonymous Coward
    do something else. show me a picture of an object and ask me (in a multiple-choice test?) what it is...a tree, a car, a house, a flower, whatever.

    and for the sight-impaired, how about a read description or definition of something? "this thing is the entrance to a house or a room" => door

    come on, webdesigner, it's not that hard to abandon those old and, above all, ANNOYING captchas
    • by mapkinase ( 958129 ) on Friday May 02, 2008 @12:25PM (#23276084) Homepage Journal
      Multiple-choice tests are just silly. If there are 5 choices, the robot will pass the protected entrance in about 5 tries.
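
As a rough check on that estimate, here is a sketch that assumes the bot guesses uniformly at random and is handed a fresh challenge on every attempt, so the number of tries follows a geometric distribution with mean 1/p. The function names are illustrative only.

```python
import random


def expected_tries(num_choices):
    """Mean number of uniform random guesses until the first pass."""
    p = 1 / num_choices  # chance that a single blind guess is right
    return 1 / p         # mean of a geometric distribution


def simulate_tries(num_choices, trials=100_000, seed=0):
    """Estimate the same mean by simulating many guessing bots."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        tries = 1
        # Choice 0 stands in for the correct answer.
        while rng.randrange(num_choices) != 0:
            tries += 1
        total += tries
    return total / trials


print(expected_tries(5))
print(simulate_tries(5))
```

Under these assumptions each blind attempt passes with probability 1/N, so N choices let through roughly one guess in N no matter how the attempts are spread over time or addresses.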

      • Couldn't there be 20 choices, but only 3 shown? The bot would read the code and see 20 choices, but the human would only see 3 or 5 or whatever.
        • Or how's about, if it takes the user five tries to get past the question, the account gets locked out. I'm sorry but if you're that stupid, I don't feel sorry for you not being able to use the internet.
          • Wait, do you really mean account, or IP?

            If IP, then no luck. Bots jump IPs like crazy.

            If account (as in a login), then every person who gets their name used by a bot gets bitten. Given the amount of email backscatter I've been getting lately from spammers using my email as a return address, that's certainly not something I look forward to.
          • The bots will get it right as often on the first try as on the fifth, but that's irrelevant since every try will come from a different IP.
        • I do not remember exactly but intuitively N choices are passed in N/e attempts, something like that. If you are ok with 15% captchas passed, then it is ok.

          Besides, human will see 3 or 5, and bot will see 20, 15 of which it will see as "hidden".
    • The answer is given to you in the question in a multiple choice test. One of the choices has to be the correct one, which means you can trivially bruteforce it.
    • I'm going to start asking my users riddles to validate themselves.

      "I am a news-for-nerds website whose domain name was intentionally selected to be confusing to laypeople. What am I?"
    • by Phroggy ( 441 )

      and for the sight-impaired, how about a read description or definition of something? "this thing is the entrance to a house or a room" => door

      I've been experimenting with this kind of thing; it's a lot harder than it sounds. Computers aren't very good at answering questions like that... but they're not very good at asking questions like that either. The problem is, you don't want a human to have to think up every single question, because that severely limits the number of possible answers, and when the number of possible answers is limited, it becomes possible to just pick one randomly.

      You need a way to automatically generate the questions by

  • So given that (I assume) all audio CAPTCHAs have the same problem (i.e., the numbers and clearer voices can easily be found using audio analysis), does that mean that all audio-based CAPTCHAs are bound to fail?
    • by Zerth ( 26112 )
      Not necessarily, humans are still much more adept at extracting voices from noise(e.g. conversations in crowded conventions) but I imagine people will quickly consider them almost as annoying as the worst of visual CAPTCHAs.
      • I can see a main problem with that: to ensure some degree of entropy, one would have to record enough CAPTCHAs to satisfy all possible combinations of the English alphabet. That's a lot! Even if that is the case, that is actually less secure than an automated audio CAPTCHA because, if anything, hackers can simply download all recorded CAPTCHAs and crack the systems that way.
  • I'm sorry it's come to this, but before you may log on, I'll need a 200 word essay on the virtues of Microsoft. Spelling and grammar will count against you, especially if they are perfect. That means either you are a machine or you need to lighten up. Did I mention the five minute time limit?

    Scary, isn't it?

    • by WK2 ( 1072560 )
      A CAPTCHA has to be completely automated. Grading an essay test would be hard to automate.
  • by sakdoctor ( 1087155 ) on Friday May 02, 2008 @11:25AM (#23275250) Homepage
    Apart from OCRing books, I can't think of anything else that is not a total waste of human time. How about meta-moderating as a CAPTCHA activity; probably too fuzzy to work to a reasonable degree of accuracy.

    Basically I think the arms race is already over, and a new paradigm is needed,
    • by mgblst ( 80109 )
      Classifying porn pictures. This is very useful, girl-on-girl, top half only, etc...

      Realistically, providing one-word descriptions for a bunch of pictures could be useful. I know Google set up a "game" for this months ago.
  • CAPTCHA technology is going to have a very difficult time over the next few years. Finding tasks (which can be implemented on standard computer systems and transmitted over the internet) that are trivial for humans but exceedingly difficult for computers is going to be rough.

    This is especially true because the computer doesn't need a 100% success rate to effectively "break" the CAPTCHA. Heck, if the CAPTCHA gives you 3 tries before rejecting you, then a 30% success rate = fully broken.
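
To put a number on that, assume each try succeeds independently with the same probability p. The chance of passing within n allowed tries is then 1 - (1 - p)^n. The function name below is illustrative.

```python
def pass_probability(success_rate, tries):
    """Chance of at least one success in `tries` independent attempts."""
    return 1 - (1 - success_rate) ** tries


# A solver that is right 30% of the time, with 3 tries per challenge.
print(round(pass_probability(0.30, 3), 3))  # 0.657
```

So a 30% solver gets through about two challenges out of three, and since a bot can simply request another challenge, the remaining failures cost it almost nothing.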

    For right now, they
    • I like this idea. How about instead of the words "kitten piglet puppy toaster" you have images? A kitten can be drawn 1000s of ways so that the attacking computer would have to get a lot right to be successful: they have to correctly identify the thing in the picture and THEN answer a question about it. I think my grandma would have an easier time with simple questions about simple images than the current CAPTCHAs.
    • by dw604 ( 900995 )
      What is the third word in this sentence? What is the second letter in the first word of this sentence? The possibilities are limitless. Computers can't "think".
    • Problem with the 'rational' approach is that it isn't that simple. These problems have to be designed and implemented which takes time and money from the designers. Yes it is simple but not as simple as generating a random string which takes a one time code.

      If you only have a set list of rational problems then you're going to run into the problem of dedicated spammers who will simply create a method of cracking it based on previous results.
    • by sidb ( 530400 )
      The problem is that captchas have to be computer-generated on the fly. It's hard to think of things a computer can easily do in one direction, that a similar computer cannot undo, but that a human can easily undo. Relationship puzzles between words won't work because the attacking computer probably has dictionary resources very similar to the defending computer's.
  • Spam is already a pretty ethically dubious thing, but this should be viewed differently in the eyes of the law (in the event we actually catch somebody behind it in a 1st world country). Sort of how if you assault an able bodied man on the street you'll be punished, but assault a grandma with a walker or a boy in a wheelchair, and you'll likely have the book thrown at you. Abusing handicapped accessibility should really fall into the "boy in a wheelchair" category.

    You'd almost hope that the same sort of hono
    • by Grave ( 8234 )
      Your analogy is a bit off base. More accurate might be to hope that spammers wouldn't abuse the accessibility loopholes in the same vein that criminals don't park in handicap spaces while they're inside robbing the store. Oh wait, they probably do.
  • Paying 3rd-world human beings usually gets past captchas.

    A partial solution is to limit the services you offer based on how well you know them. Anonymous? Offer very limited services.
    Anonymous but tied to an existing email address? Offer a bit more.
    Authenticated by credit card, which could be stolen? Offer a bit more.
    Authenticated by PO box? Offer more.
    Authenticated by street address, driver's license number, and a notary? Assume they are legit, you can always sue the notary if they aren't.

    • "Authenticated by street address, driver's license number, and a notary? Assume they are legit, you can always sue the notary if they aren't."

      Just another database to be stolen and used to create credit hell for those people listed in the database.

      No thank you.

      The only solution to asshattery is pain. No, not virtual pain, REAL Ass Kicking Pain.
  • by Anonymous Coward
    Captcha (and Recaptcha) were used as tools since machines were not smart enough to crack distorted characters. The fact that they are able to do so now is great news! Now these techniques can be used in improving existing image recognition tools... provided there's a way to obtain access to the spammers' toolbox.

    Am looking forward to the first TRUE bot to post comments here...
  • Spammers need to be shot.

    The only reason to have these things is to try to limit spambots. Imagine if instead of spending Millions of dollars developing and maintaining anti spam technology, we used the money to assassinate Spammers, and the producers of the crap they sell, the problem would immediately disappear.

    You know, I'm almost serious. Why is it that we tolerate Asshats in this world. This is the result of the namby pamby wimpy peaceniks that think when an asshat gets his lights punched out, that the
    • by Grave ( 8234 )
      "We're dicks! We're reckless, arrogant, stupid dicks. And the Film Actors Guild are pussies. And Kim Jong Il is an asshole. Pussies don't like dicks, because pussies get fucked by dicks. But dicks also fuck assholes: assholes that just want to shit on everything. Pussies may think they can deal with assholes their way. But the only thing that can fuck an asshole is a dick, with some balls. The problem with dicks is: they fuck too much or fuck when it isn't appropriate - and it takes a pussy to show them tha
    • by rthille ( 8526 )
      Ha, we're getting the spammers to fund AI research...the more we make captchas like Turing tests, the more they'll do AI research in their attempts to break it.

  • I'm convinced that the next major breakthrough in artificial intelligence will come from spammers trying to develop more and more sophisticated programs to foil captchas. Eventually they will become so sophisticated that the true test of whether you are human is if you fail miserably at trying to figure out what the hell the captcha is, but the bots will get it instantly. I for one, welcome our new captcha-killing overlords.
  • There was a captcha a while ago that pulled pictures and "hottness" information from, then asked the user to select three of the 9 people that were "hott". link []

    While this approach probably wouldn't be very appropriate for "serious" companies to use (think IBM, microsoft, usbank, etc.) as protection from bots, I feel like it is a step in the right direction. There are things that humans are really good at and captcha builders need to start using them. For instance: show somebody 5 pictures of
    • by spazdor ( 902907 )
      The problem is that all these options require photographs, which mean each new CAPTCHA requires some human-work to produce. If we're going to prevent spammers from just exhaustively cataloging the right answers, we need an automatable, procedural way to generate new ones.
      • by blhack ( 921171 )
        And that is exactly where the problem is. Anything that has been CREATED by a computer can be reverse engineered by a computer. I know that there were some really HUGE databases created a few years ago that were trying to create artificial intelligence (one of them was called CYC, another was called GAC, there is a wired article about them here []) the idea was that people would answer hundreds of thousands of questions like "are purples round?" or similarly silly questions. The hope was that we could progr
  • I think the captcha thing is about over. One alternative is identifying new users by texting a password to their cell phone. One account per cell phone number. This limits access to people with computers but not cell phones, but that's not much of an issue at this point. GMail used to do this.

    Yes, you can buy vast numbers of SIM cards, but they're not free.

    The main problem with this approach is that sending SMS messages is not free. Bulk services charge around US$0.05 to US$0.11 per message. However

    • One alternative is identifying new users by texting a password to their cell phone.

      Will Verizon's landline division install an SMS to landline gateway [] so that my phone can receive SMS? If so, when?

      One account per cell phone number.

      How do I set up an account on a number that used to belong to somebody else who canceled her mobile phone service, allowing the network operator to reassign the number to my phone?

      This limits access to people with computers but not cell phones, but that's not much of an issue at this point.

      Citation needed.

      • by Mike89 ( 1006497 )

        Citation needed.
        'The U.S. currently has a mobile phone penetration rate of 81%' (Source []). Some other places have more than 100%
  • Eventually, the free service providers (free net mail in particular) will become predominantly the domain of spamsters. When that happens (and it will), admins like me will start blackholing them; then, end-users will be forced to abandon them. Finally, they'll be obliged to start doing something heinous, like requiring a paper form submitted via snail-mail before a new account can be set up.

    The dim bulbs in our government will love this, because it'll provide the "accountability" they've been craving to

  • I've wanted to gripe about this for ages, but here it finally seems on-topic:

    Slashdot's audio CAPTCHA is a joke.

    The computer voice SPELLS the word for you letter-by-letter. A bot wouldn't even have to use heuristics-based speech recognition, just searching for 26 waves (or FFT signatures) would do the trick.
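
A minimal sketch of the "26 waves" idea, under loud assumptions: the letter recordings are replaced by synthetic tones (one arbitrary frequency per letter), every letter lasts exactly `CLIP_LEN` samples, and the clip boundaries are known. Real recordings would need proper segmentation and would more likely be compared by cross-correlation or FFT signatures; all names and constants here are made up for illustration.

```python
import math
import random
import string

SAMPLE_RATE = 8000  # samples per second (assumed)
CLIP_LEN = 200      # samples per spoken letter (assumed fixed length)


def make_template(letter):
    """Synthetic stand-in for a recorded letter: one tone per letter."""
    index = string.ascii_uppercase.index(letter)
    freq = 320 + 40 * index  # Hz; arbitrary, but distinct for each letter
    return [math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n in range(CLIP_LEN)]


TEMPLATES = {letter: make_template(letter) for letter in string.ascii_uppercase}


def similarity(a, b):
    """Cosine similarity between two equal-length sample lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def decode(audio):
    """Cut the audio into letter-sized clips and pick the closest template."""
    letters = []
    for start in range(0, len(audio) - CLIP_LEN + 1, CLIP_LEN):
        clip = audio[start:start + CLIP_LEN]
        best = max(TEMPLATES, key=lambda name: similarity(clip, TEMPLATES[name]))
        letters.append(best)
    return "".join(letters)


def synthesize(word, noise=0.3, seed=1):
    """Build a noisy test signal by concatenating letter templates."""
    rng = random.Random(seed)
    return [s + rng.gauss(0, noise) for letter in word for s in TEMPLATES[letter]]


print(decode(synthesize("SLASHDOT")))
```

The point of the sketch is that with a fixed alphabet of 26 known sounds, decoding reduces to a nearest-template lookup, with no general speech recognition involved.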
  • The fundamental problem with captchas is that they are using computers to come up with problems for humans. If a computer can come up with the problem, a computer can come up with the solution.

    Captchas so far rely on human strengths in visual perception, edge finding, pattern recognition, etc. to retrieve distorted data. But these are simply processing issues. And computers will eventually solve them all.

    The proposals for 'better captchas' revolve around the idea of having more complex problems of
    • by jfengel ( 409917 )
      I've been thinking about something like this for a while. I think about it in terms of OpenID, where you get to define the terms of authentication by running your own server.

      Service providers like GMail can turn that around and say, "OK, but we're only going to accept authentication from certain providers, who have confirmed to us one way or another that they reliably identify you as a human."

      OpenID separates authentication from the services, so you don't have a single database to be compromised. The most
    • I'd like to add a big "+1 / Me Too" to the parent post. The reputation-anonymiser idea is very interesting.

      There are, though, some problems with reputation systems (as seen on, e.g., wikipedia): sock and meat puppeting. These problems are to some extent a function of the size of the domain of a reputation system - the smaller it is, the easier to game, and vice versa.

  • OK, so audio CAPTCHAs have been broken. Visual ones have been broken... Why not either mix the two, or require some actual LOGIC to answer it? Maybe a picture of a cat, then 4 radio buttons asking what this is a picture of. If you are unable to tell that it is a CAT in the picture, then you shouldn't be on the internet anyway.

    Or maybe a multi-visual CAPTCHA. 2 Captchas. 2 Text boxes. Captcha 1, goes to text box 2, or can even be swapped.

    CAPTCHA one says "Enter 12345 in box 2"
    CAPTCHA one says "Ent
  • All I can say is, I'm glad most spammers aren't hearing impaired or else this might really turn into a problem.
  • Digital world is the world of non-humans and humans are aliens in it. The robots are naturals and they do all that interaction with this world much easier and more effectively.

    Currently the dark underinternet world of spambots, worms, viruses, malware, etc. does not have limits in the arms race, while the world of positive use of internet does have them. There is no digital robotic police that have power to enter our private digital domains and check for suspicious activity. There are no government sponsore
  • that this "arms race" of escalating sophistication of captchas and equally sophisticated cracks is actually a form of the Turing test but one conducted with the ethics of a street brawl.

    We do occasionally find the question "Are you human?" posed in proximity to the captcha.
    • I wonder whether spammers trying to crack captchas are accelerating AI research, or just misusing it?
  • Okay.. how about a question...
    And a picture.

    How many parrots are in this picture? (audio).
    Picture of 1-7 parrots mixed with other birds.

    How many miles over the speedlimit is this car going? (audio)
    Picture of a car speedometer at 35 to 95 with a Speed sign through window of 35 to 95 mph.

    What letter is missing from the second word? (audio)
    Habit (picture)

    The audio could be a separate text box instead of audio.

    Generate a million simple but unique questions that require thought and each one has multiple po
    • Okay.. how about a question...
      And a picture.
      How would people who are blind or hard of sight authenticate themselves under such a system?
  • I keep hearing how XYZ's captcha got broken, and how the method is used by malicious entities to do A, B and C. Why hasn't someone made a Firefox plugin to do these for end users? If the bots don't have to mess around with the annoying distorted images or listen to a soundbite and work out what it says, why do humans still have to?
  • Add garbage to the audio like they do to the graphics. Only a human will be able to pick up the "subtle" differences in phonics :)
