
How Hackers Listened Their Way Around Google's Recaptcha

Posted by timothy
from the listen-to-what-the-flower-children-scream dept.
An anonymous reader writes with this story at Ars Technica: "Three self-taught hackers from the DC949 hacker collective managed to use a combination of techniques to beat ReCaptcha with 99.1% accuracy (better than most humans!)" In short, the hackers skipped the visual part of the Recaptcha system entirely, focusing on the audio alternative, which gave them a few convenient angles of attack. Google responded with changes to the system, but that doesn't minimize their accomplishment.

  • First! (Score:1, Funny)

    by Anonymous Coward

    Oh yeah! Not even a recaptcha to worry about!

    • Google updated a few hours before these guys revealed their accomplishment. TFA mentions that other groups had found less effective ways of circumventing the audio portion. Is there any indication that this was about to be a problem? How likely is it that anyone wanting to actually abuse it was about to figure this out themselves? Seems to me like there are so many suckers out there that spammers don't need to spend too much time with things like this.
      • by icebike (68054) * on Thursday May 31, 2012 @06:50PM (#40174503)

        Quote summary:

        Google responded with changes to the system, but that doesn't minimize their accomplishment.

        On the contrary, it does minimize their accomplishment. It makes it all for nothing: a technical exercise with no near-term or long-term payback.
        Recaptcha is a huge con, no more secure than the original captcha. The second (or first) word is there only to serve some other purpose, and any answer will do.

        Adding the audio option (probably forced by ADA) did nothing for security. At best this demonstrates that adding multiple different keys to the same lock makes things worse, not better.

        Captcha's original intent was to slow down bots by making the user prove they were human. They are seldom used to protect anything of value, simply to keep the nuisance bots to a dull roar.

        Now it appears that machines can beat captcha and recaptcha very easily. So WHY do we still see these schemes in use?

        • by Baloroth (2370816) on Thursday May 31, 2012 @07:59PM (#40175157)

          Because even a very "high" accuracy machine system is still going to add a significant barrier to automatically cracking the results, especially if Google continues altering reCAPTCHA like they do. While you won't eliminate 100% of attackers, you can eliminate the vast majority, and slow down the attackers that do get through. The alternative is to use nothing, and believe me: you absolutely do not want that. The Internet would be 99.99999999% spam almost overnight if that happened.

        • by Main Gauche (881147) on Thursday May 31, 2012 @08:31PM (#40175419)

          Now it appears that machines can beat captcha and recaptcha very easily. So WHY do we still see these schemes in use?

          Could you give me your address, and let me know when you won't be home? (I presume you no longer lock your house.)

          • by residieu (577863)
            If the only locks available had keys that didn't fit properly and took multiple attempts to open, while not stopping any real thieves, I'd consider it.
        • by bill_mcgonigle (4333) * on Thursday May 31, 2012 @10:48PM (#40176209) Homepage Journal

          On the contrary, it does minimize their accomplishment. It makes it all for nothing: a technical exercise with no near-term or long-term payback. Recaptcha is a huge con, no more secure than the original captcha. The second (or first) word is there only to serve some other purpose, and any answer will do.

          It's funny that you'd complain about a waste of effort and then bemoan Recaptcha, which was developed to prevent all those man-years of solving CAPTCHAs from going to waste.

          BTW, the founder of Recaptcha has expressed that he will be happy when it can be defeated trivially because at that point the other job it's trying to do can be completely automated, which is still a win.

          • by Anonymous Coward

            Not if the trivial defeat simply consists of solving the "easy" word and filling in junk for the hard one. Which is what a fair number of humans do.

          • by g0bshiTe (596213)
            Makes me wonder if the founder knows there's an easy way to beat it.
        • by g0bshiTe (596213)
          So where do your arguments that this minimizes their accomplishment come in?

          Why do you assume that it comes just as Google makes changes to the system? Are you positive the change to the system did not stem from them reporting this to Google and then, following safe disclosure practices, giving Google time to fix it before going public? Are you sure they didn't do all this, then report it to Google and collect a "reward" for what they found?
    • by Anonymous Coward

      Moderation
          50% redundant
          50% funny

      I wonder how the first post can be redundant.

      • by bkaul01 (619795)
        When idiots spam every thread with worthless "First!" posts, how could any one of these posts not be redundant?
  • by whitesea (1811570) on Thursday May 31, 2012 @05:00PM (#40173375)
    They wisely chose the weakest link to attack.
  • Singularity (Score:4, Insightful)

    by MrEricSir (398214) on Thursday May 31, 2012 @05:05PM (#40173429) Homepage

    Since they beat the Turing Test, this means we've reached the AI singularity... right?

    • "More human than human." It just means the Tyrell Corporation was working on it.

    • by Quillem (2641391)
      Quoting the coda of the story:

      While the changes stymied the Stiltwalker attack, Adam said his own experience using the new audio tests leaves him unconvinced that they are a true improvement over the old system.

      "I could only get about one of three right," he said. "Their Turing test isn't all that effective if it thinks I'm a robot."

      :)

      • Re:Singularity (Score:4, Interesting)

        by mcgrew (92797) * on Friday June 01, 2012 @08:35AM (#40179183) Homepage Journal

        You bring to mind something I read long ago, too long ago for a citation. A researcher was running a Turing test with one subject, seeing if he could decide which terminal was a computer and which had a human on the other end.

        The tester just sat there without inputting anything. Pretty soon a message came up on one screen: "Is there anybody there?"

        "That's the human," the tester said.

  • Snake meet tail (Score:5, Insightful)

    by V-similitude (2186590) on Thursday May 31, 2012 @05:07PM (#40173449)

    I realized there's an interesting aspect to this, in that gVoice transcription is actively trying to do basically the same thing these guys did* (albeit in a far more general way). Wonder how gVoice would do transcribing google's own recaptcha audio. Someone go try that. Either way though, it's an interesting dilemma if they ever got automatic transcription good enough to defeat these audio recaptchas.

    * Well, after RTFA, I realize that a fair bit of what they did was actually more related to hashing (and the pseudo-random generator) vs actually trying to parse the audio, but still.

    • by SomePgmr (2021234)

      Having seen lots of Google Voice transcriptions, I'm pretty sure it couldn't transcribe its way through the most articulate of all audio captchas. Years of training and it's only gotten worse.

      • I don't know, it's nearly perfect on phone numbers, in my experience (which is really helpful). And pretty useful on most stuff to get a good enough idea. Though it does stumble a lot. But yeah, prob doesn't do very well with these, was just a thought.
        • by Nonesuch (90847)
          The Google Voice transcription is so uncannily near-perfect with phone numbers, and so awful with everything else, that I suspect it is cheating by using the Caller-ID and other sources when 'recognizing' a phone number.
          • by Aranykai (1053846)

            I use it on Android to send about 200 texts a month. Once you learn to speak naturally instead of over-enunciating everything, it does quite well. I suspect a big part of the problem with voicemail transcriptions has to do with audio compression on cell phones.

          • I think they just put extra emphasis on numbers, since they're limited in scope (only 10ish words, and relatively simple context) and more critical than other words in a VM transcription. I just checked a few VMs and it's perfect on phone numbers even when they're not the same as the caller ID.
    • by Anonymous Coward

      I watched the video (hilarious, btw). Someone in the audience asked if they had tried Google's own speech recognition. They had, and it couldn't solve the audio captcha.

    • I did that three years ago. All my posts are by bots.

      2

      3

      5

    • by ep32g79 (538056)

      Wonder how gVoice would do transcribing google's own recaptcha audio. Someone go try that. Either way though, it's an interesting dilemma if they ever got automatic transcription good enough to defeat these audio recaptchas.

      * Well, after RTFA, I realize that a fair bit of what they did was actually more related to hashing (and the pseudo-random generator) vs actually trying to parse the audio, but still.

      In their presentation that question was raised, and they said trying gVoice was the first thing they did, with no luck.

  • Another solution.. (Score:5, Informative)

    by Ziekheid (1427027) on Thursday May 31, 2012 @05:08PM (#40173463)

    Most of the spammers who circumvent captchas use real people to fill in their captchas for them. How they do it:
    1) A pay-per-filled-in-captcha site (where members solve captchas, not really getting paid even though they think they will be) OR a high-traffic site (false/scam sites, hacked sites, etc.)
    2) Mirror the image from the site you want to spam to your own site
    3) A person visits your own site with the mirrored image and solves the captcha
    4) Mirror the answer back to the site you want to spam
    5) ???
    6) Profit! (literally)

    • by Anonymous Coward on Thursday May 31, 2012 @05:40PM (#40173779)

      Reminds me of the story of the guy who would play 8 games of chess simultaneously in an octagon and absolutely guarantee he'd win 50% of the games at least.

      He then proceeded to play the moves of the players opposite each other against each other.

      • by zill (1690130)
        Does this guy take bets and where can I find him?

        55% of professional chess matches end in draws. 45% to the power of 4 is 0.17%.

        If he had claimed "he would lose less than 50% of the games" then he would be correct, but that sounds a lot less impressive.
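
The arithmetic in this subthread can be checked with a short binomial calculation. This is a sketch under the thread's own assumptions, not real chess statistics: 8 boards but only 4 real games, each decisive with probability 0.45 (the 55% draw figure above). A decisive game gives the relayer a win on one board and a loss on its mirror board; a drawn game gives him two draws. For reference, 0.45 ** 4 is about 0.041 (4.1%).

```python
from __future__ import annotations

from math import comb

# Assumptions taken from the thread, not from any chess database:
P_DECISIVE = 0.45  # chance a single game ends in a win/loss rather than a draw
REAL_GAMES = 4     # 8 boards, but only A<->B, C<->D, E<->F, G<->H are real games


def decisive_game_distribution(p: float = P_DECISIVE, n: int = REAL_GAMES) -> dict[int, float]:
    """Binomial probability of exactly k decisive games out of n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def relayer_record(decisive: int, n: int = REAL_GAMES) -> tuple[int, int, int]:
    """(wins, losses, draws) for the relayer across all 2*n boards."""
    # Each decisive game is a win on one board and a loss on its mirror board;
    # each drawn game is a draw on both boards.
    return decisive, decisive, 2 * (n - decisive)


if __name__ == "__main__":
    for k, prob in decisive_game_distribution().items():
        wins, losses, draws = relayer_record(k)
        print(f"{k} decisive games: {wins}W/{losses}L/{draws}D, probability {prob:.4f}")
```

Wins always equal losses, so "won't lose more than half" holds by construction, while "wins at least half" requires all four real games to be decisive, about 4.1% of the time under these assumptions.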
        • by Anonymous Coward

          Does this guy take bets and where can I find him?

            55% of professional chess matches end in draws. 45% to the power of 4 is 0.17%.

          If he had claimed "he would lose less than 50% of the games" then he would be correct, but that sounds a lot less impressive.

          Sorry, I misspoke. I'm certain the wager was that he would not lose more than half the games, or perhaps that a draw would result in a rematch.

        • by hellop2 (1271166)
          Not really a great statistic you created there. Maybe this guy is better than average.

          Also, what you calculated was the probably to not draw in 4 consecutive games, not 4 out of 8. There are the same number of ways to lose 4 out of 8 as there are to win 4 out of 8. Thus, there is a 50-50 chance of winning or losing. Therefore, it doesn't matter if we're talking about 4 out of 8, or just 1 game. Based on your statistic, the probability of winning or losing 4 out of 8 is 45%, not 0.17%.
          • by hellop2 (1271166)
            Think of it another way. Is the probably of flipping a coin heads 4 (or more) out of 8 times 0.5^4 = 0.125? No, it's 50-50.
            • by zill (1690130)
              First of all, it's "probability".

              Second of all, there are only 4 chess games going on. I don't know where you got the number "8" from.
              Ostensibly, the con-artist claims "I'll play 8 chess games against 8 players simultaneously."
              What's actually happening is that he's using the moves of A against B, C against D, E against F, and G against H. Thus there are only 4 chess games going on.

              Out of 4 chess games, there are precisely 5 possible outcomes:
              4 winners: 45%^4 * 55%^0 * 4 choose 4 = 4.1% (I accident
              • by hellop2 (1271166)
                Obviously I was talking about the general probability of a (weighted) fair coin toss. (8 of them)

                Actually, you're totally right about the 4 games being all that matters. (assuming he doesn't alter the moves, and just plays them off each other).

                For some reason I automatically assumed that the question was only talking about games that were either won or lost.

                But, let's assume you're right. Then the con-artist only won his bet 4.1% of the time. This is not a very good con if the con-artist loses
      • absolutely guarantee he'd win 50% of the games at least

        "he wouldn't lose at least 50% of the games" would be more accurate (draws)

      • That would work for an opening move but the whole point of chess is that there are many opening moves and with each additional move the possible moves explode until you need a very special sort of mind or a big computer (IBM big, not your pitiful 6 core big) to sort it all out.

        How would your guy make sure the moves of the opposite player have any bearing on the moves on the other board? It would be like playing blackjack by copying what the guy next to you does. SMART, if by some miracle you had the same ca

          You should read some of your sibling comments (hell, there was a video clearly explaining this). What GP would do is play the other players off against each other. To be more specific, he would play black for 4 games and white for 4 (this is the usual setup for playing multiple games simultaneously, in case you did not know). He would see the move the white player makes, but not respond to him. He would move on to the next board and make the same move there, observe the response, and remember it, so that he can play it in the previous

    • I've seen malware that takes over your computer with an "enter the captcha" screen to get your computer back. The captcha is taken from whatever pool of websites they want to deal with.
  • by Anonymous Coward
    Every hacking group is now a hacker 'collective'?
  • by Anonymous Coward on Thursday May 31, 2012 @05:28PM (#40173673)

    That's it! Make all users do a SERIES of incredibly hard recaptchas. Those who get too many correct are machines! Brilliant!

    • by Anonymous Coward on Thursday May 31, 2012 @06:03PM (#40174011)

      ...especially if they solve them in less time than the duration of the audio. (Only half kidding: They solved millions of eight second long captchas in a second and a half each and Recaptcha didn't even blink.)

      • ...especially if they solve them in less time than the duration of the audio. (Only half kidding: They solved millions of eight second long captchas in a second and a half each and Recaptcha didn't even blink.)

        Or maybe it did blink and that's what tipped off Google to change the system?
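
The point about speed suggests a cheap server-side sanity check: nobody can listen to an 8-second clip and answer in 1.5 seconds. A minimal sketch, assuming the server records when each audio challenge was issued and how long its clip is; `AudioChallenge`, `too_fast`, and the one-second grace margin are hypothetical names and values, not part of reCAPTCHA.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AudioChallenge:
    """Hypothetical server-side record of an issued audio CAPTCHA."""
    clip_seconds: float
    issued_at: float = field(default_factory=time.monotonic)


def too_fast(challenge: AudioChallenge, answered_at: float, grace: float = 1.0) -> bool:
    """Return True if the answer arrived before a listener could have heard the whole clip.

    `grace` is an assumed allowance for clock jitter and answers typed during the last word.
    """
    elapsed = answered_at - challenge.issued_at
    return elapsed < challenge.clip_seconds - grace


if __name__ == "__main__":
    c = AudioChallenge(clip_seconds=8.0, issued_at=0.0)
    print(too_fast(c, answered_at=1.5))   # True: the pace described above
    print(too_fast(c, answered_at=12.0))  # False: a plausible human pace
```

A check like this only raises the attacker's cost (a bot can simply wait before answering), which fits the "slow them down" argument made earlier in the thread.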

  • Gone too far... (Score:4, Interesting)

    by whydavid (2593831) on Thursday May 31, 2012 @05:33PM (#40173707)
    I had one of these the other day that was beyond absurd. The visual was a complete scrambled mess, with nearly every letter seemingly equally likely to be 2 or 3 different letters. The audio was even worse: loud gibberish in the foreground with what sounded like someone whispering the actual text in the background. It wasn't until 2 reloads later that I was lucky enough to get a recaptcha that was only slightly ambiguous, and I was able to get it on the 2nd guess. I was far more annoyed at this than I ever have been at a spambot. I'm not sure this is a step in the right direction. Time to move away from garbled text.
    • by Anonymous Coward

      I apologize for posting as Anonymous Coward here (too lazy to log in; copb.phoenix), but there is a better solution.

      Machines are not too good at following natural language, so rather than a captcha, a problem written in natural language would, in theory, work best.

      Something clear enough to a human eye, but not too obvious mechanically. One of the best ones I ever saw was not labelled at all, other than "signincheck" on the form and said "tob0rAtONm@i in the reversed proper English, please?"
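
Reversed, the example challenge reads "i@mNOtAr0bot", i.e. "I am not a robot". A minimal sketch of how a site might check such an answer, assuming it undoes a small, hypothetical leetspeak table (`0` to `o`, `@` to `a`) and compares letters only, ignoring case, spaces, and punctuation. The function names are illustrative, not from any real product.

```python
# Hypothetical substitution table; a real site would choose its own.
LEET = str.maketrans({"0": "o", "@": "a"})


def letters_only(text: str) -> str:
    """Lowercase and drop everything that is not a letter."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def expected_answer(challenge: str) -> str:
    """Reverse the scrambled token, then undo the leetspeak substitutions."""
    # Translate before stripping, otherwise '0' and '@' would be discarded as non-letters.
    return letters_only(challenge[::-1].lower().translate(LEET))


def check(challenge: str, answer: str) -> bool:
    return letters_only(answer) == expected_answer(challenge)


if __name__ == "__main__":
    print(expected_answer("tob0rAtONm@i"))             # iamnotarobot
    print(check("tob0rAtONm@i", "I am not a robot."))  # True
```

As the replies point out, this only stops generic bots: a script written for this one site, or a human captcha relay, still gets through.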

      • Even that might not work in the long run. IBM Watson gets better every day. It's already good enough for a chatbot, and it wasn't even designed to do that. I think Watson might be nearing AI-completeness for natural language. Just give it a couple of years and see what else comes up.

      • That won't stop the captcha-mirrors who will grab a captcha, farm it out to idiots logging in for "free" prizes, and feed the idiots' answer back to the captcha. You can make it totally impossible for an AI to figure it out, but they'll still get through this way.
    • by LoneBoco (701026)
      I've found that KeyCAPTCHA [keycaptcha.com] is pretty good. I don't know how simple it would be to crack, but I do know that I haven't had issues with automated spam after switching to it.
    • Just type the one you can recognize (the challenge word is in the same style for a few weeks, and you should be able to spot it immediately), and type anything for the other word. The second word is of no consequence to the CAPTCHA and only counts towards the "Re".
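
The trick above works because, as the commenters describe it, only one of the two words is actually graded. A minimal sketch of that scheme under that assumption; the class name, the exact-match grading, and the three-vote consensus threshold are hypothetical, not Google's implementation. It also illustrates the earlier objection: junk typed for the unknown word still passes the CAPTCHA and only adds noise to the transcription vote.

```python
from __future__ import annotations

from collections import Counter, defaultdict


class TwoWordCaptcha:
    """Hypothetical model: one control word is graded, one unknown word is only voted on."""

    def __init__(self, min_votes: int = 3) -> None:
        self.min_votes = min_votes
        self.votes: defaultdict[str, Counter] = defaultdict(Counter)

    def submit(self, control_truth: str, control_guess: str,
               unknown_id: str, unknown_guess: str) -> bool:
        """Grade only the control word; record the other word as a vote if the user passed."""
        passed = control_guess.strip().lower() == control_truth.lower()
        if passed:
            self.votes[unknown_id][unknown_guess.strip().lower()] += 1
        return passed

    def transcription(self, unknown_id: str) -> str | None:
        """Return the leading guess once it has at least `min_votes` agreeing votes."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        word, n = counts.most_common(1)[0]
        return word if n >= self.min_votes else None


if __name__ == "__main__":
    captcha = TwoWordCaptcha()
    for guess in ["overlook", "overlook", "asdf", "overlook"]:
        captcha.submit("morning", "Morning", "scan-42", guess)
    # A wrong control word fails, so this vote is never recorded.
    captcha.submit("morning", "mourning", "scan-42", "overlook")
    print(captcha.transcription("scan-42"))  # overlook
```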
  • by jkerman (74317) on Thursday May 31, 2012 @06:06PM (#40174049)

    It EXACTLY minimizes their accomplishment. Everyone knew that the day it was easily exploited, Google would get a little less accessible to the disabled. Everyone knew it was the weakest attack point. (jerks!)

  • They get harder, and these days I'm four for five at best.

    Maybe I'm just a machine dreaming I'm human?

  • by Zorque (894011) on Thursday May 31, 2012 @06:14PM (#40174105)

    Google's captchas are the worst I've ever seen. They're almost always unreadable and need to be refreshed all the time. I like Recaptcha (which isn't what Google uses on their sites despite owning it), they're generally pretty clear and in addition provide a free service to anyone that wants to use it. I have no clue why Google sticks with their awful in-house captchas for Gmail, Youtube, etc.

  • Someone recently brought "AreYouAHuman" and its "PlayThru" security test to my attention.
    http://areyouahuman.com/

    I've been using Recaptcha on a niche website I operate for a couple years now, and people have been increasingly complaining about how hard it's getting. While it's English-only right now, PlayThru is very easy to complete, sorta fun, and best of all it tells you whether you got it right before you submit the form, so there's no hoping or guessing. So after a few quick tests, and users raving abo

    • Ah but click on the "accessible" option and lookie lookie, an mp3 audio file with gibberish and a background voice. "enter the words you hear".

      So this exploit would at least prevent using that option.

      The game concept is pretty good though, they just need to make an accessible version.

    • Funny you should mention areyouahuman.com. It actually relies on recaptcha for accessibility, so you would have been vulnerable to the attack TFA talks about too.

  • by niftymitch (1625721) on Thursday May 31, 2012 @06:30PM (#40174275)

    I bet Siri could solve it.
    All the voice tools out there could be harnessed to this sad end.

  • ...to use the audio version instead of the text version for those damn things. I bet the audio version doesn't have words that show up with weird non-alphanumeric characters or completely inked-out text in them, like a nontrivial percentage of the recaptchas I see seem to have.

  • by barv (1382797)

    Rather a neat way to make an employment application.

  • by ffflala (793437) on Thursday May 31, 2012 @11:16PM (#40176355)
    Now *that's* impressive. The closest approximation I've heard to the audio captchas I've encountered would be the few recordings I've heard that John Lennon used to give out as gifts: he'd record multiple radios playing different stations.

    I did once get an audio captcha that was almost solvable -- AFAICT, it was a conversation between Cthulhu in his native tongue and Tom Waits responding in Aramaic, recorded in a crowded airport terminal that had lots of loudspeaker announcements.
  • reCAPTCHA was also undermined by its use of just 58 unique words

    I'm really surprised the corpus was so small. I would have expected it to be on the order of thousands.
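
A small vocabulary matters because a recognizer never has to produce free text: it only has to pick the closest of a few dozen known words. A minimal sketch of that snapping step using Python's standard `difflib`; the vocabulary below is a made-up stand-in (the actual 58 words are not listed here), and this is not a description of the attackers' actual pipeline.

```python
from __future__ import annotations

import difflib

# Made-up stand-in vocabulary; the real list of 58 words is not given in the thread.
VOCABULARY = ["robot", "table", "seven", "window", "orange"]


def snap_to_vocabulary(rough_guess: str, vocabulary: list[str] = VOCABULARY) -> str:
    """Return the vocabulary word most similar to a noisy transcription."""
    # cutoff=0.0 means "always return the best candidate", however weak the match.
    matches = difflib.get_close_matches(rough_guess.lower(), vocabulary, n=1, cutoff=0.0)
    return matches[0]


if __name__ == "__main__":
    for noisy in ["tabel", "SEVN", "robbot"]:
        print(noisy, "->", snap_to_vocabulary(noisy))
```

With thousands of candidate words, near-misses would collide far more often, which is why a larger corpus would have made this kind of correction much less reliable.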

  • 100% of press believes them 110%.
  • I've got a great new idea. If you can solve the Captcha, you're obviously not a human and are denied access.
  • I haven't seen an analogue to this idea outside the ColdFusion world, but CFFormProtect [riaforge.org] is an awesome tool for protecting ColdFusion-based sites from spam.

    The basic idea behind CFFormProtect is that spam protection shouldn't involve annoying hurdles that users have to jump over, and should be as invisible as possible to the user. It takes what I would say is a similar approach to SpamAssassin, in that it uses multiple heuristic methods to rank form postings for potential spamminess. I've used it extens
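
The general idea of scoring a submission on several weak signals and rejecting it only above a threshold can be sketched in a few lines. This is not CFFormProtect's code or its actual list of tests: the signals, weights, and threshold below are hypothetical, chosen only to illustrate the SpamAssassin-style approach the comment describes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical features a form handler might collect alongside the posted fields."""
    seconds_to_fill: float
    honeypot_filled: bool  # a hidden field humans never see, so it should stay empty
    link_count: int
    used_mouse_or_keyboard: bool


# Hypothetical (rule name, weight, predicate) triples.
RULES = [
    ("filled too fast", 2.0, lambda s: s.seconds_to_fill < 3.0),
    ("honeypot filled", 3.0, lambda s: s.honeypot_filled),
    ("many links", 1.5, lambda s: s.link_count > 2),
    ("no input events", 1.0, lambda s: not s.used_mouse_or_keyboard),
]
SPAM_THRESHOLD = 3.0


def spam_score(sub: Submission) -> tuple[float, list[str]]:
    """Sum the weights of every rule that fires and report which ones did."""
    fired = [(name, weight) for name, weight, test in RULES if test(sub)]
    return sum(weight for _, weight in fired), [name for name, _ in fired]


def is_spam(sub: Submission) -> bool:
    return spam_score(sub)[0] >= SPAM_THRESHOLD
```

Because no single weak signal decides the outcome, a real user who happens to type quickly is not blocked, which is the "invisible to the user" property the comment is after.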
  • Yes, they should be rewarded. Not for the whole "made a computer to beat computers" thing, but because they actually helped, in an unintended way, with speech recognition. I can easily see this kind of stuff joining Praat and software like that, helping linguists mess with experimental data.

    Well done, sirs.
