Researchers Break Video CAPTCHAs

Posted by Soulskill
from the soon-you-will-need-to-authenticate-in-person dept.
Orome1 writes "After creating the 'Decaptcha' software to solve audio CAPTCHAs, Stanford University's researchers modified it and turned it against text and, quite recently, video CAPTCHAs with considerable success. Video CAPTCHAs have been touted by their developer, NuCaptcha, as the best and most secure method of spotting bots trying to pass themselves off as human users. Unfortunately for the company, researchers have managed to prove that over 90 percent of the company's video CAPTCHAs can be decoded by using their Decaptcha software in conjunction with optical flow algorithms created by researchers in the computer vision field of study."

  • by mapkinase (958129) on Monday February 20, 2012 @01:34PM (#39101521) Homepage Journal

    Commies vs West
    MPAA vs sharers
    coders vs decoders (that includes captcha vs decaptcha)

    It's fun to observe it when government does not interfere.

  • by Elgonn (921934) on Monday February 20, 2012 @01:34PM (#39101525)
    We need some made up law.

    "Anything a computer can generate it can understand."
    This is why chat bots still suck. Computers cannot generate context.
    • by andsens (1658865)

      "Anything a computer can generate it can understand."
      Well, that's beside the point, isn't it? A computer can generate and understand hashes, but that does not mean they are easily breakable.

      You just need to make the decoding much harder than the encoding. There must still be computational areas in the visual domain where we humans are way more efficient.

      • There must still be computational areas in the visual domain where we humans are way more efficient.

        Even if that is the case, there is still a relatively straightforward attack on captchas: the mafia porn site. It is generally easier to use a mechanical turk to decode captchas than to attack captchas algorithmically.

      • by Anonymous Coward

        I've always thought that going with a higher level thinking would be harder to break. Instead of copying letters from an image you have to identify a set of images that is easy for a person but more difficult for a computer. Think children's picture book type deal. Can a computer reliably tell a dog from a cat from a cow?

        • by GIL_Dude (850471)

          I've always thought that going with a higher level thinking would be harder to break. Instead of copying letters from an image you have to identify a set of images that is easy for a person but more difficult for a computer. Think children's picture book type deal. Can a computer reliably tell a dog from a cat from a cow?

          I think that's a pretty good thought. I'd extend it with perhaps one of those, "which of these things doesn't belong" type of setups (which may have been what you meant). It could then show pictures of a banana, an apple, an orange, some grapes, and a baseball hat. I don't know, perhaps there is a way to solve these easily by computer. But I know the stupid text CAPTCHAs that I had to go through yesterday to sign up for one site were so "obfuscated" that I couldn't read them either and I had to click the

          • I know I've seen this idea before. I wonder why I've never actually seen it implemented anywhere. It seems pretty easy to do, too. Collect images (either drawings or pictures), and assign tags. For example, an apple might have the tags 'apple', 'fruit', 'food', and 'red'. Then when the system generates a captcha, it picks a random tag in its database, and finds 4 images with that tag, and 1 without. The user should be able to pick out which image isn't a 'fruit' or 'red'.

            Users could even be used for ass
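The tag scheme described in this comment (pick a random tag, show 4 images that carry it and 1 that does not) can be sketched roughly as follows. This is only an illustration, not an existing system: the image names, the tag database, and the function names `make_challenge` and `check_answer` are all made up for the example.

```python
import random

# Hypothetical tag database: image name -> set of tags (illustrative data only).
IMAGES = {
    "apple.png": {"apple", "fruit", "food", "red"},
    "banana.png": {"banana", "fruit", "food", "yellow"},
    "orange.png": {"orange", "fruit", "food"},
    "grapes.png": {"grapes", "fruit", "food"},
    "cherry.png": {"cherry", "fruit", "food", "red"},
    "baseball_hat.png": {"hat", "clothing"},
    "fire_truck.png": {"vehicle", "red"},
}


def make_challenge(images, rng, n_with=4):
    """Pick a tag, then n_with images that have it and one image that lacks it."""
    all_tags = sorted({tag for tags in images.values() for tag in tags})
    # A tag is usable only if enough images carry it and at least one does not.
    usable = [
        tag for tag in all_tags
        if sum(tag in tags for tags in images.values()) >= n_with
        and any(tag not in tags for tags in images.values())
    ]
    tag = rng.choice(usable)
    with_tag = sorted(name for name, tags in images.items() if tag in tags)
    without_tag = sorted(name for name, tags in images.items() if tag not in tags)
    shown = rng.sample(with_tag, n_with)
    odd_one_out = rng.choice(without_tag)
    shown.append(odd_one_out)
    rng.shuffle(shown)  # do not leak the answer through its position
    return tag, shown, odd_one_out


def check_answer(odd_one_out, picked):
    """The user passes if they picked the image that lacks the tag."""
    return picked == odd_one_out
```

As other replies in this thread point out, a small fixed database like this can be solved once by a human and replayed, so the sketch only shows the mechanics, not a defense.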

        • by mcavic (2007672)
          Nah, it's just as easy for a machine to recognize an animal as it is to recognize a character. And we're getting to the point where any question that has an objective answer can be answered by a search engine.
        • by Desler (1608317)

          Maybe not but Apu sure can and he'll do it for peanuts.

      • There must still be computational areas in the visual domain where we humans are way more efficient.

        On your left, you will see 21st century purely organic brains. Their limited capacity neural networks had not yet been mechano-electrically enhanced with additional storage, high speed neuronal interconnects, broad EM spectrum sight, or even simple wireless intercourse, or "telepathy" as the luddites of the past initially called it.

        On your right, you will see the first machine intelligence construct which exceeded human levels of complexity. Not to worry, the intelligence that once inhabited this form

      • You could have panels of comic strips that combine visual information to provide a context, and a partially filled word bubble with one very obvious missing word.

        By combining visual cues to provide context for the missing word, you could at least make it harder for algorithms, although the Underpaid Indian People attack still works.

      • You just need to make the decoding much harder than the encoding. There must still be computational areas in the visual domain where we humans are way more efficient.

        You fall into the same tired old trap of believing this is some kind of arms race: a game of escalation. It's not. It's a matter of finding the things that computers are not very good at; which is usually about context, and more specifically, culture. In other words, it's not the visual domain, but cultural markers where computers are simply unable to compete with humans. The danger then, is alienation. You have to target your audience carefully, and in a localized manner.

    • by arth1 (260657)

      Computers cannot generate context.

      They're getting better at it.
      http://www.wolframalpha.com/ [wolframalpha.com]

    • "Anything a computer can generate it can understand."

      Thus explaining the prevalence of these:

      https://en.wikipedia.org/wiki/Ciphertext_only_attack [wikipedia.org]

      The problem is not creating things which are hard for computers to decode, it is creating things which are hard for computers to decode but easy for humans. That is why captchas will ultimately fail: they rely on the idea that there is something that human brains can understand which computers cannot decode, but which computers can still generate.

      • As soon as computers are as capable as people, captchas are no longer necessary. Then computers can directly detect and block unwanted behaviour. With the added advantage that it can block that behaviour even if real humans do it.

        • Except that the ability to solve a simple puzzle may be different from the ability to recognize spam (which is what we are really trying to stop here). Even if you had a computer that was better at solving CAPTCHAs than humans are, you might still be unable to detect the specific class of unwanted behavior that you were trying to defend against. Now, if the CAPTCHA was asking you to label a series of short messages "spam" or "not spam," then perhaps your point would hold...except that it would be far too
  • ..if your user can interact with it, they can screw with it. The nature of HTTP and the web is a stateless environment, one has to impress state onto it for things like secure transactions and sessions. Basically, you need to come up with a test that randomly checks to see if the input is coming from a person; all without breaking the experience of the web browser, or the web in general. It's an arms race, and things are even again; another advantage bites the dust.

  • Why bother (Score:5, Insightful)

    by onyxruby (118189) <onyxruby&comcast,net> on Monday February 20, 2012 @01:36PM (#39101549)

    CAPTCHAs are worthless against an army of Indians being paid just pennies a pop to break them. The only thing they do is annoy the script kiddies. Far better success would be had in doing pattern recognition on sign-ups instead.

    • Re:Why bother (Score:5, Insightful)

      by 0123456 (636235) on Monday February 20, 2012 @01:48PM (#39101625)

      CAPTCHAs are worthless against an army of Indians being paid just pennies a pop to break them. The only thing they do is annoy the script kiddies.

      No, they also annoy your actual, real human users. I often have to try three or four times to get the bloody thing right.

      • Words cannot express the rage I felt when I needed to register an XBox Live account to play a game I purchased because of the stupid G4WL DRM nonsense. I spent around 10 minutes on the bloody captcha because it differentiated capital, lowercase, number, and symbols. It was the most absurd captcha system I've seen to date. Was it an O or a 0? A lowercase L or an uppercase I? Was that a dollar sign or just some lines thrown in to distort the word further? An M or a W flipped on its side (was a 90 degree squiggle t

        • by iiiears (987462)

          Secure sign-in with google or Facebook for a single player game and now we are tracked everywhere with all of our personal info attached.

          .

    • The going rate is around $1 per 1000 solved CAPTCHAs.

  • Good... (Score:4, Funny)

    by AngryDeuce (2205124) on Monday February 20, 2012 @01:36PM (#39101551)
    Honestly, I fucking hate CAPTCHA and will cheer on its demise. Good luck typing this shit in... [blogspot.com]
    • dompoli sprain?

      That doesn't seem impossible, especially since only the first one will matter.

  • We have to face that fact: CAPTCHA is just a temporary measure anyway. Software is rapidly approaching the ability to do anything online that your average human can. While computers rapidly increase in capability, the average human stays the same. Eventually the only way to tell a computer from a human will be that the humans are easier to confuse.
    • by jamstar7 (694492)
      And yet there are still those Craigslist 'employment' ads that promise 400/week for 5-10 hours 'work' spamming newsgroups and such. If it were automated, those 'jobs' would be lost. No big deal, really, cause when it's all said and done, it works out to about 25 cents an hour.

      Multilevel marketing scam, anyone?
  • What could possibly go wrong? v1agra
  • by Compaqt (1758360) on Monday February 20, 2012 @01:48PM (#39101623) Homepage

    have anything else to do?

    Sorry, had to say it.

    • by timeOday (582209)
      It doesn't matter too much which problem researchers focus on - they are solving the problem of human (and then superhuman) capabilities in this area. CAPTCHAs are nice because you have a self-funding opponent creating test data for you.
  • by stms (1132653) on Monday February 20, 2012 @01:49PM (#39101635)

    http://xkcd.com/810/ [xkcd.com]
    At least something good could come out of captchas.

  • If you have a small-ish site that caters to a niche community where your target audience will share some knowledge that non-target folks don't have, a riddler where you can set the questions can work great. Just structure your questions in such a way that the answer is non-obvious in an automated way to all but the best AI engines.

    For example, Phoronix could use a question like this --

    Which of these is superfluous? Intel, ATI, NVIDIA, AMD

    • by arth1 (260657)

      For example, Phoronix could use a question like this --

      Which of these is superfluous? Intel, ATI, NVIDIA, AMD

      And even that isn't as clear-cut as you might think. Most people probably think that ATI is superfluous, but if so, they're wrong.
      If you say "ATI, nVidia and Intel", you don't need to mention AMD because it's implied, thus AMD is superfluous.

      If you make a question unambiguous enough, computers can answer it too. You can overwhelm a computer system by the sheer amount of ways to ask things, but then you need a human, who in the long run can't produce captchas as quickly as a computer can fail them.

      • by div_2n (525075)

        It's just an illustration, but just like it can be hard for humans to decipher a captcha, it could be hard to understand the logic -- Intel, AMD and NVIDIA are all companies, whereas ATI was actually purchased by AMD, which would thus make it superfluous.

        If it were easy to answer, it would be easy for automation to crack it.

        • You need a human to generate these types of questions. That limits the number of them you can cheaply create.

          Then a spammer gets a human to solve them each once, records the answers, and plays them back as needed.

          You might as well pay for a live operator to verify each person. Of course, if you set that up, computers will pretend to be hearing impaired and demand access through TTY interfaces.

          Turing test, anyone?

  • by Animats (122034) on Monday February 20, 2012 @01:54PM (#39101675) Homepage

    The CAPTCHA industry is not doing well.

    ReCAPTCHA needs to be retired. OCR is getting too good. ReCAPTCHA, remember, is using images from book scanning, ones that the OCR system couldn't recognize. When ReCAPTCHA started, the text presented was usually an English word. Now, if the book scanning OCR system can't figure out something, it's probably not an English word. You're lucky if it's a sequence of characters found on an A-Z keyboard. People have reported ink blots, mathematical formulas, and Cyrillic.

    Worse, ReCAPTCHA's idea of the "right" answer is crowdsourced. It's possible for bots to pollute the ReCAPTCHA database, by providing the same wrong answer more than once. You only have to get one of the words right, so if you can read one, a junk response for the other works. This goes into the database as a vote for the "right answer", to be presented to someone else later. I sometimes type "whatever" when one of the images is unreadable.

    • by Anonymous Coward

      I sometimes type "whatever" when one of the images is unreadable.

      You're missing an opportunity to add words to past texts. I always type "bunga-bunga". My hope is that someday in the far future, a scholar of historic literature will be scratching his head wondering why all these old books have the phrase bunga-bunga thrown in at random places.

      • by Sulphur (1548251)

        I sometimes type "whatever" when one of the images is unreadable.

        You're missing an opportunity to add words to past texts. I always type "bunga-bunga". My hope is that someday in the far future, a scholar of historic literature will be scratching his head wondering why all these old books have the phrase bunga-bunga thrown in at random places.

        To screw things up?

    • by RyoShin (610051)

      Another reason I recently realized that recaptchas are useless: The whole idea is that one of the words could be read by a robot [spoiler]from the start[/spoiler] to be included in the rotation. Now, granted, they've modified the word to try and anti-robot it, but the fact remains that at some point it was readable; the other "word" never was. Thus it had a limited lifespan until the spambots caught up in OCR to Google's bots.

    • ReCAPTCHA needs to be retired.

      Perhaps.

      OCR is getting too good.

      I've spoken with the founder of ReCAPTCHA about this when he came to campus for a talk several years ago. It's both the expected end game and seen as a victory ("we forced OCR to become usable with market pressures").

      Don't worry, they have other puzzles in the queue that need machine comprehension models.

    • by rdnetto (955205)

      Worse, ReCAPTCHA's idea of the "right" answer is crowdsourced. It's possible for bots to pollute the ReCAPTCHA database, by providing the same wrong answer more than once. You only have to get one of the words right, so if you can read one, a junk response for the other works. This goes into the database as a vote for the "right answer", to be presented to someone else later. I sometimes type "whatever" when one of the images is unreadable.

      Not just bots - humans can (unintentionally) do it as well. Sparkfun (an electronics hobbyist site) recently had a giveaway in order to stress test their servers. Several thousand people were solving CAPTCHAs as quickly as possible. There was a noticeable drop in the accuracy of the answers required, since a lot of people were taking shortcuts in entering them.

  • Well, the whole CAPTCHA system is itself flawed - it's putting all the data in one place. The only way to make it harder would be to have multiple data sources for users to have to put information through - e.g. not simply one CAPTCHA to verify, but 3 or 4 separately loaded, and all independent of each other. (Even 2 would be an improvement.)

    Still, it would only be a matter of time before the bots figured out how to track all the CAPTCHAs and thereby defeat it yet again.
  • I NEED one of these captcha solver programs. When I try to register for a website or forum, many of them are so unreadable it takes me 20 minutes of trying to get it right and NO PHONE NUMBER to call their technical to register me by tele.

  • by epdp14 (1318641) * on Monday February 20, 2012 @02:20PM (#39101931) Homepage
    What about charging 10-15 seconds of CPU time with some arbitrarily hard code? It seems like everyone agrees that CAPTCHAs are an arms race that the good guys can't win, why not make it where it isn't profitable to solve the CAPTCHA replacement on a large scale?
    • This sounds an awful lot like this antispam attempt:

      https://en.wikipedia.org/wiki/Hashcash [wikipedia.org]

      So far this has not been widely successful, although perhaps it is because it targets the email system rather than the web (where things tend to change faster).
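For readers unfamiliar with the idea, a proof-of-work check in the spirit of Hashcash can be sketched like this. It is a simplified illustration and not the actual Hashcash stamp format: the challenge string, the use of SHA-256, and the names `solve` and `verify` are assumptions made for the example. The property that matters is that solving takes many hash attempts while verifying takes one.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce whose hash has `difficulty` leading zero bits."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```

Each extra bit of difficulty doubles the expected work for the client, so a site would have to tune it against its slowest legitimate users, which is the objection raised in the next reply.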
    • What about charging 10-15 seconds of CPU time with some arbitrarily hard code?

      A major obstacle to this is that you have to make the puzzle easy enough that your users on lower-end or mobile devices still have the necessary computation power to complete the puzzle in a reasonable time. Malicious organizations behind the spam will just put more hardware into their attack, typically by using the compromised machines in botnets. They'll also optimize the code, and parallelize the attack by performing the computation for multiple attempts on multiple CPU cores, while your code has to wo

      • They'll also optimize the code, and parallelize the attack by performing the computation for multiple attempts on multiple CPU cores

        Then perhaps you should base the challenge on something from this class of problems:

        https://en.wikipedia.org/wiki/P-complete [wikipedia.org]

        Let's now imagine a perfect world in which you create a check that actually takes 15 seconds to complete. They can still do that 5,760 times per day.

        The point of this proposal is not to stop spam entirely, but to keep the rate at which spam can be sent down to manageable levels. If a spammer can only send 5760 spam messages per day, that is a big improvement -- right now spammers are limited only by bandwidth, and can send tens of thousands of messages per day.
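The 5,760 figure is simply the number of 15-second slots in a day for a single sequential solver. A quick check, with a hypothetical `cores` parameter added to show how the parallelism objection above changes the number:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def attempts_per_day(seconds_per_check: int, cores: int = 1) -> int:
    """Upper bound on completed checks per day if each check occupies one core."""
    return cores * SECONDS_PER_DAY // seconds_per_check


print(attempts_per_day(15))           # one core
print(attempts_per_day(15, cores=8))  # eight cores working on separate attempts
```

The per-day limit only holds per core (or per machine), so the bound scales linearly with whatever hardware the attacker brings.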

  • by Anonymous Coward
    The state of OCR has changed little in over a decade, at least at the consumer end. I've tried the top software like Acrobat Pro and Omnipage and hardware solutions from Xerox, HP, Fujitsu, etc. The text can be printed clear as day, with no flaws, and yet the OCR programs all fail to get above, I'd say, 70% accuracy. Maybe it's different in the commercial world, where one can afford a $25,000 glorified copier, but I've been unable to find anything you can buy from Amazon or the like that will reliably scan
  • There are a variety of low-tech techniques that can be more effective than using Captchas or even "security questions", especially when you mix and match. You don't have to annoy your legitimate users, or make them jump through hoops. One trick is to include a "honeypot input" in your form. Give it a tantalizing name attribute such as "username", give it visibility of "hidden" (with CSS from a style-sheet), and when validating your form simply check to see if any values have been entered. If it's non-empty,
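The server-side half of the honeypot trick can be sketched in a few lines. This assumes the submitted form arrives as a plain dictionary of field names to strings; the function name `looks_like_bot` is made up for the example, and the honeypot field is called "username" as in the comment, which means the real login field would need some other name. What to do with a flagged submission is left open here, since the comment is cut off at that point.

```python
def looks_like_bot(form: dict, honeypot_field: str = "username") -> bool:
    """Return True if the CSS-hidden honeypot field was filled in.

    Human visitors never see the field, so they leave it empty;
    naive form-filling bots tend to populate every input they find.
    """
    value = form.get(honeypot_field) or ""
    return bool(value.strip())


# Example submissions (hypothetical field names and values).
human = {"login": "alice", "email": "alice@example.com", "username": ""}
bot = {"login": "x", "email": "x@example.com", "username": "cheap-pills"}
print(looks_like_bot(human), looks_like_bot(bot))
```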
  • by Colonel Korn (1258968) on Monday February 20, 2012 @02:55PM (#39102419)

    The key with CAPTCHAs is diversification, just like the key to avoiding disease in biological specimens is avoiding a monoculture. If there were 15000 different CAPTCHA methods, it wouldn't be profitable to create CAPTCHA tools that would each only work on some small subset. There are a lot of low population sites I use that check whether I'm a human with some unique set of hoops through which I must jump. The effectiveness of those hoops comes from the fact that they're often unique to that site, not a lump of code used by thousands of different sites. Diverse CAPTCHA breaking might require something like Watson, which isn't going to be available to spammy types in the near future.

  • by Phrogman (80473) on Monday February 20, 2012 @02:58PM (#39102447) Homepage

    Have the captcha page display some really good porn video footage - drawn from a huge repository of suitable images (say, the rest of the internet). The clips are fairly long (say 3-5 mins or so). To pass the captcha the user merely has to click on a button at the right time.
    So, if the user clicks right away, it's a bot. If there is a suitable pause (say 3-5 mins), then it's more likely human :P

  • I have to wonder just who Standford is trying to help out with this research. Captcha's may be annoying but when their research makes its way to the script kiddies and the industry comes up with a new solution does anyone really think the new solution won't be even more annoying?
  • finding a captcha is on the verge of proving that you ARE indeed a robot...

  • I always thought that was pretty secure because the machine couldn't tell which picture was a cat? What about combining video and cat captcha. 4 videos, one of which is a cat. But it could be a close video, or a zoomed out one where the cat is running around. A computer really shouldn't be able to decode that. Use a large enough database and they'll never solve it.

    • You'd need a lot of pictures and videos of cats. Good luck finding that on the internet!

    • by wvmarle (1070040)

      If I have to start watching videos just to sign up for some forum or so, then the sign up is probably just not going to happen. Your idea sounds several orders of magnitude more annoying than the already highly annoying captchas in use (with ReCAPTCHA on the top of annoyances - most of them are simply unreadable).

      • by crossmr (957846)

        Cat captcha has been around for a while, but it doesn't even have to be a video. It could be 4 animated gifs. A computer would have a hard time deciphering an animated gif of a cat running across a room. But for a human they're very easy. Easier than recaptcha.

  • I got to chat with Luis von Ahn, co-creator of the Captcha and reCaptcha, and it turns out he's a surprisingly idealistic guy. Taking inspiration from people in gyms pedaling and going nowhere, he hoped to actually *do* something with the brainpower needed to solve a reCaptcha (he said something along the lines of, "actually your brain is doing a pretty amazing thing -- translating an image to text.") Maybe digitizing the archives of the New York Times and ancient manuscripts isn't world hunger or world pea
  • I wonder why most people cannot spell Stanford University's name correctly.
  • Hopefully they'll start integrating these deCaptcha tools into Firefox and Chrome. Captchas became so hard it's impossible for mere humans to solve them.
  • Made to look like a captcha, with the text, "What is 2+3?" Spammers read the captcha and submit that back. Normal people type 5. "What site is this?" is another good one. Heck, you don't even need to make it look like a captcha; it's just funnier that way. One site I mod used to get dozens of spam threads a day, until a couple years ago they added a box to the end of their registration with one of a handful of questions like "what do seal clubbers club" (answer: seals), or "what is the first letter of the a
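A question box like the one described here needs little more than a list of questions and a forgiving comparison. The sketch below is an assumption-laden illustration: the two questions come from the comment, while the accepted spelling "five", the normalization rules, and the function names are invented for the example.

```python
import random

# Question -> set of accepted answers, stored lowercase (illustrative list).
QUESTIONS = [
    ("What is 2+3?", {"5", "five"}),
    ("What do seal clubbers club?", {"seals"}),
]


def pick_question(rng):
    """Choose one question and its accepted answers for a registration form."""
    return rng.choice(QUESTIONS)


def is_correct(accepted, reply):
    """Compare the reply case-insensitively, ignoring surrounding whitespace."""
    return reply.strip().lower() in accepted


question, accepted = pick_question(random.Random())
print(question)
```

A bot that echoes the question text back, as the comment describes, fails `is_correct`, while a person typing "5" or " Seals " passes.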
