Security Technology

Looking To Spammers To Solve Hard AI Problems 271

An anonymous reader writes "With bots getting closer to beating text-based CAPTCHAs for good, New Scientist points out that when they do, OCR technology will at least have advanced. The article goes on to suggest that whatever kind of reverse Turing Test comes next should be chosen to motivate spammers to solve other pressing AI problems, such as image recognition. Are there any other problems that criminal crowdsourcing could help with?"
This discussion has been archived. No new comments can be posted.

  • by plover ( 150551 ) * on Saturday April 18, 2009 @10:40PM (#27632709) Homepage Journal

    Advancing the state of the art in Optical Character Recognition was always intended to be a side-benefit of CAPTCHAs. It looks like that plan came through nicely.

    I have always figured CAPTCHAs would be a stopgap until other methods of authentication could easily be used, such as micro-payments or single sign-on solutions like OpenID. Unfortunately, those other methods haven't been adopted nearly as fast as the need has grown. Perhaps if CAPTCHAs are declared "dead", site operators will feel more urgency to adopt these solutions.

    If CAPTCHAs do continue, I'd like the next problem to be facial recognition software. I'd love a package that could look at a picture and tag it "Nicholas and Andrea" or "Glen and Helene". Digital camera software everywhere could benefit from this technology. Not sure how you'd bake that into a CAPTCHA, but it's a good problem to solve.

    • Re: (Score:3, Funny)

      by Jurily ( 900488 )

      If CAPTCHAs do continue, I'd like the next problem to be facial recognition software. I'd love a package that could look at a picture and tag it "Nicholas and Andrea" or "Glen and Helene".

      Given the likeliness of Linux being the test platform, this will work for female genitalia first.

      • Re: (Score:2, Funny)

        by Anonymous Coward

        One of the chans has "fapchas"; you have to identify various body parts by typing "tits", "vagoo", "dick", or "ass"

    • Once sufficiently powerful, interlinked networks are online, most any problem you can think of will be solved... probably in a radically more efficient manner than current accomplishments.
    • Re: (Score:3, Informative)

      by bmgoau ( 801508 )

      Picasa is a popular image management program that has supported facial recognition since last year: http://www.techcrunch.com/2008/09/02/picasa-refresh-brings-facial-recognition/ [techcrunch.com]

      I haven't used it, so I'm not sure how good it is.

    • Except that it didn't. There is no such thing as generic Captcha-busting software. They were broken individually, by different teams.

      Micro-payments have not caught on because the banking industry does not support them, nor should they. You are asking one industry to foot the expenses for something that has nothing to do with them.

      OpenID has not caught on faster for at least one major reason: Single Point of Failure. The very same reason people are exhorted to use different passwords on different sit
    • Re: (Score:3, Interesting)

      I hope the next scheme is easier for people with bad eyes. I often have to call the wife or one of my sons to solve a captcha puzzle. The black/white/grey are bad enough - when they combine colors, I'm freaking LOST!! If I'm home alone, I just give up after a couple failed attempts. Good thing my bank doesn't use this scheme, huh?

      • by Artemis3 ( 85734 )

        Captchas are really a problem. They cause serious accessibility issues, and many I can't solve myself, despite having good sight and a large, crystal-clear LCD screen.

        In your case I think you should try compiz (called Desktop Effects in Ubuntu) for aid: there is a plugin which inverts colors, another does variable-level smooth zoom which can follow your mouse, etc. Just make sure you have compizconfig-settings-manager to turn on the useful stuff and get rid of the rest.

      • by gd2shoe ( 747932 )
        The good sites have an audio option (usually mp3) so that the visually impaired can hear the words. It does stink that so many sites don't think about accessibility. For most of us it is really easy to forget about.
    • by fuzzyfuzzyfungus ( 1223518 ) on Saturday April 18, 2009 @11:51PM (#27633163) Journal
      But has it?

      Unfortunately, CAPTCHA is radically easier than actual OCR. When cracking a CAPTCHA, achieving a success rate of 5-10% is absolutely fine. Plus, when you submit your answer, you are told whether or not you got it right. With OCR, anything short of high 90's is pretty much useless, and the only feedback available is through manual human intervention, which scales poorly.
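To put rough numbers on that asymmetry, here is a minimal Python sketch. Only the 5-10% success rate comes from the comment above; the function names and the 2,000-character page at 95% accuracy are illustrative assumptions, not figures from any real cracker or OCR product.

```python
def expected_attempts(success_rate: float) -> float:
    """Average number of tries needed to solve one CAPTCHA (geometric distribution)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


def chance_of_break(success_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - success_rate) ** attempts


def expected_ocr_errors(char_accuracy: float, page_chars: int) -> float:
    """Expected number of wrong characters on a page of `page_chars` characters."""
    return (1.0 - char_accuracy) * page_chars


if __name__ == "__main__":
    for rate in (0.05, 0.10):
        print(f"{rate:.0%} solver: ~{expected_attempts(rate):.0f} tries per account")
    # Hypothetical 2,000-character page read at 95% per-character accuracy.
    print(f"OCR errors left to fix by hand: ~{expected_ocr_errors(0.95, 2000):.0f}")
```

Under these assumptions a bot needs only about 10 to 20 retries per account, and the site itself confirms each success, while a 95%-accurate OCR pass would still leave roughly 100 characters per page for a person to find and correct.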

      Arguably, the only significant OCR advance has been RECAPTCHA, which is just a clever way of making humans do the hard stuff in a way that actually helps, rather than just using makework problems.

      It is certainly true that CAPTCHA cracking has advanced considerably; that just doesn't apply too neatly to real OCR problems.
    • by Wannabe Code Monkey ( 638617 ) on Sunday April 19, 2009 @01:17AM (#27633727)

      If CAPTCHAs do continue, I'd like the next problem to be facial recognition software. I'd love a package that could look at a picture and tag it "Nicholas and Andrea" or "Glen and Helene". Digital camera software everywhere could benefit from this technology. Not sure how you'd bake that into a CAPTCHA, but it's a good problem to solve.

      How about this: The user is presented with a short message that they have to mark as "Spam" or "Not Spam". If the spammers get really good at solving this problem, they've effectively written themselves out of a job. And if they can't do it, then they can't get new accounts.

      • by gd2shoe ( 747932 ) on Sunday April 19, 2009 @02:34AM (#27634041) Journal

        I like it, but it has issues that may be hard to work out.

        (1) If they only needed to solve one (or any small number), then the spammer's auto system will only need to guess. Present the potential user with 3 of these and they'll get fed up. The spammer's system, on the other hand, will get 11% correct by guessing. That's enough for them to thwart the system.

        (2) It's really easy to get samples of spam. Any user who clicks the spam button has stated that it's not their mail. (Multiple users flagging the same message tell you it's practically certain to be spam) It's not a huge stretch to acquire or assume permission to use the message. Getting legitimate samples (of varieties of email) may be much harder.
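Point (1) is easy to check with a few lines of Python. This is only a sketch, assuming each message is an independent two-way choice and the bot guesses uniformly at random; the function names are made up for illustration. Under those assumptions three messages give a 12.5% pass rate, in the same ballpark as the figure quoted in point (1).

```python
def blind_guess_pass_rate(questions: int, choices: int = 2) -> float:
    """Chance that uniform random guessing answers every question correctly."""
    if questions < 1 or choices < 2:
        raise ValueError("need at least 1 question and 2 choices")
    return (1.0 / choices) ** questions


def questions_needed(max_pass_rate: float, choices: int = 2) -> int:
    """Smallest number of questions keeping blind guessing at or below max_pass_rate."""
    if not 0 < max_pass_rate < 1:
        raise ValueError("max_pass_rate must be in (0, 1)")
    n = 1
    while blind_guess_pass_rate(n, choices) > max_pass_rate:
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 3, 7):
        print(f"{n} message(s): guessing passes {blind_guess_pass_rate(n):.1%} of the time")
    print("messages needed to hold guessing to 1%:", questions_needed(0.01))
```

The second function shows the usability cost: pushing blind guessing down to 1% takes seven spam/not-spam judgments per signup, far more than the three that would already make a human user fed up.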

      • How about this: The user is presented with a short message that they have to mark as "Spam" or "Not Spam". If the spammers get really good at solving this problem, they've effectively written themselves out of a job. And if they can't do it, then they can't get new accounts.

        That's freakin' genius! With the caveat that spammers will never completely destroy themselves - the situation will reach a balance, but at a lower threshold than today. Spam will certainly decrease dramatically, if your idea is to be implemented.

      • Re: (Score:3, Informative)

        by Anonymous Coward

        Actually this exists: http://spamornot.org/ [spamornot.org]

    • I don't know what the replacement should be, but I'd love to see the end of CAPTCHAs. The last time I tried to sign up a gmail account, it took me four tries just to read the bloody CAPTCHA correctly - if we've got to the point where the spammers can parse the CAPTCHA and a human can't, there's no point to them.

  • SSSHHH!!!! (Score:3, Funny)

    by Anonymous Coward on Saturday April 18, 2009 @10:44PM (#27632727)

    Don't tell them that they're the ones that are actually being used! That spoils all the fun!

  • True AI (Score:5, Funny)

    by not-my-real-name ( 193518 ) on Saturday April 18, 2009 @10:46PM (#27632741) Homepage

    I'll just bet that this is what leads to "true" artificial intelligence (whatever that is). Soon, we'll have completely automated agents trying to convince other completely automated agents to purchase stuff to enhance bits of biology that they don't have.

    • Re:True AI (Score:5, Insightful)

      by KPU ( 118762 ) on Saturday April 18, 2009 @11:32PM (#27633027) Homepage

      This is a reasonably accurate description of the stock market.

      • This is a reasonably accurate description of the stock market.

        Even the penis enhancing part?

        That strikes me as bizarre.

    • by jd ( 1658 )

      Since AIs are, almost by definition, more predictable than humans, it is self-evident that AI customers can be more cost-effectively tailored for and more easily swayed. Since limited intelligences already handle most financial decisions (virtually no humans actually play the stock markets these days), it is the AIs who have the serious money and therefore are the customers of choice.

      If we go down that sort of a road, with spammers and crackers controlling research and development, humans will cease to b

    • Re: (Score:2, Insightful)

      by zach297 ( 1426339 )
      That brings up a good point. When AI is good enough to get past CAPTCHA it will hopefully be good enough to filter out the spam.
  • a possible idea (Score:4, Insightful)

    by ecalkin ( 468811 ) on Saturday April 18, 2009 @10:50PM (#27632769)

    several years ago 'neural nets' were the big thing and they were thinking that they could make them 'learn' and do useful things.

    i always thought that traffic control would be an interesting application. if a computer could look at video of an intersection (and streets leading to the intersection) and figure out where cars were and weren't, you could make traffic lights a lot less annoying.

    so our CAPTCHA might be a picture/video of cars and a request to count them?

    eric

    • by brusk ( 135896 )

      Is that really that hard a problem? With a combination of road sensors and the ability to distinguish between a known background and other colors, that should be pretty easy. The problem is what to do with that information. The only time traffic lights are annoying in a way that could be easily remedied is at 3 am, when it's a pain to get stuck at a red when there are no cars visible in any direction. But that's not an important case; what really matters is making traffic flow smoothly when volume is heavy.

      • Re:a possible idea (Score:5, Interesting)

        by MichaelSmith ( 789609 ) on Saturday April 18, 2009 @11:16PM (#27632921) Homepage Journal
        I used to work on a traffic signal system in Australia. At one point we hosted an experimental system from (I think) the CSIRO which displayed the speed you would have to travel at to get a green at the next intersection. The problem with that was that it gave really bad, but accurate advice, like travel at 12km/h or 80km/h. This is where the limit is 60. So they changed it to only display speeds below and close to the limit and then it was even more useless.
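The arithmetic behind such a display can be sketched in Python. This is not the experimental system's actual algorithm, which the comment does not describe; the function names, the green-window inputs and the 30 km/h "useful minimum" are all assumptions made for illustration.

```python
import math


def advisory_speed_range_kmh(distance_m: float, green_start_s: float, green_end_s: float):
    """Speeds (km/h) that reach a signal `distance_m` ahead while it is green.

    green_start_s / green_end_s are seconds from now until the green phase
    begins and ends (hypothetical inputs from the signal controller).
    """
    if distance_m <= 0 or green_end_s <= 0 or green_end_s <= green_start_s:
        raise ValueError("need a positive distance and a future green window")
    slowest = distance_m / green_end_s * 3.6  # arrive just as green ends
    # If the light is already green, any speed above `slowest` works.
    fastest = math.inf if green_start_s <= 0 else distance_m / green_start_s * 3.6
    return slowest, fastest


def displayable_advice(speed_range, limit_kmh: float, min_useful_kmh: float = 30.0):
    """Clamp the accurate range to legal, sensible speeds; None means show nothing."""
    slowest, fastest = speed_range
    low = max(slowest, min_useful_kmh)
    high = min(fastest, limit_kmh)
    return (low, high) if low <= high else None


if __name__ == "__main__":
    accurate = advisory_speed_range_kmh(400, green_start_s=18, green_end_s=30)
    print("accurate range:", accurate)
    print("displayable at a 60 km/h limit:", displayable_advice(accurate, 60))
```

With these made-up inputs the accurate range runs from about 48 to 80 km/h, so the top of it is illegal at a 60 km/h limit. Once the clamp is applied, many windows collapse to nothing, which is one way a display restricted to "below and close to the limit" can end up rarely saying anything useful.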

        The actual algorithms which determined the timing of the signals were hand assembled by traffic engineers in 12 bit PDP/11 machine code, so it was impossible to know exactly how they worked.

        Maybe that system was intelligent. It certainly had a lot of emergent properties.
        • The problem with that was that it gave really bad, but accurate advice, like travel at 12km/h or 80km/h. This is where the limit is 60. So they changed it to only display speeds below and close to the limit and then it was even more useless.

          What I don't understand is why those low and high values weren't excluded from the beginning. What were they thinking? Deploying a system that advises people to break the speed limit or go ridiculously slow? Shouldn't that have been anticipated from the outset?

      • While you're right that the sensor model is much easier than the parent post made it, I can think of a number of ways to improve the traffic flow, at least in poorly optimized places.

        Even open-loop, simply timing them well can help a lot. Take for instance, driving home in a typical commuter fashion, where most of the traffic is going the same direction as you are. It makes sense to sequence the lights so that a person driving at the speed limit who starts when one light turns green will be able to pass t

        • Re: (Score:3, Informative)

          by dcollins ( 135727 )

          Certainly not a full-on AI problem, just parameterize the flow density and flow rate and define a decent model and cost function, and run it through an NLP solver.

          Except that it's really a discrete problem, with a solution that likely has sensitive dependence on initial conditions (i.e., chaotic), and would result in symptoms such as "bus bunching": http://en.wikipedia.org/wiki/Bus_bunching [wikipedia.org]

          • Hmmm, if you smoothed out the data so that it was say, averaged over an hour, and force it to be continuous, could you get something going then. It wouldn't solve the stuck at a stoplight with no-one coming issue, but it could allow a smaller town to get a well-optimized system that adapts to changing patterns without having to have as many good traffic engineers on hand.

            Anyway, I could be totally wrong. Most of my work is in vehicle control, so I tend to try and force every problem to fit within my toolb

            • Re: (Score:3, Informative)

              by dcollins ( 135727 )

              Hmmm, if you smoothed out the data so that it was say, averaged over an hour, and force it to be continuous, could you get something going then.

              My point is that's about the worst assumption you can make.

        • The system I worked on was configured with Link Plans, which are designed by engineers, taking into account speed limits, distance between traffic signals, etc. Heuristics are used to select the LP to be used for a particular set of intersections. Within the LP, other heuristics are used to vary the behaviour of signals across a region. For example, an increase in the Degree of Saturation will result in an increase in Cycle Time. DS is how dense the traffic is. Cycle Time is the time a signal controller takes to g
    • red light violations are such a cash cow that municipalities won't put up with it.

    • by jd ( 1658 )

      Roundabouts are superior to traffic lights, in many respects. You don't hold up traffic at all, provided streaming is done right. The biggest problem is when they're used at small, infrequently-used intersections.

      Neural nets can do nothing that is non-computable and are not suited to all kinds of problems. Petri nets are also quite interesting, but again have a very specific role in AI.

      It has been shown that a single neuron from a physical brain can perform extremely complex operations. How is, as far as I

      • A roundabout is just an application of "give way to the right" (or left in a drive on the right country). We could save a lot of Give Way signs and space for roundabouts if drivers were taught to use that rule.

        Traffic signals are best used when you need to give some time to a low traffic road where it crosses a high traffic road. We have a few roundabouts here in Melbourne which are pseudo signalised during peak times. A long queue on an approach triggers a pedestrian crossing on the approach to the right
    • by 4D6963 ( 933028 )

      Yes, you see, that's just how neural nets work, you have a problem, throw a "neural net" at it and BAM it learns how to solve it and solves it!

      I once took two neural nets, gave one a huge folder of MP3s rated by how good their music is to me, so that the neural net can learn both how to decode MP3s (yeah, why bother decoding MP3s if the neural net can figure it out on its own) and learn how to make good music and churn out lots of original songs.

      I gave the second neural net the same folder of rated MP3s for

    • Re: (Score:3, Interesting)

      by Samah ( 729132 )
  • by dameepster ( 594651 ) * on Saturday April 18, 2009 @10:51PM (#27632783) Homepage

    Spammers are unlikely to share their results with the rest of the world. They're motivated by financial rewards, and there is absolutely no incentive to publicize their methodology in any format.

    Not only would the "good guys" learn from it -- and thus potentially defeat the spammers' discovery -- but other spammers would simply steal their work.

  • how about... (Score:5, Interesting)

    by inzy ( 1095415 ) on Saturday April 18, 2009 @10:53PM (#27632791)

    using spammers to create AI which allows us to catch/ignore/prevent spamming?

    • by jd ( 1658 )

      The spammers would add in backdoors that let through the spam they themselves generate.

    • "Before you can post on this webpage, which of the following messages is spam?"

  • by Jane Q. Public ( 1010737 ) on Saturday April 18, 2009 @10:53PM (#27632793)
    it has simply used existing OCR-type technology on a slightly (and I want to emphasize "slightly") different problem. Different character sets, if you will.
  • by Anonymous Coward on Saturday April 18, 2009 @11:01PM (#27632849)

    Replace captchas with pictures of hot/non-hot women.

    Simply ask "is this woman hot? [Yes]/[No]"

    Half of them will be so busy masturbating that they won't be cracking forms.

  • Not exactly (Score:2, Insightful)

    by Anonymous Coward

    I'm not as optimistic as the New Scientist. Spammers need a really low success rate, as compared to OCR technology which needs a really high success rate.

  • Make the problem too hard and the spammers will just hire people to crack it.

    It worked for captchas since they started out very easy and progressively got harder.

  • by Mr. Underbridge ( 666784 ) on Saturday April 18, 2009 @11:19PM (#27632947)

    Wherever there is greed, it can be harnessed to actually do some good. I love it!

    • Replace 'greed' with 'the opportunity to create new value', and your sentence makes sense.

    • Re: (Score:2, Funny)

      by Tablizer ( 95088 )

      Wherever there is greed, it can be harnessed to actually do some good. I love it!

      I never thought I'd say this on slashdot, but you need to watch more super-hero movies.
             

  • timothy (Score:2, Insightful)

    by timmarhy ( 659436 )
    you need to be slapped for using the term "crowdsourcing".
  • by iendedi ( 687301 ) on Saturday April 18, 2009 @11:33PM (#27633037) Journal
    All you have to do is put humans "in" the CAPTCHA interpretation logic, by way of a porn site. BOT -> PORN SITE -> SCRAPE REAL CAPTCHA AND PRESENT TO USER -> USER TYPES CAPTCHA TO SEE PORN -> BOT USES SOLUTION TO PASS REAL CAPTCHA

    Seems simple to me.
  • Resiliant software (Score:4, Interesting)

    by onyxruby ( 118189 ) <onyxruby&comcast,net> on Saturday April 18, 2009 @11:49PM (#27633153)
    You know, if legitimate software vendors could ever learn how to make software as resilient as malware, the world would be a better place. Modern malware is getting close to nuke proof. Delete registry keys, DLLs, multiple self-healing packages, MSI source code, custom drivers, service restarts, redundant services, monitoring agents, update agents to ensure the latest upgrade and so on - and that's just what I saw a couple of weeks ago on a relative's computer. Have you tried removing some of the latest malware w/o removing the disk and operating from a different computer? Unless you do, you can't /really/ be sure it's been removed. Modern malware has the ability to be incredibly resilient and bullet proof
  • Social engineering to improve society. That may be a first.
    • It's been going on for a long time. See Socrates, punk, rock'n'roll, Civil Rights, taxation, feminism, environmentalism and on and on and on.
  • This article assumes that the state of AI will be advanced. That won't happen unless the spammers share their research or code. I doubt that's going to happen.

  • by drolli ( 522659 ) on Sunday April 19, 2009 @12:38AM (#27633475) Journal

    My father, a nigerian spammer passed away. He left an AI system on a server located in a datacenter. Sadly during the last phase of his life unpaid data transfer bills accumulated to a sum of $300000. I am already negotiating with the secret services of the word who want to buy this program for $10000000. I can't pay the data transfer bills, so i turn to you, a trustworthy AI reasearcher. For $300000 you get a share of $500000000 and the copyright to the source code.

    sincerely yours,

  • Hasn't Recaptcha pretty much solved the captcha issue? Only words that OCR can't read are shown ... by definition!

    -P

    • Re: (Score:3, Informative)

      by sulliwan ( 810585 )
      You only have to get the word that OCR can recognize right. Just try guessing which of the two words OCR can't recognize and type some random gibberish instead of that word, it will let you through.
  • So the first computers to pass the turing test will do it by convincing some little-old-lady in Peoria that it's a deposed nigerian prince with money flow issues?

  • Are there any other problems that criminal crowdsourcing could help with?"

    Factoring prime numbers?

  • by solios ( 53048 )

    As somebody who either has an excruciatingly difficult time reading the damned things, or - more frequently - being completely unable to read them at all, I for one welcome the day when captchas are relegated to the dustbin of history.

    Seriously. I'd love to just be able to download porn without having to take a screenshot of the browser and then dick around in photoshop for a few minutes (brightness/contrast, pen tool, etc) in order for megaupload or whatever to let me get at the goodies.

    I have a hard eno

    • by BillX ( 307153 )

      Hey now, don't give porn sites and the money-grubbing companies that leech on them any ideas. Remember "Adult Check"? Pretty soon there will be a $19.95 a month "Human Check" service that verifies you're a human by your ability to pay your credit card bill each month (and maybe has an agent call/email you every few months with a brief quiz, kinda like they do in MMORPGs if they suspect a player is a bot).

    • by Tablizer ( 95088 )

      being completely unable to read them at all, I for one welcome the day when captchas are relegated to the dustbin of history.

      A similar escalation is happening to passwords. Our sys-admins are requiring digits, mixed caps, and punctuation in passwords now. Feels more and more like writing Perl code just to log in. I'm thinking of bringing in a Perl book to get ideas for passwords.
           

  • by blackest_k ( 761565 ) on Sunday April 19, 2009 @01:32AM (#27633809) Homepage Journal

    Trying to ensure only humans sign up for things is just a small part of a bigger problem.

    The other night I got javascripted away from the page I'd found in Google to watch a page pretend to put windows on my laptop and find malware. I've seen it many times before. I run Ubuntu, so seeing an XP-like display of my C: and D: drives and various DLL files being scanned isn't very convincing.

    I decided to look into why I'd landed on the original page. Google had the page at about No. 4 for my initial search, but the site was only about 4 weeks old. Why was it ranked so high?

    And the answer is incoming links from around 86,000 pages, according to Google (links:domain.name). A lot of them are created internally, passing links from malware site to malware site. But the majority come from sites using PHP forms which add user posts to the sites' pages.

    A number of months ago I found my site's contact forms were sending me a lot of garbage emails absolutely stuffed with URLs, and I wondered why anyone would bother, since I'm not going to visit the sites. Anyway, the cure was to only allow the forms to be processed with no more than a few URLs in them. That stopped the junk hitting the inbox. It hasn't stopped the automated posting, but the forms are not processed and I don't get them any more.
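A filter like that takes only a few lines. The post is about PHP forms; the sketch below uses Python purely for illustration, and the regular expression, the function names and the limit of three URLs are assumptions rather than the poster's actual code.

```python
import re

# Rough URL detector: anything starting with http://, https:// or www.
# (an assumption; real spam may obfuscate links in ways this misses).
URL_PATTERN = re.compile(r"\b(?:https?://|www\.)\S+", re.IGNORECASE)


def count_urls(text: str) -> int:
    """Count URL-like tokens in a submitted form field."""
    return len(URL_PATTERN.findall(text))


def accept_submission(text: str, max_urls: int = 3) -> bool:
    """Reject form posts that carry more links than a human is likely to type."""
    return count_urls(text) <= max_urls


if __name__ == "__main__":
    genuine = "Hi, your page at http://example.com/contact has a broken image."
    spam = " ".join(f"http://spam.example/{i}" for i in range(25))
    print(accept_submission(genuine))
    print(accept_submission(spam))
```

The check runs before the form is emailed or published, so a link-stuffed post is dropped silently: the bot can keep submitting, but nothing reaches the inbox or gets crawled.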

    When I examined the links to the malware site, I found PHP-posted user posts packed with links, just like my emails had been, the difference being that these were posted, published and being crawled. Because of these links, a site less than 4 weeks old is ranked highly on the quantity of inbound links, and that's why I got to watch a display of XP-like virus and malware scanning.

    I also examined the content of the pages of the original malware site and the subjects varied quite widely but they also seemed to have a relation with the trends that google was showing for related keywords in the weeks before the site went live. I've a feeling that the pages were generated by pulling content from legitimate sites that ranked high in the natural search.

    I guess site owners tend to think these links are there to spam porn at their users, but they're not. They're there so Google will promote the malware sites with gamed PageRank.

    Clever, isn't it?
    find good key phrases (may be just using google trends)
    scrape content from legit sites and mashup
    create massive array of links to site.
    wait for the fish to arrive and scam them.

    The antivirus scam is antivirus2009, but you only get shown it once.
    Here's a link with details on removing it and some interesting background.

    http://www.2-spyware.com/remove-antivirus-2009.html [2-spyware.com]

    Thing is, the third-party linking sites were using captchas, but the real problem was not filtering the posts. If a suitable maximum number of URLs were enforced, the posts would fail and the PageRank gaming would too.

    Fixing the broken PHP and CGI scripts is what's really needed, not just a better captcha.
    The captcha is just a Band-Aid on a deeper problem, and webmasters need to deal with the issues.

    • Re: (Score:3, Informative)

      And the answer is incoming links from around 86,000 pages according to google (links:domain.name)a lot of them are created internally passing links between malware site to malware site. But the majority come from sites using php forms which add user posts to the the sites pages. A number of months ago i found my sites contact forms were sending a lot of garbage emails to me absolutely stuffed with urls and I wondered why bother doing this since i'm not going to visit the sites. anyway the cure was to only

  • by Anenome ( 1250374 ) on Sunday April 19, 2009 @01:35AM (#27633823)

    Do you think that we could use stereograms as a new form of Captcha? A stereogram uses the deep structures of the brain in a way completely different from mere character recognition in order to derive depth from an image. How hard would it be for a computer program to derive 3D information from a stereogram and make sense out of it? Wouldn't spammers essentially have to solve a much harder vision problem, that of depth perception, than the OCR problem posed by CAPTCHAs?

    For the uninitiated: http://en.wikipedia.org/wiki/Stereogram [wikipedia.org]

    For a sample stereogram along with a picture of what you will see when done correctly (as shown by a B&W heightmap): http://en.wikipedia.org/wiki/File:Stereogram_Tut_Random_Dot_Shark.png [wikipedia.org]

    • by adnonsense ( 826530 ) on Sunday April 19, 2009 @01:53AM (#27633905) Homepage Journal

      What about people like me who can't seem to get the hang of the darn things? (I personally wouldn't be surprised if they're some kind of elaborate hoax...)

      • Maybe your brain doesn't even process depth and you don't even realize it, poor sap :P

        • Well they certainly make me go cross-eyed looking at all the pretty colours.
          • Seriously though, some of them can be challenging to line-up, and some are extremely easy. As a rule, the random dot ones are going to be easier to line-up than the photographic ones simply because you don't have to cross your eyes as much. If you can make the two dots turn into three dots, then you've done it. All that's left is to stabilize your eyes at that depth by pausing for a few moments and holding those three dots and then trying to notice what's below, and if you lose it you go back up to the dots

    • I'd be enthusiastic if I could actually see these hidden images. Even knowing what they are doesn't help

    • Re: (Score:3, Informative)

      by hankwang ( 413283 ) *

      How hard would it be for a computer program to derive 3D information from a stereogram and make sense out of it?

      Converting the stereogram into a depth map: not very hard I think; at least, easier than for most humans. You look for repeating patterns along horizontal lines. Depending on whether the pattern repeats itself squeezed or stretched, it corresponds to negative or positive depth changes. The next problem is interpreting the depth map as an image to answer the captcha challenge (Q: what do you see he

  • Are there any other problems that criminal crowdsourcing could help with?

    Hmm, 'My lack of money' comes to mind. Any takers? No? ...please? ;__;

  • Even for an advanced AI of the XXV century such as Data, it was pretty hard to discern whether something was funny or not.

    And if they manage to make an AI that can recognize funny jokes, or even always make them, we will be so amused that we won't worry about spam anymore. Mmm... maybe they already did [cracked.com]

  • Do my laundry and I'll set you up with a gmail account.
