Looking To Spammers To Solve Hard AI Problems
An anonymous reader writes "With bots getting closer to beating text-based CAPTCHAs for good, New Scientist points out that when they do, OCR technology will at least have advanced. The article goes on to suggest that whatever kind of reverse Turing Test that comes next should be chosen to motivate spammers to solve other pressing AI problems, such as image recognition. Are there any other problems that criminal crowdsourcing could help with?"
It was supposed to happen. (Score:5, Interesting)
Advancing the state of the art in Optical Character Recognition was always intended to be a side-benefit of CAPTCHAs. It looks like that plan came through nicely.
I have always figured CAPTCHAs would be a stopgap until other methods of authentication could easily be used, such as micro-payments or single sign-on solutions like OpenID. Unfortunately, those other methods haven't been adopted nearly as fast as the need. Perhaps if CAPTCHAs are declared "dead", site operators will feel more urgency to adopt these solutions.
If CAPTCHAs do continue, I'd like the next problem to be facial recognition software. I'd love a package that could look at a picture and tag it "Nicholas and Andrea" or "Glen and Helene". Digital camera software everywhere could benefit from this technology. Not sure how you'd bake that into a CAPTCHA, but it's a good problem to solve.
Re: (Score:3, Funny)
If CAPTCHAs do continue, I'd like the next problem to be facial recognition software. I'd love a package that could look at a picture and tag it "Nicholas and Andrea" or "Glen and Helene".
Given the likelihood of Linux being the test platform, this will work for female genitalia first.
Re: (Score:2, Funny)
One of the chans has "fapchas"; you have to identify various body parts by typing "tits", "vagoo", "dick", or "ass"
Re: (Score:2)
Re: (Score:3, Informative)
Picasa is a popular image management program that has supported facial recognition since last year: http://www.techcrunch.com/2008/09/02/picasa-refresh-brings-facial-recognition/ [techcrunch.com]
I haven't used it, so I'm not sure how good it is.
Re: (Score:2)
Micro-payments have not caught on because the banking industry does not support them, nor should they. You are asking one industry to foot the expenses for something that has nothing to do with them.
OpenID has not caught on faster for at least one major reason: Single Point of Failure. The very same reason people are exhorted to use different passwords on different sit
Re: (Score:3, Interesting)
I hope the next scheme is easier for people with bad eyes. I often have to call the wife or one of my sons to solve a captcha puzzle. The black/white/grey are bad enough - when they combine colors, I'm freaking LOST!! If I'm home alone, I just give up after a couple failed attempts. Good thing my bank doesn't use this scheme, huh?
Re: (Score:2)
Captchas are really a problem. They cause serious accessibility issues, and there are many I can't solve myself, despite having good sight and a large, crystal-clear LCD screen.
In your case, I think you should try Compiz (called Desktop Effects in Ubuntu) for help: there is a plugin that inverts colors, another that does variable-level smooth zoom which can follow your mouse, etc. Just make sure you have compizconfig-settings-manager installed to turn on the useful stuff and get rid of the rest.
Re: (Score:2)
Re:It was supposed to happen. (Score:5, Insightful)
Unfortunately, CAPTCHA is radically easier than actual OCR. When cracking a CAPTCHA, achieving a success rate of 5-10% is absolutely fine. Plus, when you submit your answer, you are told whether or not you got it right. With OCR, anything short of the high 90s is pretty much useless, and the only feedback available is through manual human intervention, which scales poorly.
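To put rough numbers on that asymmetry: if a bot solves each CAPTCHA independently with probability p and learns immediately whether it succeeded, it needs 1/p attempts on average per success, and its chance of getting through at least once in k attempts is 1 - (1 - p)^k. A minimal Python sketch, assuming attempts are independent (an idealization):

```python
def p_at_least_one(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


def expected_attempts(p: float) -> float:
    """Mean number of tries until the first success (geometric distribution)."""
    return 1.0 / p


if __name__ == "__main__":
    for p in (0.05, 0.10):
        print(f"p={p:.2f}: ~{expected_attempts(p):.0f} tries per success, "
              f"{p_at_least_one(p, 50):.1%} chance of success within 50 tries")
```

At the 5% end of that range this works out to about 20 attempts per success and roughly a 92% chance of getting through within 50 attempts, which is why a low per-attempt success rate is "absolutely fine" for a spammer.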
Arguably, the only significant OCR advance has been reCAPTCHA, which is just a clever way of making humans do the hard stuff in a way that actually helps, rather than just using makework problems.
It is certainly true that CAPTCHA cracking has advanced considerably; it just doesn't apply too neatly to real OCR problems.
Re:It was supposed to happen. (Score:5, Interesting)
How about this: The user is presented with a short message that they have to mark as "Spam" or "Not Spam". If the spammers get really good at solving this problem, they've effectively written themselves out of a job. And if they can't do it, then they can't get new accounts.
Re:It was supposed to happen. (Score:5, Interesting)
I like it, but it has issues that may be hard to work out.
(1) If they only needed to solve one (or any small number), then the spammer's automated system would only need to guess. Present the potential user with 3 of these and they'll get fed up. The spammer's system, on the other hand, will get 12.5% (1 in 8) correct by guessing. That's enough for them to thwart the system.
(2) It's really easy to get samples of spam. Any user who clicks the spam button has stated that it's not their mail. (Multiple users flagging the same message tell you it's practically certain to be spam.) It's not a huge stretch to acquire or assume permission to use the message. Getting legitimate samples (of varieties of email) may be much harder.
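Point (1) is easy to quantify: a bot answering each binary Spam/Not Spam question at random passes n independent questions with probability (1/2)^n. The sketch below compares that with a human who answers each question correctly with some probability; the 95% per-question human accuracy is an assumption, not a measured figure.

```python
def pass_rate(per_question: float, n_questions: int) -> float:
    """Chance of answering all n independent questions correctly."""
    return per_question ** n_questions


if __name__ == "__main__":
    HUMAN_ACCURACY = 0.95  # assumed per-question accuracy for a real user
    for n in (1, 3, 5, 10):
        bot = pass_rate(0.5, n)  # random guessing on a yes/no question
        human = pass_rate(HUMAN_ACCURACY, n)
        print(f"{n:2d} questions: bot {bot:7.2%}, human {human:6.1%}")
```

With ten questions a guessing bot drops to about 0.1%, but the assumed 95%-accurate human also fails roughly 40% of the time, which is the fed-up-user problem from point (1).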
Re: (Score:2)
How about this: The user is presented with a short message that they have to mark as "Spam" or "Not Spam". If the spammers get really good at solving this problem, they've effectively written themselves out of a job. And if they can't do it, then they can't get new accounts.
That's freakin' genius! With the caveat that spammers will never completely destroy themselves: the situation will reach a balance, but at a lower threshold than today. Spam will certainly decrease dramatically if your idea is implemented.
Re: (Score:3, Informative)
Actually this exists: http://spamornot.org/ [spamornot.org]
Re: (Score:2)
CAPTCHAs need to die (Score:2)
I don't know what the replacement should be, but I'd love to see the end of CAPTCHAs. The last time I tried to sign up for a Gmail account, it took me four tries just to read the bloody CAPTCHA correctly. If we've got to the point where the spammers can parse the CAPTCHA and a human can't, there's no point to them.
Re:It was supposed to happen. (Score:4, Informative)
Re: (Score:2, Informative)
Re:It was supposed to happen. (Score:4, Funny)
How about a "hot or not" test? How good are computers at deciding if somebody is hot or not?
(Yeah, it's a joke, I understand the statistical implications of multiple-choice Turing tests).
Re: (Score:2)
How about spotting and correcting errors in Wikipedia?
The only problem would be vandal-bots that would correct their own errors :s
Re:It was supposed to happen. (Score:4, Interesting)
Since beauty seems to be largely evaluated on symmetry and on the ratios of various parts of the face and body relative to one another, and existing facial recognition systems already work by measuring distances and ratios between those points, I don't think that would be all that hard.
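The symmetry part of that argument is simple to sketch. Assuming a landmark detector has already produced (x, y) points for mirrored pairs such as eye corners and mouth corners (the landmarks and coordinates below are hypothetical), one crude asymmetry score reflects each left-side point across the facial midline and measures how far it lands from its right-side partner, normalized by inter-eye distance. Whether such a score tracks perceived attractiveness is the commenter's premise, not something this code establishes.

```python
import math

Point = tuple[float, float]


def asymmetry(pairs: list[tuple[Point, Point]], midline_x: float,
              eye_distance: float) -> float:
    """Mean mirror error of (left, right) landmark pairs, scaled by eye distance.

    Assumes an upright face, so the midline is the vertical line x = midline_x.
    0.0 means perfectly symmetric.
    """
    total = 0.0
    for (lx, ly), (rx, ry) in pairs:
        mirrored_x = 2 * midline_x - lx  # reflect the left point across the midline
        total += math.hypot(mirrored_x - rx, ly - ry)
    return total / (len(pairs) * eye_distance)


if __name__ == "__main__":
    # Hypothetical landmarks: (left, right) outer eye corners and mouth corners.
    face = [((30.0, 40.0), (70.0, 40.0)),
            ((38.0, 75.0), (63.0, 76.0))]
    print(f"asymmetry = {asymmetry(face, midline_x=50.0, eye_distance=40.0):.4f}")
```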
Re: (Score:2)
10 x 0% accuracy = ? ;D
Re: (Score:3, Interesting)
Nice going, you just invented the tiered net (Score:5, Insightful)
What about people for whom $50 is a year's salary? Congrats, you just split the internet into the rich and the poor. No more accessing the internet from Africa on an old PC powered by a donated solar cell. Good job. You're probably going to get a Nobel Prize.
Re: (Score:3, Insightful)
You messed up. CAPTCHA is not a test to tell if your viewers have any money. It is just a test of whether they are a human or a computer.
Actually, CAPTCHA is usually a test to see if the viewer can read English. The biggest problem with reCAPTCHA is that all of the words are English.
I can't imagine it'd have anywhere near the success it's seen if it were trying to get you to do OCR for Japanese, or even Polish...
Re: (Score:3, Insightful)
Comment removed (Score:5, Informative)
Good to see I'm not the only one... (Score:2)
Good to see I'm not the only one having trouble reading some of the latest captchas.
It's time for a rethink when the humans start to fail at it.
Re: (Score:3, Informative)
Micro-payments are a terrible idea, first because they violate basic net neutrality principles...
Methinks you have no idea what "net neutrality" actually means. What does paying to post on a forum have to do with net neutrality?
Re: (Score:3, Funny)
But that would mean that any self-important twat couldn't just freely post his own egotistical rantings online for everyone to see. I won't stand for it!
SSSHHH!!!! (Score:3, Funny)
Don't tell them that they're the ones that are actually being used! That spoils all the fun!
True AI (Score:5, Funny)
I'll just bet that this is what leads to "true" artificial intelligence (whatever that is). Soon, we'll have completely automated agents trying to convince other completely automated agents to purchase stuff to enhance bits of biology that they don't have.
Re:True AI (Score:5, Insightful)
This is a reasonably accurate description of the stock market.
Re: (Score:2)
This is a reasonably accurate description of the stock market.
Even the penis enhancing part?
That strikes me as bizarre.
Re: (Score:2)
Since AIs are, almost by definition, more predictable than humans, it is self-evident that AI customers can be more cost-effectively tailored for and more easily swayed. Since limited intelligences already handle most financial decisions (virtually no humans actually play the stock markets these days), it is the AIs who have the serious money and are therefore the customers of choice.
If we go down that sort of a road, with spammers and crackers controlling research and development, humans will cease to b
Re: (Score:3, Funny)
Haven't you seen the markov chain generator posts? I haven't seen the markov chain generator posts? I could be one myself. Haven't you seen the markov chain generator posts? I could be one myself. Haven't you seen the markov chain generator posts? I haven't seen them from genuine posts. In point of fact, I haven't seen the markov chain generator posts? I could be one myself. Haven't you seen them from genuine posts. In point of fact, I can't tell them from genuine posts. In point of fact, I haven't seen the
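The joke works because a word-level Markov chain generator only knows which words have followed which, so it keeps looping back through its own phrases exactly as above. A minimal order-1 sketch of such a generator, seeded for repeatability; the training text is just a few phrases taken from the comment:

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words that followed it in `text`."""
    words = text.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain: dict[str, list[str]], start: str, length: int,
             rng: random.Random) -> str:
    """Walk the chain from `start`, picking each next word at random."""
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:  # dead end: this word never had a successor
            break
        out.append(rng.choice(followers))
    return " ".join(out)


if __name__ == "__main__":
    corpus = ("Haven't you seen the markov chain generator posts? "
              "I could be one myself. I can't tell them from genuine posts.")
    print(generate(build_chain(corpus), "I", 20, random.Random(42)))
```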
Re: (Score:2)
Most Slashdot posters are already using the random research paper generator for the articles and the abstract for the replies.
Re: (Score:2, Insightful)
a possible idea (Score:4, Insightful)
Several years ago, 'neural nets' were the big thing, and people thought they could make them 'learn' and do useful things.
I always thought that traffic control would be an interesting application. If a computer could look at video of an intersection (and the streets leading to it) and figure out where cars were and weren't, you could make traffic lights a lot less annoying.
So our CAPTCHA might be a picture or video of cars and a request to count them?
eric
Re: (Score:2)
Is that really that hard a problem? With a combination of road sensors and the ability to distinguish between a known background and other colors, that should be pretty easy. The problem is what to do with that information. The only time traffic lights are annoying in a way that could be easily remedied is at 3 am, when it's a pain to get stuck at a red when there are no cars visible in any direction. But that's not an important case; what really matters is making traffic flow smoothly when volume is heavy.
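The "known background" approach can be sketched in a few lines. Assuming a grayscale frame and a reference image of the empty road, both as lists of pixel rows, a lane region counts as occupied when enough of its pixels differ from the background. The threshold, the 20% occupancy cutoff, the region coordinates, and the tiny synthetic images are all illustrative assumptions.

```python
Frame = list[list[int]]  # grayscale pixels, 0-255


def occupied_fraction(frame: Frame, background: Frame,
                      region: tuple[int, int, int, int],
                      threshold: int = 30) -> float:
    """Fraction of pixels in region (top, left, bottom, right; ends exclusive)
    whose brightness differs from the empty-road background by > threshold."""
    top, left, bottom, right = region
    changed = total = 0
    for y in range(top, bottom):
        for x in range(left, right):
            total += 1
            if abs(frame[y][x] - background[y][x]) > threshold:
                changed += 1
    return changed / total


def lane_has_vehicle(frame: Frame, background: Frame,
                     region: tuple[int, int, int, int],
                     min_fraction: float = 0.2) -> bool:
    """Call the lane occupied if enough of its pixels changed."""
    return occupied_fraction(frame, background, region) >= min_fraction


if __name__ == "__main__":
    background = [[100] * 8 for _ in range(6)]  # empty grey road
    frame = [row[:] for row in background]
    for y in range(1, 4):  # a dark "car" in the left lane
        for x in range(1, 3):
            frame[y][x] = 20
    left_lane, right_lane = (0, 0, 6, 4), (0, 4, 6, 8)
    print(lane_has_vehicle(frame, background, left_lane))   # True
    print(lane_has_vehicle(frame, background, right_lane))  # False
```

A real deployment would also have to cope with shadows, headlights, and changing daylight, typically by updating the background model over time; this sketch ignores all of that.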
Re:a possible idea (Score:5, Interesting)
The actual algorithms that determined the timing of the signals were hand-assembled by traffic engineers in 12-bit PDP-11 machine code, so it was impossible to know exactly how they worked.
Maybe that system was intelligent. It certainly had a lot of emergent properties.
Re: (Score:2)
The problem with that was that it gave really bad, but accurate advice, like travel at 12km/h or 80km/h. This is where the limit is 60. So they changed it to only display speeds below and close to the limit and then it was even more useless.
What I don't understand is why those low and high values weren't excluded from the beginning. What were they thinking? Deploying a system that advises people to break the speed limit or go ridiculously slow? Shouldn't that have been anticipated from the outset?
Re: (Score:2)
Re: (Score:2)
While you're right that the sensor model is much easier than the parent post made it out to be, I can think of a number of ways to improve the traffic flow, at least in poorly optimized places.
Even open-loop, simply timing them well can help a lot. Take, for instance, driving home in typical commuter fashion, where most of the traffic is going the same direction as you are. It makes sense to sequence the lights so that a person driving at the speed limit who starts when one light turns green will be able to pass t
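That open-loop sequencing (usually called a "green wave") reduces to simple arithmetic: if the signals share a common cycle length, each light should turn green about distance / speed seconds after the first one, taken modulo the cycle. A minimal sketch, assuming one-way flow at a fixed speed and hypothetical signal positions; real coordination also has to account for queues and traffic in the opposite direction.

```python
def green_offsets(positions_m: list[float], speed_kmh: float,
                  cycle_s: float) -> list[float]:
    """Start-of-green offset (seconds into the common cycle) for each signal,
    so a car leaving the first light on green at `speed_kmh` meets green
    at every later light."""
    speed_ms = speed_kmh / 3.6  # km/h -> m/s
    origin = positions_m[0]
    return [((p - origin) / speed_ms) % cycle_s for p in positions_m]


if __name__ == "__main__":
    # Hypothetical lights at 0 m, 400 m, 900 m and 2000 m along an arterial road.
    offsets = green_offsets([0, 400, 900, 2000], speed_kmh=60, cycle_s=90)
    print([round(o, 1) for o in offsets])  # [0.0, 24.0, 54.0, 30.0]
```

The last light is 120 s of travel away, which wraps around to 30 s into the next 90 s cycle.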
Re: (Score:3, Informative)
Certainly not a full-on AI problem, just parameterize the flow density and flow rate and define a decent model and cost function, and run it through an NLP solver.
Except that it's really a discrete problem, with a solution that likely has sensitive dependence on initial conditions (i.e., chaotic), and would result in symptoms such as "bus bunching": http://en.wikipedia.org/wiki/Bus_bunching [wikipedia.org]
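The bus-bunching mechanism can be shown with a toy model (an illustration, not a traffic-engineering result): passengers arrive at every stop at a constant rate, a bus boards everyone who arrived since the bus ahead of it called there, and boarding time is proportional to the number of passengers. A bus that starts slightly late therefore dwells longer and falls further behind, while the bus behind it catches up. All parameter values are assumptions, and the model shows amplification of a small delay rather than chaos in the strict sense.

```python
def simulate_bunching(n_buses: int = 3, n_stops: int = 15,
                      headway: float = 10.0, travel: float = 5.0,
                      lam: float = 2.0, beta: float = 0.1,
                      delay: float = 0.5) -> list[list[float]]:
    """Toy bus-bunching model; returns arrive[bus][stop] in minutes.

    lam:  passengers arriving per minute at every stop (assumed constant)
    beta: minutes needed to board one passenger (assumed constant)
    """
    arrive = [[0.0] * n_stops for _ in range(n_buses)]
    for b in range(n_buses):
        # Evenly spaced buses, except bus 1 starts `delay` minutes late.
        arrive[b][0] = b * headway + (delay if b == 1 else 0.0)
    for s in range(n_stops):
        for b in range(n_buses):
            if b > 0:
                # No overtaking: a bus that catches up arrives right behind its leader.
                arrive[b][s] = max(arrive[b][s], arrive[b - 1][s])
            # Bus 0 is assumed to have a scheduled headway ahead of it.
            gap = headway if b == 0 else arrive[b][s] - arrive[b - 1][s]
            dwell = beta * lam * gap  # boarding time for everyone waiting
            if s + 1 < n_stops:
                arrive[b][s + 1] = arrive[b][s] + dwell + travel
    return arrive


if __name__ == "__main__":
    times = simulate_bunching()
    for s in (0, 5, 10, 14):
        gap_01 = times[1][s] - times[0][s]
        gap_12 = times[2][s] - times[1][s]
        print(f"stop {s:2d}: gap bus0->bus1 {gap_01:5.2f} min, "
              f"bus1->bus2 {gap_12:5.2f} min")
```

With these assumed parameters, half a minute of initial delay grows until bus 2 catches up with bus 1 around stop 11, while the gap ahead of bus 1 keeps widening; with `delay=0.0` the spacing stays even.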
Re: (Score:2)
Hmmm, if you smoothed out the data so that it was, say, averaged over an hour, and forced it to be continuous, could you get something going then? It wouldn't solve the stuck-at-a-stoplight-with-no-one-coming issue, but it could allow a smaller town to get a well-optimized system that adapts to changing patterns without having to have as many good traffic engineers on hand.
Anyway, I could be totally wrong. Most of my work is in vehicle control, so I tend to try and force every problem to fit within my toolb
Re: (Score:3, Informative)
Hmmm, if you smoothed out the data so that it was, say, averaged over an hour, and forced it to be continuous, could you get something going then?
My point is that's about the worst assumption you can make.
Re: (Score:2)
Re: (Score:2)
Red-light violations are such a cash cow that municipalities won't put up with it.
Re: (Score:2)
Roundabouts are superior to traffic lights, in many respects. You don't hold up traffic at all, provided streaming is done right. The biggest problem is when they're used at small, infrequently-used intersections.
Neural nets can do nothing that is non-computable and are not suited to all kinds of problems. Petri nets are also quite interesting, but again have a very specific role in AI.
It has been shown that a single neuron from a physical brain can perform extremely complex operations. How is, as far as I
Re: (Score:2)
Traffic signals are best used when you need to give some time to a low traffic road where it crosses a high traffic road. We have a few roundabouts here in Melbourne which are pseudo signalised during peak times. A long queue on an approach triggers a pedestrian crossing on the approach to the right
Re: (Score:3, Interesting)
*loads thoughts into blunderbuss, scatters them over landscape*
Seriously, we know the following from experimental science:
a) Cultured rat neurons are capable of flying a simulated F-22
b) African Grey parrots are capable of basic grammar, understand attributes as distinct from objects, and comprehend zero.
c) Crows can solve basic problems and manufacture their own tools
These do not have significant development in the brain areas associated with processing data, but they DO have exceptionally well-developed brains for handling raw senso
Re: (Score:2)
Yes, you see, that's just how neural nets work, you have a problem, throw a "neural net" at it and BAM it learns how to solve it and solves it!
I once took two neural nets, gave one a huge folder of MP3s rated by how good their music is to me, so that the neural net can learn both how to decode MP3s (yeah, why bother decoding MP3s if the neural net can figure it out on its own) and learn how to make good music and churn out lots of original songs.
I gave the second neural net the same folder of rated MP3s for
Re: (Score:3, Interesting)
http://thedailywtf.com/Articles/No,_We_Need_a_Neural_Network.aspx [thedailywtf.com]
But will they share their code? (Score:5, Insightful)
Spammers are unlikely to share their results with the rest of the world. They're motivated by financial rewards, and there is absolutely no incentive to publicize their methodology in any format.
Not only would the "good guys" learn from it -- and thus potentially defeat the spammers' discovery -- but other spammers would simply steal their work.
Re:But will they share their code? (Score:5, Informative)
Spammers sell their code to other spammers all the time.
how about... (Score:5, Interesting)
using spammers to create AI which allows us to catch/ignore/prevent spamming?
Re: (Score:2)
The spammers would add in backdoors that let through the spam they themselves generate.
Re: (Score:3)
"Before you can post on this webpage, which of the following messages is spam?"
Busting captchas has not advanced anything... (Score:3, Informative)
Re:Busting captchas has not advanced anything... (Score:5, Insightful)
Re: (Score:3, Informative)
I would agree, if general-purpose captcha-beating software were available. But that isn't so. Each captcha system was beaten by custom code, individually written for that system. So in effect, it is not much different than adding a new font to existing OCR software.
Most of them don't actually beat the captcha with a program. This [getafreelancer.com] is how it gets done.
Re: (Score:2)
Beat them with sex (Score:3, Funny)
Replace captchas with pictures of hot/non-hot women.
Simply ask "is this woman hot? [Yes]/[No]"
Half of them will be so busy masturbating that they won't be cracking forms.
Not exactly (Score:2, Insightful)
I'm not as optimistic as the New Scientist. Spammers need only a really low success rate, compared with OCR technology, which needs a really high success rate.
Need incremental problems (Score:2)
Make the problem too hard and the spammers will just hire people to crack it.
It worked for captchas since they started out very easy and progressively got harder.
Capitalism at its best (Score:3, Insightful)
Wherever there is greed, it can be harnessed to actually do some good. I love it!
Re: (Score:2)
Replace 'greed' with 'the opportunity to create new value', and your sentence makes sense.
Re: (Score:2, Funny)
I never thought I'd say this on slashdot, but you need to watch more super-hero movies.
timothy (Score:2, Insightful)
The CAPTCHA problem is an easy one. (Score:3, Interesting)
Seems simple to me.
Re: (Score:2, Funny)
Resilient software (Score:4, Interesting)
Re: (Score:2)
That's when you turn them to Ubuntu (or equivalent), and stop servicing Windows altogether :)
Re: (Score:2)
Yeah, because it would be hilarious not being able to uninstall Apple Updater or QuickTime... Wait, what?
How about that.. (Score:2)
Re: (Score:2)
They won't share their advancements (Score:2)
This article assumes that the state of AI will be advanced. That won't happen unless the spammers share their research or code. I doubt that's going to happen.
Dear Friend, (Score:5, Funny)
My father, a nigerian spammer passed away. He left an AI system on a server located in a datacenter. Sadly during the last phase of his life unpaid data transfer bills accumulated to a sum of $300000. I am already negotiating with the secret services of the word who want to buy this program for $10000000. I can't pay the data transfer bills, so i turn to you, a trustworthy AI reasearcher. For $300000 you get a share of $500000000 and the copyright to the source code.
sincerely yours,
Recaptcha (Score:2)
Hasn't reCAPTCHA pretty much solved the captcha issue? Only words that OCR can't read are shown ... by definition!
-P
Re: (Score:3, Informative)
Turing Tests (Score:2)
So the first computers to pass the Turing test will do it by convincing some little old lady in Peoria that it's a deposed Nigerian prince with money flow issues?
A really hard problem... (Score:2)
Are there any other problems that criminal crowdsourcing could help with?
Factoring prime numbers?
Re: (Score:2, Funny)
Re: (Score:2)
Good. (Score:2)
As somebody who either has an excruciatingly difficult time reading the damned things, or - more frequently - being completely unable to read them at all, I for one welcome the day when captchas are relegated to the dustbin of history.
Seriously. I'd love to just be able to download porn without having to take a screenshot of the browser and then dick around in photoshop for a few minutes (brightness/contrast, pen tool, etc) in order for megaupload or whatever to let me get at the goodies.
I have a hard eno
Re: (Score:2)
Hey now, don't give porn sites and the money-grubbing companies that leech on them any ideas. Remember "Adult Check"? Pretty soon there will be a $19.95 a month "Human Check" service that verifies you're a human by your ability to pay your credit card bill each month (and maybe has an agent call/email you every few months with a brief quiz, kinda like they do in MMORPGs if they suspect a player is a bot).
Re: (Score:2)
A similar escalation is happening to passwords. Our sys-admins are requiring digits, mixed caps, and punctuation in passwords now. Feels more and more like writing Perl code just to log in. I'm thinking of bringing in a Perl book to get ideas for passwords.
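A rule set like that (digits, mixed case, punctuation) usually comes down to a handful of character-class checks. A minimal sketch; the minimum length of 8 is an assumption, since the comment doesn't give one:

```python
import string


def policy_violations(password: str, min_length: int = 8) -> list[str]:
    """Return the rules the password breaks (an empty list means accepted)."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if not any(c.islower() for c in password) or not any(c.isupper() for c in password):
        problems.append("not mixed case")
    if not any(c in string.punctuation for c in password):
        problems.append("no punctuation")
    return problems


if __name__ == "__main__":
    print(policy_violations("password"))           # ['no digit', 'not mixed case', 'no punctuation']
    print(policy_violations("$_=~s/Perl/4ever/"))  # [] (and it does read a bit like Perl)
```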
Ignoring the real problem. (Score:5, Interesting)
Trying to ensure only humans sign up for things is just a small part of a bigger problem.
The other night I got javascripted away from the page I'd found in Google to a page that pretended to run Windows on my laptop and find malware. I've seen it many times before; I run Ubuntu, so seeing an XP-like display of my C: and D: drives and various DLL files being scanned isn't very convincing.
I decided to look into why I'd landed on the original page. Google had it at about No. 4 for my initial search, but the site was only about 4 weeks old, so why was it ranked so high?
The answer is incoming links from around 86,000 pages, according to Google (link:domain.name). A lot of them are created internally, passing links from malware site to malware site, but the majority come from sites using PHP forms that add user posts to the sites' pages.
A number of months ago, I found my sites' contact forms were sending me a lot of garbage emails absolutely stuffed with URLs, and I wondered why anyone would bother, since I'm not going to visit those sites. Anyway, the cure was to only allow a form to be processed if it contained no more than a few URLs. That stopped the junk hitting the inbox. It hasn't stopped the automated posting, but the forms are not processed and I don't get them any more.
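The fix described above (refuse to process any form submission containing more than a few URLs) takes only a few lines in any language. The commenter's forms were PHP; here is a Python sketch in which the limit of 3 and the deliberately loose URL pattern are assumptions:

```python
import re

# Loose pattern: anything starting with http://, https:// or www. up to whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
MAX_URLS = 3  # assumed limit; tune for your own site


def should_process(form_text: str, max_urls: int = MAX_URLS) -> bool:
    """Accept a form submission only if it contains at most `max_urls` links."""
    return len(URL_PATTERN.findall(form_text)) <= max_urls


if __name__ == "__main__":
    normal = "Hi, I liked your article on http://example.com/post - thanks!"
    spammy = " ".join(f"http://spam{i}.example/buy" for i in range(40))
    print(should_process(normal))  # True
    print(should_process(spammy))  # False
```

As the comment notes, this doesn't stop the automated posting; it only makes the posts worthless, because the links are what the spammers are after.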
When I examined the links to the malware site, I found PHP-posted user posts packed with links, just like my emails had been; the difference was that these were posted, published, and being crawled. Because of these links, a site less than 4 weeks old was ranked highly on the sheer quantity of inbound links, and that's why I got to watch an XP-like display of virus and malware scanning.
I also examined the content of the original malware site's pages. The subjects varied quite widely, but they also seemed to track the trends Google was showing for related keywords in the weeks before the site went live. I've a feeling the pages were generated by pulling content from legitimate sites that ranked high in the natural search results.
I guess site owners tend to think these links are there to spam porn at their users, but they're not; they're there so Google will promote the malware sites with gamed PageRank.
Clever, isn't it?
find good key phrases (maybe just using Google Trends)
scrape content from legit sites and mash it up
create a massive array of links to the site
wait for the fish to arrive and scam them.
The antivirus scam is Antivirus 2009, but you only get shown it once.
Here's a link with details on removing it, along with some interesting background:
http://www.2-spyware.com/remove-antivirus-2009.html [2-spyware.com]
The thing is, the third-party linking sites were using captchas; the real problem was that they weren't filtering the posts. If a suitable maximum number of URLs were enforced, the posts would fail, and so would the PageRank gaming.
Fixing the broken PHP and CGI scripts is what's really needed, not just a better captcha.
The captcha is just a Band-Aid on a deeper problem, and webmasters need to deal with the issues.
Re: (Score:3, Informative)
How About Using Stereograms? (Score:3, Interesting)
Do you think we could use stereograms as a new form of CAPTCHA? A stereogram uses the deep structures of the brain, in a way completely different from mere character recognition, to derive depth from an image. How hard would it be for a computer program to derive 3D information from a stereogram and make sense of it? Wouldn't spammers essentially have to solve a much harder vision problem, depth perception, than the OCR problem behind current CAPTCHAs?
For the uninitiated: http://en.wikipedia.org/wiki/Stereogram [wikipedia.org]
For a sample stereogram along with a picture of what you will see when done correctly (as shown by a B&W heightmap): http://en.wikipedia.org/wiki/File:Stereogram_Tut_Random_Dot_Shark.png [wikipedia.org]
Re:How About Using Stereograms? (Score:4, Insightful)
What about people like me who can't seem to get the hang of the darn things? (I personally wouldn't be surprised if they're some kind of elaborate hoax...)
Re: (Score:2)
Maybe your brain doesn't even process depth and you don't even realize it, poor sap :P
Re: (Score:2)
Re: (Score:2)
Seriously though, some of them can be challenging to line up, and some are extremely easy. As a rule, the random-dot ones are going to be easier to line up than the photographic ones, simply because you don't have to cross your eyes as much. If you can make the two dots turn into three dots, then you've done it. All that's left is to stabilize your eyes at that depth by pausing for a few moments and holding those three dots, then trying to notice what's below; if you lose it, you go back up to the dots
Re: (Score:2)
I'd be enthusiastic if I could actually see these hidden images. Even knowing what they are doesn't help.
Re: (Score:3, Informative)
Converting the stereogram into a depth map: not very hard I think; at least, easier than for most humans. You look for repeating patterns along horizontal lines. Depending on whether the pattern repeats itself squeezed or stretched, it corresponds to negative or positive depth changes. The next problem is interpreting the depth map as an image to answer the captcha challenge (Q: what do you see he
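The decoding idea can be sketched with a one-row random-dot stereogram: along the row, find the shift at which the pattern repeats, and map a shorter repeat distance to a nearer surface (the usual wall-eyed convention). This is a simplified illustration under stated assumptions: a hypothetical generator in which each pixel copies the pixel `PERIOD - depth` positions to its left, exact copies, and no noise. Real stereograms are 2-D and noisier, but the principle is the same.

```python
import random
from typing import Optional

PERIOD = 12     # assumed base repeat distance in pixels (the far plane)
MAX_DEPTH = 3   # depth levels 0..3; depth d repeats every PERIOD - d pixels
WINDOW = 3      # pixels compared when testing a candidate shift


def make_row(depths: list[int], rng: random.Random) -> list[int]:
    """Random-dot row: pixel x copies pixel x - (PERIOD - depths[x])."""
    row: list[int] = []
    for x, d in enumerate(depths):
        src = x - (PERIOD - d)
        row.append(row[src] if src >= 0 else rng.randrange(256))
    return row


def decode_row(row: list[int]) -> list[Optional[int]]:
    """Recover depth per pixel from the shift at which the row repeats.

    None means no candidate shift matched (row edges, depth boundaries).
    """
    depths: list[Optional[int]] = []
    for x in range(len(row)):
        found = None
        for d in range(MAX_DEPTH + 1):
            shift = PERIOD - d
            if x - shift < 0 or x + WINDOW > len(row):
                continue
            if all(row[x + k] == row[x + k - shift] for k in range(WINDOW)):
                found = d
                break
        depths.append(found)
    return depths


if __name__ == "__main__":
    truth = [0] * 30 + [2] * 20 + [0] * 30  # a raised "object" in the middle
    decoded = decode_row(make_row(truth, random.Random(7)))
    print("".join("." if d is None else str(d) for d in decoded))
```

This only recovers the depth map; as the comment points out, the harder step is interpreting that depth map as an object in order to answer the challenge.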
I have some ideas... (Score:2)
Hmm, 'My lack of money' comes to mind. Any takers? No? ...please? ;__;
Funny or not? (Score:2)
Even for an advanced AI of the 24th century like Data, it was pretty hard to discern whether something was funny or not.
And if they manage to make an AI that can recognize jokes, or even make jokes that are always funny, we will be so amused that we won't worry about spam anymore. Mmm... maybe they already did [cracked.com]
laundry (Score:2)
Do my laundry and I'll set you up with a gmail account.