Stopping Spambots: A Spambot Trap
Neil Gunton writes "Having been hit by a load of spambots on my community site, I decided to write a Spambot Trap which uses Linux, Apache, mod_perl, MySQL, ipchains and Embperl to quickly block spambots that fall into the trap."
Re:Elements of good design I'd missed (Score:1, Insightful)
But neither could blind internet users...
Re:problem with not giving an email address ... (Score:2, Insightful)
The only problem with using entirely HTTP-based "send me a message" systems is that some people, like myself, would much rather have an actual email address than deal with 50 different layouts, 50 different configurations and 50 different methods of communicating with a person or a company. Every HTML-based contact system has its own quirks and problems; I'd rather just learn my email program's issues instead.
Re:Elements of good design I'd missed (Score:3, Insightful)
You're 100% right. And fighting against spambots by relying on UserAgent is akin to... well.... security thru obscurity, albeit somehow in reverse.
What also looks strange is that he doesn't consider that one can get a link directly to a page on the n-th level: as human browsers don't usually download robots.txt either, sounds like he's gonna ban some poor guys who got a link from a friend...
Re:Block? Are you kidding? (Score:3, Insightful)
Add a couple of sleep(20); calls to the CGI script that generates the bot fodder. The bot will still stay busy waiting for your webserver's response, but your script will consume almost no resources while it sleeps.
For additional kicks, set up a DNS teergrube.
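The sleep idea can be sketched roughly as below. This is a minimal illustration in Python rather than the CGI script the comment has in mind; the function name `tarpit_response` and its parameters are made up for the example, and the 20-second delay is simply the figure quoted above.

```python
import time
from typing import Callable, Iterable, Iterator


def tarpit_response(
    lines: Iterable[str],
    delay_seconds: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield bot fodder one line at a time, pausing before each line.

    `sleep` is injectable so the delay can be replaced in tests;
    by default it is time.sleep, which blocks without using CPU.
    """
    for line in lines:
        sleep(delay_seconds)  # the bot waits here; the server mostly idles
        yield line
```

A handler would write each yielded line to the response as it arrives. While sleeping, the handler uses essentially no CPU, though it does still hold a connection and a server worker for the duration.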
A better solution: obfuscate the mailto: link (Score:5, Insightful)
(Yes, I've posted about this before [slashdot.org], but it does work for me.) Browsers render it so users get the address they want, but spambots try to grab it from the raw HTML and get something meaningless.
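The example that went with this comment did not survive here. One common way to get the effect described, assumed for this sketch and not necessarily what the poster used, is to write the address as HTML numeric character references. The function name `obfuscate_mailto` is hypothetical.

```python
import html
from typing import Optional


def _to_entities(text: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def obfuscate_mailto(address: str, label: Optional[str] = None) -> str:
    """Build a mailto: link whose raw HTML contains no literal address.

    Browsers decode the character references when rendering, so the
    link still works; a scraper matching plain 'user@host' text in the
    raw source will not find one.
    """
    href = _to_entities("mailto:" + address)
    text = _to_entities(address) if label is None else html.escape(label)
    return f'<a href="{href}">{text}</a>'
```

For instance, `obfuscate_mailto("user@example.com")` produces a link with no `@` character anywhere in the markup. A bot that decodes character references before scanning would still recover the address, so this only stops the simpler harvesters.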
Re:Simple solution! (Score:3, Insightful)
using images is bad for people with text browsers (Score:2, Insightful)
And of course if he uses ALT text for the images, then he has the same problem he was trying to avoid, of creating something the spambots can read.
Re:Block? Are you kidding? (Score:3, Insightful)
No way, man.
If you realize you're serving to a bot, go on serving. Each time the bot follows the "next page" link, you
Give it thousands, millions of addresses this way.
This would be good to do with known bad addresses, but random addresses only add more unknowing people to the list. You may add 1000 email addresses to the list and slow them down, but if even 10 of those email addresses are real, you've added to the problem. The bad addresses will be taken out as they are found to be bad, and the good ones will be left in. You've signed JoeRandomUser@RandomDomain.com up for all the spam he can handle, even if he has gone to great lengths to keep his email address off the spam lists. In theory this sounds like a great idea, until you're the guy getting your email address randomly fed to the bots.
Re:Pollute their database (Score:2, Insightful)
Think about it. With the scarcity of domain names lately, chances are that while the garbage email addresses may not be valid, more than a few domain names would be valid.
So then the spammer fills his database with these nonexistent addresses on existing domain names. He then sends his spam to these addresses, and their mail servers not only have to process the message to determine that it's an invalid address, but they also have to bounce the message back as undeliverable.
IMO this is going to use twice the bandwidth, since you now have to consider the bandwidth used by all of those bounces.
You could always use some nonexistent domain names for the garbage email addresses, but the spammer could just as easily check a domain name's validity before sending spam to it, making it trivial to remove all of the trash from his database.
Remember, the spammer couldn't care less about sending mail to bad addresses, as long as the good addresses are spammed as well. It's left to the poor sysadmin to clean up the mess.
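The nonexistent-domain variant mentioned above could look something like this sketch. It assumes the garbage addresses are placed under the reserved `.invalid` top-level domain, which is guaranteed never to resolve, so no real mail server ends up handling the bounces. The function name and default domain are illustrative only.

```python
import random
import string
from typing import List, Optional


def garbage_addresses(
    count: int,
    rng: Optional[random.Random] = None,
    domain: str = "example.invalid",  # reserved TLD; assumption for this sketch
) -> List[str]:
    """Generate random-looking addresses on a domain that cannot exist."""
    rng = rng or random.Random()
    addresses = []
    for _ in range(count):
        length = rng.randint(5, 10)
        local = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        addresses.append(f"{local}@{domain}")
    return addresses
```

As the comment points out, this is also the weakness: a spammer who checks whether the domain resolves can throw all of these away in one pass.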
Re:Block? Are you kidding? (Score:3, Insightful)
Way too much work. Here's similar Escapade [escapade.org] code:
Not similar enough. That makes 300 queries per hit against your database, and I don't think you even used prepared statements. His code slowed their software to a crawl by sleeping. Yours will slow your software to a crawl by excessive database traffic.
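To illustrate the database objection, here is a rough sketch of fetching a whole page of fodder with one parameterized query instead of one query per address. It uses Python's built-in `sqlite3` so it runs standalone, not the MySQL setup from the story, and the `fodder` table, its `address` column and the function name are all hypothetical.

```python
import sqlite3
from typing import List


def fetch_fodder(conn: sqlite3.Connection, limit: int = 300) -> List[str]:
    """Return up to `limit` random addresses using a single statement."""
    cur = conn.execute(
        "SELECT address FROM fodder ORDER BY RANDOM() LIMIT ?",
        (limit,),  # bound parameter, not string interpolation
    )
    return [row[0] for row in cur.fetchall()]


# Small in-memory setup so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fodder (address TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO fodder (address) VALUES (?)",
    [(f"user{i}@example.invalid",) for i in range(1000)],
)
page = fetch_fodder(conn)
```

Each hit then costs one round trip to the database, not 300, whatever the page size.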