Stopping SpamBots With Apache
primetyme writes: "Sick of email-harvesting spam robots cruising your Apache-based site? Here's an in-depth article that shows one way you can configure a base Apache installation to keep those nasty bots off your site - and the spam out of your Inbox." Anything that helps annoy spammers is a good thing.
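For the impatient, the core of the technique is two pieces of httpd.conf: tag requests from known harvester User-Agents with an environment variable, then deny anything carrying it. A minimal sketch, with the bot names and directory path purely illustrative:

# Tag requests from known harvesters
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "EmailCollector" bad_bot

# Refuse to serve tagged requests
<Directory "/var/www/html">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Directory>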
One Way... (Score:3, Funny)
A better way! (Score:1, Interesting)
president@whitehouse.gov, abuse@127.0.0.1, some MAPS address.
Re:A better way! (Score:2)
worthlessPOS@taliban.gov, ROOTofallevil@taliban.gov, some MAPS address
Re:A better way! (Score:1)
That's the Federal Trade Commission's spam-complaint address. Granted, an intelligent email harvester would check for and discard that address (sending spam to it would be tantamount to turning yourself in), but not all spammers meet the intelligence test.
Re:A better way! (Score:2)
This is incorrect. You want to use abuse@[127.0.0.1] as the address; the square brackets make it an address literal, so the spammer's mail software tries to deliver the message to its own loopback interface.
Another way to stop spam on your webserver... (Score:2, Funny)
Now I guess I am off to hack (Score:2)
What is an Apache admin to do? It is so configurable that there doesn't appear to be anything it can't do. What's next: using Apache to brew my morning coffee (well, there is the coffee pot cam; anyone know what webserver it ran on?), write my website for me, solve world hunger?
WHY WHY WHY do people run IIS anyway? I would love to see what it would take to do this with IIS. Any takers?
Re:Now I guess I am off to hack (Score:2, Funny)
Hey, Emacs has to be good for something, right?
Re:Now I guess I am off to hack (Score:2)
You could write an ISAPI filter that intercepts the requests just before IIS processes them.
It won't work long (Score:3, Insightful)
Using some client-side JavaScript would be harder for them to deal with (although if your browser can render it, they will be able to as well).
I guess graphics would be next...
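A middle road that needs no JavaScript at all: encode the address as HTML character entities. Browsers render it like normal text, but a harvester grepping raw HTML for user@host patterns sees nothing. A PHP sketch (the address and function name are invented):

<?php
// Turn every character into a numeric HTML entity, so the
// address never appears literally in the page source.
function entity_encode($text) {
    $out = '';
    for ($i = 0; $i < strlen($text); $i++) {
        $out .= '&#' . ord($text[$i]) . ';';
    }
    return $out;
}

$addr = entity_encode('someone@example.com');
echo '<a href="mailto:' . $addr . '">' . $addr . '</a>';
?>

Of course, a harvester that decodes entities before scanning defeats this too; it just raises the bar.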
Re:It won't work long (Score:1)
Checking the user agent won't work for long: how hard will it be for the spammers to change the user agent to "Mozilla..."?
Certainly not as hard as convincing every web site on the net that displays email addresses to dick around with this.
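Not hard at all; with wget, for example, masquerading is a single command-line flag (URL invented):

wget --user-agent="Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" http://www.example.com/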
Also useful for... (Score:5, Informative)
Here is a longer list of common spam bots and mirror bots that I have been able to find:
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "CherryPickerSE" bad_bot
SetEnvIfNoCase User-Agent "CherryPickerElite" bad_bot
SetEnvIfNoCase User-Agent "Crescent" bad_bot
SetEnvIfNoCase User-Agent "EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "MCspider" bad_bot
SetEnvIfNoCase User-Agent "bew" bad_bot
SetEnvIfNoCase User-Agent "Deweb" bad_bot
SetEnvIfNoCase User-Agent "FEZhead" bad_bot
SetEnvIfNoCase User-Agent "Fetcher" bad_bot
SetEnvIfNoCase User-Agent "Getleft" bad_bot
SetEnvIfNoCase User-Agent "GetURL" bad_bot
SetEnvIfNoCase User-Agent "HTTrack" bad_bot
SetEnvIfNoCase User-Agent "IBM_Planetwide" bad_bot
SetEnvIfNoCase User-Agent "KWebGet" bad_bot
SetEnvIfNoCase User-Agent "Monster" bad_bot
SetEnvIfNoCase User-Agent "Mirror" bad_bot
SetEnvIfNoCase User-Agent "NetCarta" bad_bot
SetEnvIfNoCase User-Agent "OpaL" bad_bot
SetEnvIfNoCase User-Agent "PackRat" bad_bot
SetEnvIfNoCase User-Agent "pavuk" bad_bot
SetEnvIfNoCase User-Agent "PushSite" bad_bot
SetEnvIfNoCase User-Agent "Rsync" bad_bot
SetEnvIfNoCase User-Agent "Shai" bad_bot
SetEnvIfNoCase User-Agent "Spegla" bad_bot
SetEnvIfNoCase User-Agent "SpiderBot" bad_bot
SetEnvIfNoCase User-Agent "SuperBot" bad_bot
SetEnvIfNoCase User-Agent "tarspider" bad_bot
SetEnvIfNoCase User-Agent "Templeton" bad_bot
SetEnvIfNoCase User-Agent "WebCopy" bad_bot
SetEnvIfNoCase User-Agent "WebFetcher" bad_bot
SetEnvIfNoCase User-Agent "WebMiner" bad_bot
SetEnvIfNoCase User-Agent "webvac" bad_bot
SetEnvIfNoCase User-Agent "webwalk" bad_bot
SetEnvIfNoCase User-Agent "w3mir" bad_bot
SetEnvIfNoCase User-Agent "XGET" bad_bot
SetEnvIfNoCase User-Agent "Wget" bad_bot
SetEnvIfNoCase User-Agent "WebReaper" bad_bot
SetEnvIfNoCase User-Agent "WUMPUS" bad_bot
SetEnvIfNoCase User-Agent "FAST-WebCrawler" bad_bot
You can't win an arms race (Score:5, Insightful)
On the other hand, there are ways to fight spambots; they just don't rely on trusting the user. Here's one way:
There are good ways to deal with spammers, but this isn't one of them. It *might* work on a small scale, and it definitely won't work on a medium or large scale. It's about as useful as the Sendmail "MX/domain validation" trick that Eric Raymond and the rest of the Sendmail team thought would stop spammers dead in their tracks. (It didn't.) Instead he was "surprised by spam."
-CT
Re:You can't win an arms race (Score:3, Insightful)
In the next installment of this article, I'm working on a script that grabs the netblock of a bot that goes against the robots.txt file, does an ARIN lookup on that block, and emails the administrator of that block with the problem. Comments have been made that any bot can switch its user-agent string, which is true. If a spider does that, though, it's more than likely also going to run through the parts of a site that you *specifically* tell it to stay out of in the robots.txt file. When it does that, it's a lot easier to block its user-agent, email the admin of its netblock, or block its class C IP block altogether.
It's like a honeypot for black-hats if you think about it... and that's one of the *best* ways to find the problem spiders and block them out, without blocking any good-natured bot :)
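The trap itself can be as simple as a robots.txt entry for a directory that nothing legitimate ever links to (path invented):

User-agent: *
Disallow: /no-spiders/

Any client that then requests /no-spiders/ read robots.txt and ignored it, and its IP and User-Agent are sitting in access_log ready for the blocking and reporting steps described above.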
Re:You can't win an arms race (Score:1, Interesting)
(posting anonymously so as to not tell spambots what tech I'm using on my site)
Re:You can't win an arms race (Score:1)
Wget is not a spider! (Score:4, Informative)
"Here are a couple of the User-Agents that fell for our trap that I pulled out of last months access_log for lists.evolt.org:
Wget/1.6"
Email spider, my ass! Wget is a damn useful HTTP downloader utility which is great for obtaining large files as it can resume interrupted transfers. It can also mirror web sites, which I assume is why it fell into the honeypot. Oh, and you can also change what it says it is on the command line.
And to add my 2 cents to the email problems, one other solution I've seen is to translate email addresses into an image and drop that onto the page. It's not a fantastic solution for those still using Lynx, and you can no longer just click to send mail to somebody, but at least it doesn't go the Javascript route and should be a sufficient technical hurdle to stop automated harvesters for a couple of years at least.
- Anonymous and happy.
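The image idea from the comment above, as a short PHP/GD sketch (assumes PHP built with GD; the address, font, and sizes are arbitrary):

<?php
// Render an email address as a PNG so it never appears as text.
$addr = 'someone@example.com';
$im = imagecreate(strlen($addr) * 7 + 10, 20);
$bg = imagecolorallocate($im, 255, 255, 255);  // first allocation = background
$fg = imagecolorallocate($im, 0, 0, 0);
imagestring($im, 3, 5, 3, $addr, $fg);         // built-in GD font #3
header('Content-Type: image/png');
imagepng($im);
imagedestroy($im);
?>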
Re:Wget is not a spider! (Score:1)
$email_address =~ s/\@/ AT /g;
D/\ Gooberguy
Re:Wget is not a spider! (Score:2)
Do you really think spammers haven't figured that out yet?
wget == spambot? (Score:1)
Re:wget == spambot? (Score:1)
I guess he thinks wget is a bot because it can be made to recursively download a whole website, following all anchor tags like a bot, even though it is being controlled by a human.
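That mode is two flags away (URL invented):

wget -r -np http://www.example.com/

-r follows links recursively and -np keeps it from wandering above the starting directory, which is exactly the behavior that lands it in spider traps.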
My php solution (Score:2, Informative)
Disallow:
in my robots.txt, then in my .htaccess:
ForceType application/x-httpd-php
and in email-addresses:
and chgrp'd
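A guess at the shape of this trick, with every path and name invented: a robots.txt entry marks a trap URL, a ForceType directive makes Apache run the extensionless file as PHP, and the script logs whoever fetches it:

# robots.txt
User-agent: *
Disallow: /email-addresses

# .htaccess
<Files email-addresses>
    ForceType application/x-httpd-php
</Files>

# email-addresses -- the trap script
<?php
// Anything that requests this URL ignored robots.txt;
// log it for later blocking.
error_log('robots.txt violator: ' . $_SERVER['REMOTE_ADDR'] .
          ' UA=' . $_SERVER['HTTP_USER_AGENT']);
?>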
I find mod_rewrite and RewriteEngine more useful (Score:2, Informative)
The syntax is simple:
#Send filesucking programs to hell
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^FlashGet.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Offline Explorer.*" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^wget.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar.* [NC]
RewriteRule ^.*$ - [F,L]
Seems effective enough for me, and it ain't tough to learn when you can find an example. Of course this does rely on the idea that filesucking programs (or email harvesting bots) identify themselves, but I think naysayers would be surprised at how many of them do just that.
Shaun
*Do* spam bots cruise web sites? (Score:1)
Or does this mean that the spam bots are sufficiently sophisticated that they recognize my trap for what it is? It's meant to be obvious to humans.
WebPoison anyone? (Score:2)
There were some flaws. You'd need a webserver that let you run CGI scripts without necessarily having them live in a cgi-bin directory.
Sadly, I've not found any information on it recently. Perhaps someone could hack out a more efficient version of such to address potential problems and bugs.
Re:WebPoison anyone? (Score:2, Insightful)
Re:WebPoison anyone? (Score:2)
All you do is set the domains to a machine on your network that has its SMTP port firewalled. No bandwidth gets lost from the spam and you don't have to worry about the domain being valid.
Re:WebPoison anyone? (Score:1)
http://www.monkeys.com/wpoison/ [monkeys.com].
Re:WebPoison anyone? (Score:2, Insightful)
It's called wpoison, and it's found at http://www.monkeys.com/wpoison/ [monkeys.com]. The problem is that it's very easy to detect -- note the lack of punctuation marks, the scarcity of two- and three-letter words, capital letters, and verbs... and the fact that there's a four-second pause in the same place, page after page... in short, it would be easy enough to spot a wpoison-generated page.
I've coded up an alternative that suffers none of those obvious defects, and instead of throwing out bogus email addresses, it throws out valid spamcatcher addresses. Any SMTP host that sends a message to one of those addresses is blocked (via DJB's rbldns) from sending mail into my domain for a month. The blocklist is self-maintaining, so I never need to mess with it.
It's been in place for about three months, and my blocklist currently contains 125 entries -- five of which are netblocks I've manually added. The URL, sure to catch a bucketful of bad spiders thanks to this link, is http://www.artsackett.com/personnel/ [artsackett.com], and it is intentionally as slow as the rectification of sin.
chargen (Score:1)
I had heard of a guy taking chargen or
Re:chargen (Score:1)
This won't work for long. (Score:1)
What does work is building a nice static list of email addresses and names. Link to another page and fill it with the same info. Do this on several virtual servers and make sure the web bots can find it.
You can also be nice to the real search engines and tell them not to visit your spam traps; since robots.txt is often read by the spam bots anyway, telling Google not to search that page works out well for both sides of the spider wars.
The next thing is to lock down your mail program once it detects any of the spam traps. There are several good ways of doing this, depending on how you pay for bandwidth. Two of the best options are to either play dead with the connection or return a "user mailbox is full" error (see the sketch below). Both of these tie up resources on the spammer's end. The other choice is to reject 99.99% of the mail and hope they pull your domain out of their lists for being full of junk.
I run @abnormal.com, which tends to sort near the top, has lots of bogus addresses, and has been running spam traps for years. Every day I get hit by spammers working from sorted address lists.
One thing to keep in mind is that most bots are run by people only selling lists, not by the spammers themselves. Because of that, there is no direct link between the searching bots and the mail host that spams later.
I wonder if it's time to make an RBL-like thing that is just for poisoned addresses.
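One way to wire up the "mailbox is full" trick, sketched here with Postfix access maps purely as an example (the poster doesn't say what MTA he runs, and the trap address is invented):

# /etc/postfix/recipient_access
trap-address@example.com    452 Mailbox full

# main.cf
smtpd_recipient_restrictions =
    check_recipient_access hash:/etc/postfix/recipient_access,
    reject_unauth_destination

# rebuild the map after editing:
#   postmap /etc/postfix/recipient_access

A 4xx code makes the spammer's MTA queue and retry, burning their resources instead of yours.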
Re:This won't work for long. (Score:1)
You interested? If so, email me by replacing slashdot with the user name asackett in the email address above.
Re:Good for the goose (Score:3, Insightful)
Displaying E-mail Addresses as Graphics (Score:1)
Sample PHP Script [planetsourcecode.com]
Re:Displaying E-mail Addresses as Graphics (Score:1)
Re:Displaying E-mail Addresses as Graphics (Score:1)
Generally, whenever you display an email address you have a mailto: link wrapped around it. The bot would take the address from there, and it would continue happily ignoring your JPEGs.
Post Gates & SPAM traps - do they work? (Score:1)
Re:Post Gates & SPAM traps - do they work? (Score:1)