Slashdot
SpamAssassin Gets a Promotion
Posted by
michael
on Sat Jun 26, 2004 01:04 AM
from the now-more-assassinating-power dept.
darthcamaro writes "The folks at internetnews.com are reporting that the SpamAssassin project has been promoted to a full top-level Apache Software Foundation project. The project has been in incubation for a while and has finally made it through. The article also reveals that Apache is now using SpamAssassin itself: 'I think spam filtering is now a critical part of the network infrastructure and Spam Assassin is a leader in the area,' said Daniel Quinlan, chairman of the Apache Spam Assassin Project Management Committee."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Nice (Score:2)
Bout Time! (Score:3, Interesting)
Re:Bout Time! (Score:5, Interesting)
Re:Bout Time! (Score:5, Insightful)
Re:Bout Time! (Score:4, Insightful)
Re:Bout Time! (Score:3, Interesting)
I do the exact same thing, but with a score of 12. Anything that trips the filter as spam gets dumped into a spam folder off the main maildir and they can use IMAP or check with webmail to see what spam they have. A cron script erases anything in the spam folder older than 2 weeks. Oh yeah, and individual users can alter their own white/blacklists and scores since I pull the username and match the scores in a postgres database. Combined with clamd and qmail-scanner, it's heaven. :-)
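The two-week cleanup mentioned above could be done in many ways; the poster's actual cron script is not shown. As a minimal sketch, assuming the spam folder is a plain directory tree of message files, something like this could be run from cron. The function name and the use of file modification time as the message age are assumptions for illustration.

```python
import time
from pathlib import Path

MAX_AGE_DAYS = 14  # "older than 2 weeks", per the comment above


def purge_old_spam(spam_dir, max_age_days=MAX_AGE_DAYS, now=None):
    """Delete regular files under spam_dir whose mtime is older than the cutoff.

    Returns the list of removed paths. Using mtime as the message age is an
    assumption; a real setup might use the delivery timestamp instead.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Materialize the listing first so deletions do not disturb the walk.
    for path in list(Path(spam_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Calling `purge_old_spam("/path/to/Maildir/.Spam")` once a day would keep the folder bounded; the path here is a placeholder.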
As for the incoming
Re:Bout Time! (Score:5, Informative)
1. HELO Filtering
2. Sender Filtering
3. Recipient Filtering
4. Content Filtering and Delivery
I reject over 95% of all incoming mail before it ever gets to SpamAssassin. This means that SA's success rate isn't as good as on other systems (since I weed out all of the obvious spam), but my mailbox is happy and shiny.
SpamAssassin is a brilliant last line of defense, but I wouldn't advise just dumping your raw incoming stream into it. Much of the useful information about a message isn't available to spamd (such as your list of local domain names, relay domains, etc.) and you should consider using a set of cheaper filters to flush out the blatant chaff.
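The staged approach described in this comment (cheap envelope checks first, content filtering last) can be sketched roughly as follows. This is only an illustration: the poster's real rules are not given, so the specific checks, the domain sets, and all names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    helo: str
    sender: str
    recipient: str


# Hypothetical site configuration, for illustration only.
LOCAL_DOMAINS = {"example.com"}
BLOCKED_SENDER_DOMAINS = {"spam.example"}


def check_helo(env):
    # Reject HELO names that are not fully qualified or that claim to be us.
    name = env.helo.lower()
    if "." not in name or name in LOCAL_DOMAINS:
        return "bad HELO"
    return None


def check_sender(env):
    # Reject senders from domains on a local blocklist.
    domain = env.sender.rpartition("@")[2].lower()
    if domain in BLOCKED_SENDER_DOMAINS:
        return "blocked sender"
    return None


def check_recipient(env):
    # Refuse to relay for domains we do not host.
    domain = env.recipient.rpartition("@")[2].lower()
    if domain not in LOCAL_DOMAINS:
        return "relay denied"
    return None


CHEAP_CHECKS = [check_helo, check_sender, check_recipient]


def triage(env):
    """Run the cheap checks in order; only survivors reach content filtering."""
    for check in CHEAP_CHECKS:
        reason = check(env)
        if reason:
            return "reject: " + reason
    return "content-filter"
```

The point of the ordering is cost: each early check needs only the SMTP envelope, so the expensive content scan runs on a small fraction of the traffic.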
erm (Score:4, Informative)
spamassassin.org, not spamassasin.org
Re:erm (Score:4, Interesting)
Don't worry (Score:5, Funny)
Ben
Great News! (Score:5, Informative)
Re:Great News! (Score:5, Interesting)
Personally I do use SpamAssassin, but as an intermediate step.
First step: Check a whitelist of known senders. Deliver if the sender is on the list, AND the message originated from an IP subnet that I allow for them personally.
Second step: Scan with SpamAssassin. If the score is really high (above 20) throw it the hell out.
Third step: If the score is less than 20, and the person wasn't whitelisted, run the message through TMDA [tmda.net] and politely tell the sender I'm not sure who they are, and I get a lot of spam, and could you please click this link to prove that you're a real person.
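The three steps above can be sketched as a single routing decision. The poster's implementation was in PHP with MySQL and is not shown, so this Python version is only an approximation; the whitelist contents, the function name, and the treatment of a score of exactly 20 (challenged rather than discarded) are assumptions.

```python
import ipaddress

# Hypothetical whitelist: sender address -> subnets that sender may mail from.
WHITELIST = {"friend@example.org": ["192.0.2.0/24"]}
DISCARD_SCORE = 20.0  # "really high (above 20)" in the comment


def route(sender, source_ip, spam_score):
    """Return 'deliver', 'discard', or 'challenge' for one message."""
    ip = ipaddress.ip_address(source_ip)
    # Step 1: known sender AND an allowed originating subnet.
    for net in WHITELIST.get(sender.lower(), []):
        if ip in ipaddress.ip_network(net):
            return "deliver"
    # Step 2: a very high SpamAssassin score is dropped outright.
    if spam_score > DISCARD_SCORE:
        return "discard"
    # Step 3: everything else gets a challenge (TMDA in the original setup).
    return "challenge"
```

Note that a whitelisted address arriving from an unexpected subnet falls through to the later steps, which is what makes forged return addresses less useful to a spammer.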
I've been using this three-step system for eighteen months now, and out of over one million messages that have come into my mailbox (really), exactly FOUR spam messages have made it all the way through. Apparently the spammers decided to go ahead and click on the little link, or they used a real person's return address, and when that person got the autoreply, they were too stupid to understand what was going on.
Even better, I have not received ANY indication that I've lost any messages; at least, no one has ever mentioned anything about an email that I didn't get.
I've got five other people at my domain using the same system, although for not quite as long (one for fifteen months, three for about a year, and one for just a month now); they have all had similar success.
So based on those numbers I'd estimate a success rate of 99.9997% for eliminating spam (which is, admittedly, COMPLETELY INSANE), and a false-positive (or at least "lost message") rate of 0% so far (fingers crossed). A few people have had to confirm their messages, of course, but I've whitelisted them as that happens.
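As a rough check on that estimate, the arithmetic can be written out. This sketch assumes, as the comment appears to, that the rate is taken over all incoming messages; the function name is hypothetical.

```python
def block_rate(leaked, total):
    """Fraction of incoming messages that did not get through as spam."""
    return 1 - leaked / total


# 4 leaked spams out of exactly 1,000,000 messages (a lower bound, since the
# comment says "over one million").
rate = block_rate(4, 1_000_000)
print(f"{rate:.4%}")
```

With exactly one million messages this comes to about 99.9996%; the quoted 99.9997% would correspond to roughly 1.3 million messages, which is consistent with "over one million".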
I actually wrote all the connecting code in PHP, believe it or not, with a MySQL database as a backend. It's invoked using
The whole system took about twelve hours of programming to set up, on one Saturday.
Now, for correspondence to companies (such as Microsoft, or Amazon.com), I use a different scheme (although it's handled by the same PHP code). I create a unique email address for each of them, which ONLY allows mail to or from that domain (for example "rptamazon@mydomain.com" only allows messages from amazon.com). Those addresses are also easily cancellable, individually, if the company starts to annoy me with spam. Basically, each email address can be assigned its own unique whitelist, and can be cancelled individually at any time, through the little web interface.
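A per-address whitelist of this kind might look like the following sketch. The data layout and function names are hypothetical, and accepting subdomains of an allowed domain is an assumption the comment does not confirm.

```python
# Hypothetical rule table: one entry per disposable address.
ADDRESS_RULES = {
    "rptamazon@mydomain.com": {"allowed": {"amazon.com"}, "active": True},
}


def accepts(recipient, sender):
    """True if this per-company address may receive mail from this sender."""
    rule = ADDRESS_RULES.get(recipient.lower())
    if rule is None or not rule["active"]:
        return False
    domain = sender.rpartition("@")[2].lower()
    # Assumption: subdomains of an allowed domain are accepted too.
    return any(domain == d or domain.endswith("." + d) for d in rule["allowed"])


def cancel(recipient):
    """Retire one address without touching any of the others."""
    ADDRESS_RULES[recipient.lower()]["active"] = False
```

Because each address carries its own rule, a leak can be traced to one company and shut off with a single `cancel` call.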
I also have a number of email addresses for things such as customer support for our company (I write computer software). I'm using the same system for those, also, but instead of checking whitelists based on the sender, I've found a simple way to do it is to check for ANY of our product names anywhere in the message body or subject. If the message doesn't mention any of them, it sends a simple autoreply back similar to that in (3) above, but mentioning that the message didn't seem to be about any of our products, but if it was, please click here, blah blah. We don't have a high volume of support messages (about one or two a day; we're a small company) but in the last year only three or four people have had to click through like that, and, honestly, their support requests were so f*cked up anyways that I'd rather it just dropped them on the floor.
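The product-name check for support addresses is simple enough to sketch in a few lines. The product names below are placeholders, and case-insensitive substring matching is an assumption about how the check works.

```python
# Placeholder product names; the real list is not given in the comment.
PRODUCT_NAMES = ["WidgetPro", "GadgetSuite"]


def mentions_product(subject, body):
    """True if any product name appears anywhere in the subject or body."""
    text = (subject + "\n" + body).lower()
    return any(name.lower() in text for name in PRODUCT_NAMES)
```

A message for which this returns `False` would get the click-to-confirm autoreply instead of being delivered directly.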
Then, as a very last ste
Re:Great News! (Score:3, Informative)
Currently I have amassed 3681 spams totalling 76 megs. I should probably empty that directory sometime
sa-learn makes a big difference though. Helps with the misspellings and random junk. Haven't seen a Nigerian scam come through eith
Re:Great News! (Score:4, Interesting)
What I don't understand is the base64 problem. One of the first things SA does is decode base64. Even "rawbody" rules get base64 decoding, so really base64 encoding shouldn't make a difference at all, as SA never examines the encoded text.
As for the intentional mis-spellings of V!agr0, check out antidrug.cf (use google) or wait for SA 3.0 which includes this set of rules as a part of the standard distribution.
Disclaimer: I am the author of antidrug, and thus do have a bias here.
Re:Great News! (Score:3, Interesting)
Hopefully I'll find some free time later this summer (two big big programming projects I'm working on now are ending next month) and I'll see if I can take a we
Here is the real link to SpamAssassin's site (Score:4, Informative)
The link in the text goes to some search page
3.0? (Score:2)
3.0, late-July, early August (Score:5, Informative)
It will apparently take another month or so to finalize the weighting of the rules.
I've put 3.0.0pre1 on a production system that filters ~350k messages per day. With some tweaking of the RBL, bayes, and AWL rules, it is much (~10%) more efficient at tagging spam than 2.63, which I'm running on a parallel server that also sees ~350k messages/day (load balancing is your friend).
More info: http://www.au.spamassassin.org/full/3.0.x/dist/bu
Re:3.0? (Score:4, Informative)
[rulesemporium.com]
I use the rules there, and even minor spam gets obliterated with no problems of catching real mail.
I recommend it!
DSpam (Score:5, Interesting)
DSpam also came with much better directions for integrating with Exim than did SpamAssassin. As fond as I was of SpamAssassin, they have some catching up to do.
Re:DSpam seems okay but not for relay hosts (Score:2)
Maybe there's a way to do it but I couldn't take the time to figure out a good way to get it done.
Re:DSpam (Score:5, Interesting)
It's also not very easy to understand how it works, to configure your mail client to train it easily, or to configure procmail to call it properly (there are a lot of command-line flags as well).
That being said, IT IS WORTH IT. A properly set up and trained DSPAM filter will SOLVE your spam problem. Training time usually takes about 2 weeks and the results are fantastic after that.
You can also set it up a number of ways - server-side, user-side, with postfix or another mail server, with procmail or without. Relay or not. It's up to you.
Re:DSpam (Score:3, Interesting)
It's a bugger to set up with Procmail, but if anyone wants a peek at my config file, just e-mail... One thing I did do was forget about that whole "forward
Re:DSpam (Score:3, Informative)
I haven't seen any false positive stats on dspam. It's easy to say a spam filter has a high spam catching rate, but it means nothing without a very low false positive rate.
Redirecting my mail to
Re:DSpam (Score:4, Interesting)
I've got my system set to deliver spam to a spambox which I check nightly for false positives.
and the docs say that I ought to have a lot more training before it's up to standard. It's already better for me than SA was.
dave
Re:DSpam (Score:4, Interesting)
The bad messages go into a quarantine on the server and can be reviewed by the end user using a web-based interface (looking for false positives.) In the press of a button, that quarantine can be emptied, freeing up disk resources on the server.
Other SPAM solutions (like SpamAssassin) mark the message and continue with delivery. What's the point in downloading the SPAM to your mail client just to throw it away?
-ch
If Only It Was For Real (Score:5, Funny)
The problem is... (Score:4, Funny)
what to do with spam after it's id'd? (Score:5, Interesting)
I like SA, and find it is very good for identifying around 95% of my incoming spam. However, I also have around 0.1% false positive rate, which means at some point I have to look through all the filtered spam messages and make sure none of them were legit.
I need a better tool for handling mail SA has identified as spam, either server-side or client-side. I'd like to delete anything with a score > 15, simply store anything with a score > 5, and send an auto-reply for scores between 5 and 10 indicating that the message was marked as spam and I'll probably never look at it.
A good set of procmail and formail rules will accomplish this, but my hosting company has a weird procmail setup and I'd prefer something easier to implement.
Any ideas?
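The thresholds described in this question can be written down as a small decision function. This is only a sketch of the stated policy, not a recommendation; the function name and the action labels are hypothetical, and how exact boundary values (5, 10, 15) are treated is an assumption.

```python
def actions_for(score):
    """Map a SpamAssassin score to the actions the poster describes."""
    if score > 15:
        return ["delete"]
    if score > 5:
        actions = ["store"]
        # Auto-reply only for the lower band, "between 5 and 10".
        if score <= 10:
            actions.append("auto-reply")
        return actions
    return ["deliver"]
```

Whatever tool ends up applying it (procmail, maildrop, or a client-side rule), the logic is just three bands keyed on the score.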
Re:what to do with spam after it's id'd? (Score:5, Insightful)
I need a better tool for handling mail SA has identified as spam, either server-side or client-side. I'd like to delete anything with a score > 15, simply store anything with a score > 5, and send an auto-reply for scores between 5 and 10 indicating that the message was marked as spam and I'll probably never look at it.
Procmail can do it, but please reconsider the auto-replies. What happens if I'm pissed at bob and decide to send out 1m spams with the return address of bob@example.com? More commonly, what about viruses that forge headers?
I would consider auto-whitelisting instead.
a better approach: reject the mail (Score:3, Informative)
Re:what to do with spam after it's id'd? (Score:3, Interesting)
sorting mail by spamassassin score (Score:5, Informative)
I can't speak for auto-replies, but you can do the sorting part client-side. The key is that spamassassin adds a line like "X-Spam-Level: *****" where the number of *'s is the score of the email. Almost any email client can filter mail to different folders based on headers. The unary representation of the spam score ensures that even a primitive filter can work.
For example, one popular client is Microsoft Outlook, and there are several web pages in google (such as this one [carleton.ca]) that explain how to reroute mail to specific folders depending on the spamassassin score.
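To make the unary header concrete, here is a minimal sketch that reads `X-Spam-Level` with Python's standard `email` module and picks a folder. The folder names and the thresholds of 5 and 10 are hypothetical, chosen only for illustration.

```python
from email import message_from_string


def spam_level(raw_message):
    """Count the asterisks in the X-Spam-Level header (0 if it is absent)."""
    msg = message_from_string(raw_message)
    return msg.get("X-Spam-Level", "").count("*")


def folder_for(raw_message, junk_at=5, trash_at=10):
    """Pick a folder from the unary score; thresholds are illustrative."""
    level = spam_level(raw_message)
    if level >= trash_at:
        return "Trash"
    if level >= junk_at:
        return "Junk"
    return "Inbox"
```

The same trick works without any code: a client rule that matches the header against the literal string `*****` fires for any score of five or more, because a longer run of asterisks still contains that substring.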
Re:what to do with spam after it's id'd? (Score:4, Interesting)
Yes, you sure do.
Odds are that this doesn't apply to you, but the Mac OS X mail program, Mail, does a brilliant job. It recognizes the YES or NO header that SpamAssassin adds to filtered messages and, depending on your preferences, filters accordingly. By default it merely flags spam messages with a little trash-bag icon and leaves them in your inbox. At the flip of a switch, you can have the program automatically move spams into a Junk folder that (again, depending on your prefs) can be automatically emptied every week or month or day or whatever.
If your mail program doesn't already do this, then your mail program sucks.
Re:what to do with spam after it's id'd? (Score:5, Informative)
I have a very well known address (which is why I'm posting as an Anonymous Coward
The correct response to spam is to throw it away. Trying to reply to it makes the world worse, not better.
Re:what to do with spam after it's id'd? (Score:4, Informative)
logfile "/path/to/my/home/dir/maildrop.log"
###
### Maildrop variable substitution
###
MAILBOX="./Maildir"
DEFAULT="$MAILBOX"
SPAM="$MAILBOX/.Spam"
###
### SpamAssassin
###
# Filter through SpamAssassin
xfilter "/usr/local/bin/spamc"
# Handle messages marked as spam
if (
{
# Store messages flagged as spam in another folder; uncomment
# this during testing just in case any legit mail gets sent
# to
#cc "./spam-store"
# Delete messages with a score of 10 or higher, filter all other
# spam messages into a spam folder
if ( $MATCH2 >= 10.0 )
to "/dev/null"
else
to $SPAM
}
What's the big deal? (Score:5, Funny)
You people need to stop being so cynical (Score:5, Funny)
A step up from living in your parents' basement and whacking off to an inflatable doll, right?
I'd stay and chat, but I have to get back to a Nigerian man about a bank transfer
Re:You people need to stop being so cynical (Score:4, Funny)
Get the owner, not the dog..... (Score:5, Insightful)
Somebody is paying for the spamming, and we know exactly who it is. The URL of that organization is prominently displayed in every item of spamail. It is the advertiser.
The advertiser is right there out in the open, easy to locate. If they're not, the spam isn't doing its job, and wouldn't have been sent. And easy to locate means easy to go after, easy to sue, to fine, DoS or whatever.
Dinging the advertisers, and dinging them hard, will instantly put the spammers out of business.
Spamming can be eliminated without blocking, white lists, or anti-spoofing RFCs. Just go to where it's pointing.
To draw an [ugly, graphic] picture: a dog comes and poops on the sidewalk in front of my house, and I step in it. Yelling at the dog is going to be only moderately successful; building a poop filter is difficult, messy, and leaky (as Spam Assassin demonstrates). Following the dog's leash and fining the owner is what works.
The owner doesn't bring the dog back since s/he doesn't want to pay another fine.
No owner, no dog, no spam.
Get the owner.
Kill the spam.
Re:Get the owner, not the dog..... (Score:3, Insightful)
Let the FBI actually buy something from a spammer, trace the money (since it's being bought with a CC), then prosecute whoever cashes the CC transaction. They do buys for drug busts routinely, so why not?
throws away ANY bulk mail (Score:5, Interesting)
that plus the points for any non-safe html colors or any html at all, SA effectively tags ANY bulk mail as spam!
For an end user to set up on their client (as a "junk mail" folder) that's great. I like to have bulk mail separated from my personal mail, but for an ISP to throw it away before it even gets to the intended recipient is fucking ridiculous and should be illegal.
The only email an ISP should be allowed to discard are the ones with attached viruses or some known email worm. The only reason your customers are happy with you throwing away their email is because you don't fucking tell them.
Re:throws away ANY bulk mail (Score:3, Insightful)
Thank Microsoft. ISPs could easily just add a header line and let the user filter on it, but Outlook Express is crippled from Outlook in that it can't match on arbitrary header lines, forcing ISPs to delete or leave alone.
I agree that SA is great client-side, which is how I use it. The problem is that it isn't plug-and-play on even *IX, and it's not trivial [openhandhome.com] to set up on the client s
You can even run spamassassin directly on Exchange (Score:4, Informative)
But if you are a smaller shop and don't have the resources for that, then you can run SA right on Exchange.
Here is a write up on how to do it [spamblogging.com] (that particular write up is for Exchange 2003 and SA 3.0, but it will work for SA 2.x as well, and for Exchange 2000 - or any combination thereof - but it won't work on Exchange 5.5 that I know of).
Re:Spam... I just don't get it. (Score:3, Funny)
Publish your addy on /. (or anywhere else), wait a few days, and have fun!
Re:I prefer my method - sacrificial subdomains (Score:3, Interesting)
As a side note, I don't use these email addresses for personal emails - I can hopefully trust that the people I personally send emails to are not, or are not going to become spammers.
Well, that is not a very secure assumption. Unless you know that all those people are not using an MUA/OS combination that is vulnerable to viruses or worms. Harvesting addresses is done that way nowadays...
Re:Challenge-Response schemes are more effective (Score:5, Informative)
Re:Challenge-Response schemes are more effective (Score:3, Insightful)
Re:Challenge-Response schemes are more effective (Score:3, Insightful)
We shouldn't feed the trolls (eh. ACs), but I'll bite anyway, because it's a valid argument.
You also ban all innocent bystanders that send you regular 550: no such user bounces, right? TMDA messages are exactly like bounces if you think about it. They appear automatically generated on purpose. It's a piece of cake to filter them if you dislike 'em. It's not like spam which tries to deceive you.
Now, trying not to be too caustic, backscatter is a fact of life. If you really want to avoid this completely, you
Re:I'll never need Spam Assassin (Score:3, Insightful)
You're just plain lucky. It's a fact of life that at least one of your email pals will use Windows, and store your emails in an Outlook or Outlook Express mail folder. Some days later, your pal will catch a worm or virus, and this little spam helper will harvest all those addresses, including your beloved, "protected" addy.