Gmail Spam Filter Testing
An anonymous reader writes "What can you do with 1000MB of e-mail space on your Gmail account? One guy, by the name of Aaron Pratt (prattboy@gmail.com), has decided to test the spam filters of Google's Gmail service by having his Gmail account blasted with every kind of spam imaginable. He is testing to see how well Gmail's spam filters can sort out the spam from legitimate email (yes, he does get personal emails from people). As of May 25th, he was at about 30% of his Gmail account's 1GB capacity. You can track his progress on his website, http://gmail.prattboy.net (Google cache of this site: cache: gmail.prattboy.net). Here is also an article talking about Aaron's efforts from webpronews.com"
Not that impressive (Score:5, Informative)
I am starting to second guess whether I should transfer everything to my Gmail account.
Re:One of the best things Google/GMail could do (Score:5, Informative)
Re:Not that impressive (Score:5, Informative)
(For example, after two weeks of using SpamAssassin, it decided that every e-mail sent to me was spam. I no longer received anything in my Inbox; everything was transferred to the Spambox. It took me another two weeks of tweaking SpamAssassin's kill rate down to about 50% accuracy, and now I actually receive all my emails.)
My own gmail testing (Score:5, Informative)
It was from "Mr Jubril Udeh Manager of Credit and Accounts Department of North Atlantic Securities Sarls Lome-Togo Republic."
Now, the funny part is not that the mail made it through, but that Google also decided to show me contextual ads on that account. Currently, the ads are:
- Payroll Cards a Poor Substitute for Checking Account
- Tips for Tackling Check Fraud
- Sophos hoax description: Ethiopian airline letter
- FAP non-US Investment FAQs
In the past the mail has also shown me ads on how to open an off-shore bank account. I'm glad Google is willing to help me with the $10.5 million that I'm about to receive!
Spam is always personalized (Score:5, Informative)
This doesn't mean it wouldn't be possible to create a system which would automatically detect individual spam messages based on tagging known spam; you just have to be smarter about the detection than plain MD5ing the email body.
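To illustrate the point about MD5, here is a minimal sketch, not anything Gmail is known to do. It compares an exact MD5 match against a word-shingle Jaccard similarity, which is one common way to be "smarter" about near-duplicates. The sample messages, the shingle size of 3, and the function names are all assumptions made up for this example.

```python
import hashlib


def md5_hex(body):
    """Exact fingerprint: any one-word change produces a different digest."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


def shingles(body, size=3):
    """Set of overlapping word triples (size is an assumed tuning knob)."""
    words = body.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles two bodies have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical messages: the same spam personalized for two recipients.
KNOWN_SPAM = ("Hi Alice, claim your free prize today by visiting our site "
              "and entering your details before the offer expires")
INCOMING = ("Hi Bob, claim your free prize today by visiting our site "
            "and entering your details before the offer expires")
UNRELATED = ("Lunch on Friday? I booked a table for noon at the usual "
             "place near the office")

exact_match = md5_hex(KNOWN_SPAM) == md5_hex(INCOMING)
similarity_to_spam = jaccard(KNOWN_SPAM, INCOMING)
similarity_unrelated = jaccard(KNOWN_SPAM, UNRELATED)

print("MD5 match:", exact_match)
print("Similarity to known spam:", round(similarity_to_spam, 2))
print("Similarity of unrelated mail:", round(similarity_unrelated, 2))
```

The personalized copy fails the MD5 comparison but still scores high on shingle overlap, while the unrelated message shares nothing. A real filter would need a tuned threshold and a way to cope with deliberately inserted random text.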
Re:Not that impressive (Score:2, Informative)
Mine is kredal@gmail.com, if you're interested. (:
Re:The Filter is great! (Score:5, Informative)
Re:One of the best things Google/GMail could do (Score:2, Informative)
I get those in Eudora and they don't seem to do much, my friends with Outlook however... not so lucky.
Re:Hmmm.. weird stats... (Score:5, Informative)
3778 messages / 213 MB = 17.74 messages / MB
213 MB / 3778 messages = 0.0564 MB / message
So that's pretty reasonable.
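For anyone who wants to recheck the division, a quick sketch using the two figures quoted in this comment. The conversion to KB assumes 1 MB = 1024 KB, which the comment does not specify.

```python
messages = 3778   # message count quoted in the comment
megabytes = 213   # mailbox usage quoted in the comment

messages_per_mb = messages / megabytes
mb_per_message = megabytes / messages
kb_per_message = mb_per_message * 1024  # assumes 1 MB = 1024 KB

print(f"{messages_per_mb:.2f} messages / MB")
print(f"{mb_per_message:.4f} MB / message")
print(f"{kb_per_message:.1f} KB / message")
```

That works out to an average message size in the tens of kilobytes.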
Re:One of the best things Google/GMail could do (Score:3, Informative)
Re:One of the best things Google/GMail could do (Score:5, Informative)
Re:whining? (Score:4, Informative)
Re:One of the best things Google/GMail could do (Score:3, Informative)
didn't somebody already sort of attempt this? (Score:2, Informative)
Eh, I only got 180MB worth of email and spam out of the deal though, before I decided to delete the account. The Gmail Spam filter was rather horrible at the time; catching only the most tried and true SPAM, letting tons of other SPAM through, and then randomly flagging legitimate messages from people whom it had not flagged before. I think it has improved some since then.
Re:whining? (Score:5, Informative)
Why?
Google uses commodity IDE drives. Those retail for about fifty cents a gigabyte. Google's not paying retail.
I read a quote from a Googleperson that by the time the drive is installed in a system, powered, cooled, backed up and administered, Google is paying two dollars for a gigabyte.
Good point about the problem of abandoned accounts, which won't bring Google any ad revenue. Wouldn't be surprised if they start euthanizing inactive accounts.
Re:One of the best things Google/GMail could do (Score:3, Informative)
>no it's not.
It doesn't matter how cheap it is when 80% of spam supposedly comes from infected zombie computers. (I'm too lazy to actually LINK to the recent story on this.)
Re:Not that impressive (Score:3, Informative)
Rubbish - I've used Thunderbird for many months now, with an account that gets quite a bit of spam. I have yet to see Thunderbird make a wrong guess at what's spam and what's not. If anything, Thunderbird is more likely to go the other way - allowing spam through - than deleting real email.
Re:Spam is always personalized (Score:5, Informative)
Not necessarily.
Lempel-Ziv based algorithms, like the one used by gzip, build a compression dictionary on the fly. Any "personalization" added to the message will affect the dictionary to varying degrees from then onward. If it's near the beginning, the personalization would greatly skew the selected dictionary identifiers. Though probably this would have little effect on the actual compression of the data, it would radically change the representation of the compressed image. The farther this personalization is from the start of the data to be compressed, the less effect it will have.
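The effect can be seen with a toy encoder. This is a sketch only: it uses a hand-written LZ78 coder, not gzip's actual LZ77-plus-Huffman scheme, and the message body and recipient names are invented for the example. It counts how many leading output codes two personalized copies share when the personalization sits at the start versus the end.

```python
def lz78_encode(text):
    """Toy LZ78: emit (dictionary index, next char) pairs, growing the
    dictionary on the fly as new phrases are seen."""
    dictionary = {"": 0}
    codes = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending a phrase we already know
        else:
            codes.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        codes.append((dictionary[phrase], ""))  # flush the trailing phrase
    return codes


def common_prefix_len(a, b):
    """Number of leading codes that are identical in both streams."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical spam body, personalized for two made-up recipients.
BODY = ("Buy cheap pills now! Limited offer, act today. "
        "Buy cheap pills now! Limited offer, act today.")

start_shared = common_prefix_len(lz78_encode("Alice, " + BODY),
                                 lz78_encode("Bob, " + BODY))
end_shared = common_prefix_len(lz78_encode(BODY + " Alice"),
                               lz78_encode(BODY + " Bob"))

print("Shared leading codes, personalization at start:", start_shared)
print("Shared leading codes, personalization at end:", end_shared)
```

With the name at the start, the two code streams diverge immediately; with the name at the end, they stay identical until the encoder reaches the differing text.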
Re:More focus on false positives. (Score:4, Informative)
A false positive is not spam getting past the filter; it's non-spam getting blocked.
I.e. the filter says it's spam, and it isn't - in the same way that a false-positive medical test says you have a virus even when you don't.
Re:Not a fair test (Score:4, Informative)
If you don't have a PTR record associated with your host, try to send mail to them. Or malform your EHLO, or something else.
You don't need to be "really sure" mail is spam. I'm talking about doing things like standards compliance checking, which will result in mail being rejected at delivery time.
Is this just random theorizing, or does GMail really fail to deliver some emails it thinks are spam?
There's no reason to get insulting. RFC 2821 has a number of requirements for delivery of mail that many services ignore.
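As a rough illustration of the kind of delivery-time compliance check described here, the sketch below tests whether an EHLO argument is shaped like a multi-label domain name or a bracketed address literal, which is what RFC 2821 asks for. It is a simplification written for this example, not Gmail's logic or any real mail server's code, and the function name is made up. It does no DNS or PTR lookup.

```python
import ipaddress
import re

# One DNS-style label: letters, digits and inner hyphens only.
_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?")


def ehlo_argument_ok(arg):
    """Return True if arg looks like a domain with at least two labels
    or an address literal such as [192.0.2.1] or [IPv6:2001:db8::1]."""
    if arg.startswith("[") and arg.endswith("]"):
        literal = arg[1:-1]
        try:
            if literal.lower().startswith("ipv6:"):
                ipaddress.IPv6Address(literal[5:])
            else:
                ipaddress.IPv4Address(literal)
            return True
        except ValueError:
            return False
    labels = arg.split(".")
    return len(labels) >= 2 and all(_LABEL.fullmatch(label) for label in labels)


for candidate in ["mail.example.com", "[192.0.2.1]", "localhost", "bad_host!.example"]:
    verdict = "accept" if ehlo_argument_ok(candidate) else "reject"
    print(candidate, "->", verdict)
```

A server applying a check like this would refuse the session during the SMTP dialogue, so the sender gets an error instead of the message silently landing in a spam folder.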
Re:More focus on false positives. (Score:3, Informative)
False negative = condition you are testing for comes up negative, when it should be positive.
Put in the context of a spam filter, it depends on whether you are testing for spam or for legitimate emails. If you are testing for spam (if spam then...), a false positive would be an email that is not spam getting sent to the spam folder or deleted. A false negative would be spam that lands in your inbox.
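The four outcomes can be spelled out in a few lines. This is a sketch with invented sample data, treating "spam" as the positive condition as in the paragraph above.

```python
from collections import Counter


def classify_outcome(is_spam, flagged_as_spam):
    """Name the outcome when the filter is testing for spam."""
    if flagged_as_spam and is_spam:
        return "true positive"    # spam correctly sent to the spam folder
    if flagged_as_spam and not is_spam:
        return "false positive"   # legitimate mail wrongly flagged
    if not flagged_as_spam and is_spam:
        return "false negative"   # spam that lands in the inbox
    return "true negative"        # legitimate mail delivered normally


# Hypothetical results: (actually spam?, did the filter flag it?)
samples = [
    (True, True),
    (True, False),
    (False, False),
    (False, True),
    (True, True),
]

tally = Counter(classify_outcome(actual, flagged) for actual, flagged in samples)
for outcome, count in sorted(tally.items()):
    print(outcome, count)
```

If the test were flipped to look for legitimate mail instead, the "false positive" and "false negative" labels would swap, which is the source of the confusion in this thread.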
Re:Cache? (Score:2, Informative)