Gmail Spam Filter Testing
An anonymous reader writes "What can you do with 1000MB of e-mail space on your Gmail account? One guy, by the name of Aaron Pratt ( prattboy@gmail.com ), has decided to test the spam filters of Google's Gmail service by having his Gmail account blasted with every kind of spam imaginable. He is testing to see how well Gmail's spam filters can sort out the spam from legitimate email (yes, he does get personal emails from people). As of May 25th, he was at about 30% of his Gmail account's 1GB capacity. You can track his progress on his website, http://gmail.prattboy.net (Google cache of this site: cache: gmail.prattboy.net). Here is also an article talking about Aaron's efforts from webpronews.com"
One of the best things Google/GMail could do (Score:5, Interesting)
Should be interesting, what filters? (Score:5, Interesting)
Is this the AventureMail guy? (Score:5, Interesting)
Comment removed (Score:2, Interesting)
Re:One of the best things Google/GMail could do (Score:5, Interesting)
The only reason I could think of someone sending those around is to bog up Bayesian filters with random crap, possibly lowering their effectiveness.
Any spammers/spam-experts feel like enlightening us?
About spam and blocking (Score:5, Interesting)
-A
*just for those who didn't know, the above domain names and email accounts are random, any resemblance to an actual domain or email account is purely coincidental, and if you choose to do so, you should sue
1gb Relieves Spam Concerns (Score:5, Interesting)
_____________________
Seun Osewa, Abeokuta Nigeria [seunosewa.com]
Hmmm.. weird stats... (Score:2, Interesting)
His last week stats are:
Something is off... Unless his spam contains attachments, this says that each of his emails was 17 MB in size.
I mean 17.73708.. This is /. after all. :)
How is he compiling stats? (Score:2, Interesting)
I guess because his stats are about 2-3 weeks behind, it would indicate that things are leaning towards the manual procedure...
0% Spam (Score:5, Interesting)
This guy solicited it.
Lack of updates? (Score:5, Interesting)
KevG
It's going to get a lot better... (Score:5, Interesting)
Re:About spam and blocking (Score:2, Interesting)
So this allows you to block some domains, if you'd like.
Re:Spam is always personalized (Score:2, Interesting)
Re:One of the best things Google/GMail could do (Score:4, Interesting)
So either it's some kind of probe to find working addresses, or a filter clogger. Or maybe both.
For a few of the random emails I would later start getting "real" spam. Not a majority though.
Re:whining? (Score:3, Interesting)
Re:whining? (Score:3, Interesting)
Of course, Google should improve and filter out the occasional crap I get too. And also offer 1 TB.
Re:One of the best things Google/GMail could do (Score:4, Interesting)
Re:This won't work for me... (Score:2, Interesting)
So yeah, make sure it's OK with yr ISP before signing up, and then you're free to do what you'd like.
CBV
I redirected an old address (Score:3, Interesting)
No, I'm not keeping proper statistics. =b
Re:He gave out his e-mail address... (Score:5, Interesting)
Well, I guess I need a booster shot, so here it is: slashdot@hates.ms. Spam away...
Dumb question about SPAM filters.. (Score:4, Interesting)
1) Several intentionally mis-spelled words
2) Lots of text in white (so it's invisible or nearly invisible)
3) Message in
Could you add filters that look for, say, more than 10% of the words mis-spelled, text font nearly equal to background color, or no actual text in message? These would take effect in addition to the existing Bayes filter.
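For what it's worth, here is a rough sketch of what those extra checks might look like. Everything in it is an assumption for illustration: the tiny KNOWN_WORDS dictionary, the function names, the 10% misspelling cutoff taken from the question above, and the contrast threshold of 16. It is not how Gmail or any real filter does it.

```python
import re

# Hypothetical tiny dictionary; a real filter would load a full word list.
KNOWN_WORDS = {"buy", "cheap", "pills", "now", "hello",
               "meeting", "at", "noon", "the", "is"}

def misspelled_ratio(text, dictionary=KNOWN_WORDS):
    """Fraction of alphabetic words that are not in the dictionary."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)

def color_distance(fg, bg):
    """Largest per-channel difference between two '#rrggbb' colors."""
    fg_rgb = [int(fg[i:i + 2], 16) for i in (1, 3, 5)]
    bg_rgb = [int(bg[i:i + 2], 16) for i in (1, 3, 5)]
    return max(abs(a - b) for a, b in zip(fg_rgb, bg_rgb))

def heuristic_flags(text, fg="#000000", bg="#ffffff",
                    max_misspelled=0.10, min_contrast=16):
    """Return the list of heuristics the message trips (assumed thresholds)."""
    flags = []
    if not text.strip():
        flags.append("no_text")          # nothing but whitespace
    if misspelled_ratio(text) > max_misspelled:
        flags.append("misspelled")       # more than 10% unknown words
    if color_distance(fg, bg) < min_contrast:
        flags.append("invisible_text")   # font color nearly equals background
    return flags
```

The flags could then be fed in as extra signals alongside the existing Bayes score rather than replacing it.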
Re:How to never get spam (Score:2, Interesting)
Aventuremail not as tolerant (Score:4, Interesting)
Yale Story (Score:3, Interesting)
New spin on the "word salad" strategy (Score:5, Interesting)
Right, and my Thunderbird Bayesian filter catches all of those word salad approaches. But they've come up with a new one - what I call the "encyclopedia attack."
What they do is copy an encyclopedia entry and put it at the bottom of their spam. The thing is usually a few paragraphs long, so that textually it dominates the message. The subjects are fairly random, and are occasionally educational ;)
The problem is that the text of this doesn't trip the "too many strange words" flag that's used for word salads. My Thunderbird filter is really having trouble with these. Anyone else having trouble with these spams?
Improvement over time. (Score:2, Interesting)
Image-Based Spam and Checksums (Score:3, Interesting)
What about vetting at least the image-based spam for checksumming? Scan the e-mail for image links (or images included inline). If there's a link, check it against the known list of spam links. If it's in the list, mark the message as spam. Spammers will quickly figure that trick out, though, so step two would be for Google to follow those links, and retrieve the images. Run a checksum of the image file itself; if there are a lot (say, a thousand) messages including the same image, tag it as spam. This combines spam filtering with the fun of reminding spammers that Google has an order of magnitude more bandwidth than they do. Use their own messages against them: the more you spam, the bigger the Slashdotting (Googling? Alas, that word's already taken.)
For bonus points, keep the downloaded images in the Google cache; keeps them available for the mail user, alleviating the load on the sending site for legitimate messages, and keeps them available for, well, the Google cache.
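A minimal sketch of the link-list and checksum-counting idea, assuming the image bytes have already been fetched. The class name, the known_spam_links set, and the use of SHA-256 are my own choices for illustration; the threshold of a thousand is the figure suggested above.

```python
import hashlib
from collections import Counter

class ImageChecksumFilter:
    """Counts how many messages carry the same image (hypothetical helper)."""

    def __init__(self, known_spam_links=None, threshold=1000):
        self.known_spam_links = set(known_spam_links or [])
        self.threshold = threshold
        self.counts = Counter()  # checksum -> number of messages seen

    def link_is_spam(self, url):
        """Step one: check the image link against the known list."""
        return url in self.known_spam_links

    def observe(self, image_bytes):
        """Step two: checksum the image itself and count it.

        Returns True once the same image has been seen in at least
        `threshold` messages.
        """
        digest = hashlib.sha256(image_bytes).hexdigest()
        self.counts[digest] += 1
        return self.counts[digest] >= self.threshold
```

One caveat: an exact checksum changes if the spammer alters even a single byte of the image, so this only catches identical copies.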
The Solution to Spam (Score:2, Interesting)
This may require an overhaul of the email system, though. One option may be to have multiple addresses bound together. You would give out one email address, and only "authenticated" or approved contacts could get your second address. An email sent simultaneously to both addresses would be delivered directly. Mail sent to only one of the two addresses would be delivered as normal and would be subject to the normal filtering. I guess the spammers would find ways to get both addresses too and defeat that, but it should be doubly difficult, if not actually an order of magnitude more difficult. How many people get the same spam on different email addresses? This could be useful.
The other is to hit spammers where it hurts: their audience. If a proper ad delivery system (yuck) were rolled out separately from email, and people used their email less for getting information about products and instead collected it through some RSS-type system, the spammers would be left with a dwindling audience unless they switched too. The ads would be strictly opt-in.
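To make the bound-address rule concrete, here is a small sketch. The function name route and the two return labels are made up for illustration; the only rule taken from the idea above is that mail addressed to both bound addresses skips the filter, and anything else goes through normal filtering.

```python
def route(recipients, public_addr, private_addr):
    """Decide how to handle a message under the bound-address scheme.

    Returns "deliver" when the message is addressed to both bound
    addresses, otherwise "filter" (normal spam filtering applies).
    """
    to = {r.strip().lower() for r in recipients}
    if public_addr.lower() in to and private_addr.lower() in to:
        return "deliver"
    return "filter"
```

For example, route(["me@pub.example", "me@priv.example"], "me@pub.example", "me@priv.example") gives "deliver", while a message sent to only one of the two gives "filter".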
Or mail collection rather than mail delivery. If people collected mail rather than got it delivered to them, they could in theory just not collect spam. Why would anyone collect spam?
Lastly is education. If people kept their own whitelists of approved mailers, they could in theory get rid of most spam by keeping good whitelists.
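A personal whitelist check can be very small. This sketch assumes the whitelist is a set of lowercase addresses and uses Python's standard email.utils.parseaddr to pull the address out of the From header; the function name is hypothetical.

```python
from email.utils import parseaddr

def is_whitelisted(from_header, whitelist):
    """True if the sender's address is in the user's approved list."""
    _, addr = parseaddr(from_header)  # splits "Name <addr>" into (name, addr)
    return addr.lower() in whitelist
```

Note that the From header is easy to forge, so a whitelist like this only helps as long as spammers don't guess who is on it.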