Google Businesses The Internet Spam

Gmail Spam Filter Testing 285

An anonymous reader writes "What can you do with 1000MB of e-mail space on your Gmail account? One guy, by the name of Aaron Pratt ( prattboy@gmail.com ), has decided to test the spam filters of Google's Gmail service by having his Gmail account blasted with every kind of spam imaginable. He is testing to see how well Gmail's spam filters can sort out the spam from legitimate email (yes, he does get personal emails from people). As of May 25th, he was at about 30% of his Gmail account's 1GB capacity. You can track his progress on his website, http://gmail.prattboy.net (Google cache of this site: cache: gmail.prattboy.net). Here is also an article talking about Aaron's efforts from webpronews.com"
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Monday June 14, 2004 @10:37AM (#9419781)
    They could use the GMail data to operate a checksum blacklist. Obviously, if thousands (or millions) of their users are getting the exact same email, it's probably spam.
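A rough sketch of the checksum-blacklist idea, to make it concrete. Nothing here reflects how Gmail actually works: the function names `body_checksum` and `build_blacklist`, the whitespace/case normalization, and the `threshold` parameter are all assumptions made for illustration. A message body is hashed, and a checksum is blacklisted once enough distinct recipients have received it.

```python
import hashlib
from collections import defaultdict


def body_checksum(body: str) -> str:
    """Hash a message body (hypothetical helper).

    Whitespace and case are normalized first so that trivial
    reformatting does not change the checksum.
    """
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_blacklist(deliveries, threshold):
    """Return the checksums seen by at least `threshold` distinct recipients.

    `deliveries` is an iterable of (recipient, body) pairs; both the
    input shape and the threshold rule are assumptions for this sketch.
    """
    recipients_by_checksum = defaultdict(set)
    for recipient, body in deliveries:
        recipients_by_checksum[body_checksum(body)].add(recipient)
    return {
        checksum
        for checksum, recipients in recipients_by_checksum.items()
        if len(recipients) >= threshold
    }


deliveries = [
    ("alice", "Buy cheap pills now"),
    ("bob", "buy  cheap pills NOW"),
    ("carol", "Buy cheap pills now"),
    ("alice", "Lunch on Friday?"),
]
blacklist = build_blacklist(deliveries, threshold=3)
print(body_checksum("Buy cheap pills now") in blacklist)
```

An exact-match hash like this only catches identical (after normalization) bodies; any per-recipient variation in the text produces a different checksum.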
  • by Clinoti ( 696723 ) * on Monday June 14, 2004 @10:41AM (#9419816)
    Can anyone provide a link or source to the kind of filters google has working on gmail?
  • by magefile ( 776388 ) on Monday June 14, 2004 @10:45AM (#9419861)
    The guy who got booted off AventureMail (2GB free) for trying to test their spam filters? The story is on Kuro5hin [kuro5hin.org], if anyone wants to see it.
  • by Cruciform ( 42896 ) on Monday June 14, 2004 @10:52AM (#9419928) Homepage
    I've been getting them as well.
    The only reason I could think of someone sending those around is to bog up Bayesian filters with random crap, possibly lowering their effectiveness.

    Any spammers/spam-experts feel like enlightening us? :)
  • by AviLazar ( 741826 ) on Monday June 14, 2004 @10:53AM (#9419941) Journal
    While we cannot block every domain name (e.g. if you get spam from $#(*$#sexphreak@yahoo.com) because it will alienate your legitimate contacts, there are many domain names that we can block (e.g. @spam-your-gmail.com). Yahoo provides email/domain name blocking, but limits this to 100 (unless you are paying). Do we know if gmail will have this limitation?
    -A
    *just for those who didn't know, the above domain names and email accounts are random, any resemblance to an actual domain or email account is purely coincidental, and if you choose to do so, you should sue /., not me :)
  • by osewa77 ( 603622 ) <naijasms@gma[ ]com ['il.' in gap]> on Monday June 14, 2004 @10:54AM (#9419945) Homepage
    I have subjected my e-mail address, afriguru@gmail.com [mailto], to the same abuse by redirecting all e-mail addresses that receive lots of junk mail to this one and posting the address unprotected to lots of websites and newsgroups. At the initial stage, a lot of 419 scam mails got through, but now I hardly get any spam. No false positives for me so far.
    _____________________
    Seun Osewa, Abeokuta Nigeria [seunosewa.com]
  • by Mz6 ( 741941 ) * on Monday June 14, 2004 @10:56AM (#9419958) Journal
    Hmm.. well let's see...

    His last week stats are:

    3778 messages were received, totaling 213 MB. 3917 were spam, and Gmail correctly identified 41.9% of these messages.

    Something is off... Unless his spams contain attachments, this says that each of his emails was 17 MB in size.

    I mean 17.73708.. This is /. after all. :)
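A quick check of the arithmetic, using only the two figures quoted from the stats page (3778 messages, 213 MB). The 17.737 figure appears to be messages divided by megabytes, which gives messages per megabyte rather than megabytes per message. The 1024 KB per MB conversion is an assumption about how the site counts megabytes.

```python
messages = 3778   # messages received in the last week, as quoted above
total_mb = 213    # total size in MB, as quoted above

# Dividing messages by megabytes gives messages per MB, not MB per message.
messages_per_mb = messages / total_mb

# Average size of one message, assuming 1 MB = 1024 KB.
kb_per_message = total_mb * 1024 / messages

print(f"{messages_per_mb:.5f} messages per MB")
print(f"{kb_per_message:.1f} KB per message")
```

Under that reading the average message is tens of kilobytes, not 17 MB.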

  • by HellKnite ( 266374 ) on Monday June 14, 2004 @10:59AM (#9419991)
    Anyone know how he's pulling the numbers off the page? Is there some kind of sneaky back-end that we can get stats about our account with? Is he manually entering all this info? Or maybe some kind of "screen-scraping" techniques to pull the data off the page... hmm...

    I guess because his stats are about 2-3 weeks behind, it would indicate that things are leaning towards the manual procedure...
  • 0% Spam (Score:5, Interesting)

    by yuri ( 22724 ) on Monday June 14, 2004 @11:00AM (#9419995)
    Spam is unsolicited, so google should filter none of his mail.

    This guy solicited it.
  • Lack of updates? (Score:5, Interesting)

    by Xiadix ( 159305 ) on Monday June 14, 2004 @11:07AM (#9420057) Homepage Journal
    Did anybody else notice that his site hasn't been updated in almost a month (May 25)? Seems his project is no longer working. I wonder if Google booted him.

    KevG
  • by waytoomuchcoffee ( 263275 ) on Monday June 14, 2004 @11:11AM (#9420087)
    For those of you that don't have Gmail yet, there is a little "Report Spam" button you can use to, well, report spam. When Gmail gets a few million users, and even 1% use this little button, you are going to see the spam detect rate skyrocket.
  • by OiPolloi ( 638427 ) <sena@smux.net> on Monday June 14, 2004 @11:13AM (#9420113) Homepage
    Gmail allows you to create any number (at least they don't seem to have any limit) of "filters". These are rules that allow you to manage your messages based on sender, recipient, subject, if the message has an attachment, if it has certain words, etc.

    So this allows you to block some domains, if you'd like.
  • Gzip it and compare the files. A short tracking code will make a negligible difference.
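One way to read "gzip it and compare" is a compression-based similarity score. The sketch below uses the normalized compression distance (NCD), which is a swapped-in technique and not necessarily what the poster had in mind. The function names and the sample messages are made up for illustration. Two copies of the same template that differ only in a short tracking code compress almost as well together as one copy alone, so their distance is small.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data."""
    return len(zlib.compress(data, 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: near 0 for near-duplicates,
    closer to 1 for unrelated texts."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Made-up sample messages: one spam template with two tracking codes,
# and one unrelated personal note.
template = (
    "Dear customer, our records show that you qualify for a lower mortgage rate. "
    "Thousands of homeowners have already refinanced and cut their monthly payment. "
    "Approval takes minutes, there is no obligation, and bad credit is not a problem. "
    "Visit our site today to lock in your rate before this limited offer expires. "
)
spam_a = template + "ref: 8f3a2c91"
spam_b = template + "ref: 77b1e04d"
personal = (
    "Hey, are we still on for hiking this Saturday? I checked the forecast and it "
    "looks clear until late afternoon. Bring the big water bottles and I will grab "
    "sandwiches on the way. Let me know if eight at the trailhead still works for you."
)

print(ncd(spam_a, spam_b), ncd(spam_a, personal))
```

Pairwise comparison like this does not scale to millions of messages on its own; it is only meant to show why a short tracking code barely changes the result.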
  • by Xzzy ( 111297 ) <sether@@@tru7h...org> on Monday June 14, 2004 @11:18AM (#9420158) Homepage
    My server was set up to forward anything sent to one of my domains to get dumped into a common inbox. I noticed a ways back (before I changed my config to just bounce all this crap) that I'd get a lot of those dictionary emails to random email accounts.

    So either it's some kind of probe to find working addresses, or a filter clogger. Or maybe both.

    For a few of the random emails I would later start getting "real" spam. Not a majority though.
  • Re:whining? (Score:3, Interesting)

    by thogard ( 43403 ) on Monday June 14, 2004 @11:23AM (#9420222) Homepage
    If I offer 10 of my most leeching customers 1 gig of space, I will need 10 gig of space... or will I? How much of that will be duplicated between at least two users and how much of it will be used by all 10? Remember Google already has copies of almost all the useful stuff on the net. If you grab some random web page and mime attach it to email, that's going to waste space in my mailbox, but if Google can figure out that they already have all the images, as well as the text, it's going to compress down to very little. For the 1st customers it requires a massive increase in needed disk space but at some point it starts dropping off. Sort of like how much stuff they have to index for the web and image searches.
  • Re:whining? (Score:3, Interesting)

    by Valluvan ( 564515 ) on Monday June 14, 2004 @11:29AM (#9420273) Homepage Journal
    Not many are as gregarious as Pratt. I've been using gmail for some time now. I must say google has done a pretty good job with their spam filters. For not-high-volume users (which most people are), gmail works much better than other email providers (i have yahoo, ureach and hotmail accounts which I use regularly).

    Of course, google should improve and filter out the occasional crap I get too. And also offer 1 TB.
  • by dragonman97 ( 185927 ) * on Monday June 14, 2004 @11:47AM (#9420447)
    Indeed - while I was doing a lot of spam fighting at work, I reviewed a honeypot I'd set up, and was amazed. I used mutt to review the messages, and found a couple of messages where the text part was a page or two from "The Wizard of Oz" and the nasty offer for some kind of auto insurance or other crap was in the HTML section, replete with hidden hash busters behind color backgrounds. These guys are sharp - they must be paying some smart programmers a lot of money, and it's only sad that they've sunk to such levels.
  • by Chuck Bucket ( 142633 ) on Monday June 14, 2004 @11:52AM (#9420486) Homepage Journal
    When signing up for my DSL I made sure all servers were OK to run. Once I had that I set up my mailserver, learned how to admin it, learned how to run SpamAssassin, gave accounts to friends/family and now I have about 10 users that hit it every day.

    So yeah, make sure it's OK with yr ISP before signing up, and then you're free to do what you'd like.

    CBV
  • by FooAtWFU ( 699187 ) on Monday June 14, 2004 @12:03PM (#9420596) Homepage
    I redirected an old manager@(two letters here).net site so Gmail gets a carbon copy of all the spam sent there (it's lots, trust me). At first it seemed that my Thunderbird Bayesian filters were doing better, but the trend seems to have reversed lately.

    No, I'm not keeping proper statistics. =b

  • by Algan ( 20532 ) on Monday June 14, 2004 @12:31PM (#9420884)
    It's not as bad as you think. I posted a dedicated email address to slashdot two times already, just to see what volume of spam I get. Surprisingly, it's only 2-3 messages every other day or so.

    Well, I guess I need a booster shot, so here it is: slashdot@hates.ms. Spam away...
  • by StressGuy ( 472374 ) on Monday June 14, 2004 @12:32PM (#9420889)
    I have Mozilla, it has a Bayes SPAM filter. Lately, it's been getting fooled more and more. The messages that make it through have one or more of the following features:

    1) Several intentionally mis-spelled words

    2) Lots of text in white (so it's invisible or nearly invisible)

    3) Message in .GIF form only - no plain text.

    Could you add filters that look for, say, more than 10% of the words mis-spelled, text font nearly equal to background color, or no actual text in message? These would take effect in addition to the existing Bayes filter.

  • by Frobisher ( 677079 ) on Monday June 14, 2004 @12:52PM (#9421090) Homepage
    Which is why my email address is llllllllooong. 20 characters before the @ sign. I don't post it anywhere and I think 20 chars is outside the range of such brute force methods. I've been spam free for about 2 years. And I mean spam FREE. I get NOTHING to my junk mail folder. It's marvellous!
  • by dirvish ( 574948 ) <dirvish@ f o undnews.com> on Monday June 14, 2004 @01:42PM (#9421695) Homepage Journal
    I tried to do the same thing with my AventureMail [aventuremail.com] account but AventureMail wasn't cool with it. They deleted my account! You can check out what little data I collected before the account suspension and read the emails to and from AventureMail about the merits of the account suspension at http://3fingersalute.net/aventuremail [3fingersalute.net]
  • Yale Story (Score:3, Interesting)

    by dirvish ( 574948 ) <dirvish@ f o undnews.com> on Monday June 14, 2004 @02:35PM (#9422283) Homepage Journal
    Here is a discussion [yale.edu] from Yale's LawMeme on the legal ramifications of Prattboy's experiment. Does asking others to sign you up for spam count as an opt-in?
  • by Scott Richter ( 776062 ) on Monday June 14, 2004 @03:03PM (#9422594)
    Except that won't work, as anyone that understands Bayesian filtering will tell you. In the case of every message with "random words" I've checked recently, the random words actually increased the spam score of that message. Why? Because it seems the random words aren't so random and either the same spammer is using the same "random words" over and over or various spammers are using sets of the same words. Over time most of the "random words" they use actually become great indicators of spam since my real email doesn't typically contain the random words they use.

    Right, and my Thunderbird Bayesian filter catches all of those word salad approaches. But they've come up with a new one - what I call the "encyclopedia attack."

    What they do is copy an encyclopedia entry and put it at the bottom of their spam. The thing is usually a few paragraphs long, so that textually it dominates the message. The subjects are fairly random, and are occasionally educational ;)

    The problem is that the text of this doesn't trip the "too many strange words" flag that's used for word salads. My Thunderbird filter is really having trouble with these. Anyone else having trouble with these spams?
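To illustrate the quoted point that reused "random words" end up as spam indicators, here is a minimal naive Bayes sketch. It is not how Thunderbird or any real filter is implemented: the `train` and `spam_probability` functions, the add-one smoothing, and the tiny training sets are all assumptions chosen to keep the example small. The words "zebra" and "quantum" play the role of filler words that keep showing up in spam.

```python
import math
from collections import Counter


def train(messages):
    """Count, for each token, how many messages contain it (hypothetical helper)."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts, len(messages)


def spam_probability(text, spam_model, ham_model):
    """Combine per-token evidence in log-odds form.

    Simplification: only tokens present in the message are considered,
    with add-one smoothing so unseen tokens count as neutral-ish evidence.
    """
    spam_counts, n_spam = spam_model
    ham_counts, n_ham = ham_model
    log_odds = math.log(n_spam / n_ham)  # prior from class sizes
    for token in set(text.lower().split()):
        p_token_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_token_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_token_spam / p_token_ham)
    return 1 / (1 + math.exp(-log_odds))


# Made-up training data: the same "random" filler words recur in the spam.
spam_model = train([
    "cheap pills zebra quantum",
    "refinance now zebra quantum",
    "cheap refinance zebra",
])
ham_model = train([
    "lunch meeting tomorrow",
    "patch review tomorrow",
    "lunch on friday",
])

print(spam_probability("zebra quantum lunch", spam_model, ham_model))
print(spam_probability("lunch tomorrow", spam_model, ham_model))
```

In this toy setup the filler words push the score up because they never appear in the legitimate mail. Pasted encyclopedia text is harder for the same mechanism, since its vocabulary is ordinary and changes with every message.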

  • by Jett ( 135113 ) on Monday June 14, 2004 @03:48PM (#9422989)
    I've had a gmail account for almost 3 months now. In the first month I got 3 spam messages, they all made it thru the filter. Since then I've gotten 5 more, only 1 of which made it thru. It's not statistically significant yet, but to me it feels like the filter has improved. I'm already up to 5% of my 1gig too...
  • by BarefootClown ( 267581 ) on Monday June 14, 2004 @04:29PM (#9423339) Homepage

    What about vetting at least the image-based spam for checksumming? Scan the e-mail for image links (or images included inline). If there's a link, check it against the known list of spam links. If it's in the list, mark the message as spam. Spammers will quickly figure that trick out, though, so step two would be for Google to follow those links, and retrieve the images. Run a checksum of the image file itself; if there are a lot of messages (say, a thousand) including the same image, tag it as spam. This combines spam filtering with the fun of reminding spammers that Google has an order of magnitude more bandwidth than they do. Use their own messages against them: the more you spam, the bigger the Slashdotting (Googling? Alas, that word's already taken.)

    For bonus points, keep the downloaded images in the Google cache; keeps them available for the mail user, alleviating the load on the sending site for legitimate messages, and keeps them available for, well, the Google cache.

  • The Solution to Spam (Score:2, Interesting)

    by vakuona ( 788200 ) on Monday June 14, 2004 @07:39PM (#9425007)
    I have 3 ideas that may overcome spam.

    This may require an overhaul of the email system though. One may be to have multiple addresses bound together. So you would give one email address and only "authenticated" or approved contacts could get your second address. Now sending an email simultaneously to the two email addresses would result in the email being delivered directly. Mail sent to only one of the 2 addresses would be delivered as per normal, and would be subject to the normal filtering. But I guess the spammers would find ways to get both addresses too and defeat that, but it should be doubly difficult, if not actually an order of magnitude more difficult. How many people get the same spam on different email addresses? This could be useful.

    The other is to hit spammers where it hurts, audience. By rolling out a proper ad delivery system (yuck) which was separate from email, if people used their email less for getting information about products, but had it collected by some RSS type system, the spammers would be left with a dwindling audience unless they switched too. The ads would be strictly opt in.

    Or mail collection rather than mail delivery. If people collected mail rather than got it delivered to them, they could in theory just not collect spam. Why would anyone collect spam?

    Lastly is education. If people kept their own whitelists of approved mailers, they could in theory get rid of most spam by keeping good whitelists.
