Gmail Spam Filter Testing 285

Posted by Hemos on Monday June 14, 2004 @10:35AM from the send-the-mail-in dept.

An anonymous reader writes "What can you do with 1000MB of e-mail space on your Gmail account? One guy, by the name of Aaron Pratt ( prattboy@gmail.com ), has decided to test the spam filters of Google's Gmail service by having his Gmail account blasted with every kind of spam imaginable. He is testing to see how well Gmail's spam filters can sort out the spam from legitimate email (yes, he does get personal emails from people). As of May 25th, he was at about 30% of his Gmail account's 1GB capacity. You can track his progress on his website, http://gmail.prattboy.net (Google cache of this site: cache: gmail.prattboy.net). Here is also an article talking about Aaron's efforts from webpronews.com"

This discussion has been archived. No new comments can be posted.

first spam? (Score:5, Funny)

by miketang16 ( 585602 ) * writes: on Monday June 14, 2004 @10:36AM (#9419761) Journal

psh.. i've done this to my friends before.. they didn't need to make a website to ask for it...

- Re:first spam? (Score:4, Funny)
  
  by Anonymous Coward writes: on Monday June 14, 2004 @11:14AM (#9420126)
  
  Oh, I didn't know that was you who passed my address along so I could b uy che.ap v1agra! Thanks! Those pi1ls made my p.e..ni.s gr0w 3-5 lnches! It was really very thoughtful of you, Mike.
  
- More focus on false positives. (Score:5, Insightful)
  
  by ron_ivi ( 607351 ) writes: <sdotno&cheapcomplexdevices,com> on Monday June 14, 2004 @12:14PM (#9420701)
  
  Reviews of spam filters always seem to focus on how much stuff they block.
  The consequences of blocking a non-spam email are so much worse (parent not hearing from kid, the customer that would have saved your startup) than a spam getting in, I wish the spam filter reviews would focus on those.
  
  - - Re:More focus on false positives. (Score:4, Informative)
      
      by Anonymous Coward writes: on Monday June 14, 2004 @01:49PM (#9421773)
      
      false positive : spam getting past the filter ratio...
      A false positive is not one of spam getting past the filter, it's one of non-spam getting blocked.
      I.e. the filter says it's spam, and it isn't - in the same way that a false-positive medical test says you have a virus even when you don't.
      
      - Re:More focus on false positives. (Score:3, Informative)
        
        by einTier ( 33752 ) * writes:
        
        False positive = condition you are testing for comes up positive, when it should be negative.
        False negative = condition you are testing for comes up negative, when it should be positive.
        Put in the context of a spam filter, it depends on whether you are testing for spam or for legitimate emails. If you are testing for spam (if spam then...), a false positive would be an email that is not spam getting sent to the spam folder or deleted. A false negative would be spam that lands in your inbox.
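
To make those definitions concrete, here is a minimal Python sketch that treats "spam" as the positive class, as in the "if spam then..." case above. The function name `classify_outcome` and its arguments are made up for illustration.

```python
def classify_outcome(is_spam: bool, flagged_as_spam: bool) -> str:
    """Name the outcome of one filtering decision, with 'spam' as the positive class."""
    if flagged_as_spam:
        # The filter said "spam": right if the mail really was spam, wrong otherwise.
        return "true positive" if is_spam else "false positive"
    # The filter said "not spam": wrong if the mail really was spam.
    return "false negative" if is_spam else "true negative"


# A legitimate mail sent to the spam folder is a false positive.
print(classify_outcome(is_spam=False, flagged_as_spam=True))
# A spam mail that lands in the inbox is a false negative.
print(classify_outcome(is_spam=True, flagged_as_spam=False))
```

If the test were framed the other way round (positive = "legitimate"), the two error labels would swap, which is the point the comment makes.
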
The Filter is great! (Score:5, Funny)

by umrgregg ( 192838 ) writes: on Monday June 14, 2004 @10:36AM (#9419770) Homepage

Apparently, Google's spam filter even filters messages that aren't there. From the website:
3778 messages were received, totaling 213 MB.

3917 were spam, and Gmail correctly identified 41.9% of these messages.
Fantastic

- pre-emptive strike theory (Score:2)
  
  by muyuubyou ( 621373 ) writes:
  
  cometh to Google
  - Re:pre-emptive strike theory (Score:5, Funny)
    
    by umrgregg ( 192838 ) writes: on Monday June 14, 2004 @10:56AM (#9419960) Homepage
    
    Right! My only idea is that Google's technology is so advanced, it filters messages before they are even sent. It's gotta be a result of faster-than-light calculations. Boy, I'm gonna buy me some stock.
    
- Re:The Filter is great! (Score:5, Funny)
  
  by Anonymous Coward writes: on Monday June 14, 2004 @10:59AM (#9419992)
  
  No, that's just a classic threaded code bug:
  
  They just forgot the mutex surrounding the two snprintfs... so this user probably got 139 messages in the time it takes to execute snprintf, all spam.
  
  Which is.... about right.
  
- Re:The Filter is great! (Score:5, Informative)
  
  by aismail3 ( 735831 ) writes: on Monday June 14, 2004 @11:02AM (#9420018)
  
  When I add up the figures from May 13 to 19, I get that 4869 messages were received. 4717 of those were spam, and 1820 were marked, so Gmail's success rate was 38.6%.
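
The arithmetic behind that figure can be checked in a few lines of Python. The three counts are the ones quoted above; the variable names are only illustrative.

```python
received = 4869   # messages received May 13 to 19, per the figures above
spam = 4717       # of those, how many were spam
flagged = 1820    # spam messages that Gmail marked as spam

detection_rate = flagged / spam   # share of spam that was caught
legitimate = received - spam      # everything that was not spam

print(f"caught {detection_rate:.1%} of spam")   # caught 38.6% of spam
print(f"{legitimate} legitimate messages")      # 152 legitimate messages
```
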
  
One of the best things Google/GMail could do (Score:5, Interesting)

by Anonymous Coward writes: on Monday June 14, 2004 @10:37AM (#9419781)

Is use the GMail data to operate a checksum blacklist. Obviously, if thousands (or millions) of their users are getting the exact same email, it's probably spam.
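
A checksum blacklist of this kind could be sketched roughly as follows. This is only an illustration of the idea, not anything Google is known to run: the function name `bulk_digests`, the use of SHA-256, and the threshold value are all assumptions, and it matches byte-identical bodies only.

```python
import hashlib
from collections import Counter


def bulk_digests(bodies, threshold):
    """Return digests of message bodies seen at least `threshold` times."""
    counts = Counter(
        hashlib.sha256(body.encode("utf-8")).hexdigest() for body in bodies
    )
    return {digest for digest, seen in counts.items() if seen >= threshold}


# Toy mail store: three identical bulk messages and one personal mail.
mail_store = ["Buy cheap pills now"] * 3 + ["Lunch on Friday?"]
blacklist = bulk_digests(mail_store, threshold=3)
print(len(blacklist))  # 1
```
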

- Re:One of the best things Google/GMail could do (Score:5, Informative)
  
  by kryptkpr ( 180196 ) writes: on Monday June 14, 2004 @10:45AM (#9419854) Homepage
  
  Spammers have thought of this already, and they send nearly-identical messages.. Ever notice the random strings of letters and/or numbers at the bottom/in the subjects of spams?
  
  - Re:One of the best things Google/GMail could do (Score:5, Funny)
    
    by lockefire ( 691775 ) writes: on Monday June 14, 2004 @10:46AM (#9419871)
    
    Actually, I get a whole lot of emails with the random words and nothing else. I haven't quite caught on to the advertising strategy in that.
    
    - Re:One of the best things Google/GMail could do (Score:5, Interesting)
      
      by Cruciform ( 42896 ) writes: on Monday June 14, 2004 @10:52AM (#9419928) Homepage
      
      I've been getting them as well.
      The only reason I could think of someone sending those around is to bog up Bayesian filters with random crap, possibly lowering their effectiveness.
      
      Any spammers/spam-experts feel like enlightening us? :)
      
      - Re:One of the best things Google/GMail could do (Score:4, Interesting)
        
        by Xzzy ( 111297 ) writes: <sether@tru7[ ]rg ['h.o' in gap]> on Monday June 14, 2004 @11:18AM (#9420158) Homepage
        
        My server was set up to forward anything sent to one of my domains to get dumped into a common inbox. I noticed a ways back (before I changed my config to just bounce all this crap) that I'd get a lot of those dictionary emails to random email accounts.
        
        So either it's some kind of probe to find working addresses, or a filter clogger. Or maybe both.
        
        For a few of the random emails I would later start getting "real" spam. Not a majority though.
        
      - Re:One of the best things Google/GMail could do (Score:5, Informative)
        
        by Halo1 ( 136547 ) writes: on Monday June 14, 2004 @11:21AM (#9420195)
        
        Most of the time, these messages contain both a text/plain section with only random words, and then a text/html part with the real payload. If you use mutt or so, you most likely only see the text/plain stuff. Another trick is using just a text/html section with random text, but also with an image that contains the real payload.
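
The two-part layout described here can be reproduced with Python's standard `email` package. This is a hedged sketch: the subject, filler words, and HTML body are invented, and real spam would of course be built by other tools.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "hello"
# Plain-text part: filler words that a text-only reader such as mutt would show.
msg.set_content("garden violin harbor salad lantern")
# HTML alternative: where the actual pitch would sit.
msg.add_alternative("<html><body><p>Buy now!</p></body></html>", subtype="html")

# List the content type of each part, in order.
part_types = [part.get_content_type() for part in msg.iter_parts()]
print(part_types)  # ['text/plain', 'text/html']

# What a reader that prefers plain text would pick up.
plain_part = msg.get_body(preferencelist=("plain",))
print(plain_part.get_content())
```

A filter that scores only the text/plain part would see nothing but the filler, which is presumably why checking every part matters.
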
        
        
        Re:One of the best things Google/GMail could do (Score:4, Interesting)
        
        by dragonman97 ( 185927 ) * writes: on Monday June 14, 2004 @11:47AM (#9420447)
        
        Indeed - while I was doing a lot of spam fighting at work, I reviewed a honeypot I'd set up, and was amazed. I used mutt to review the messages, and found a couple of messages where the text part was a page or two from "The Wizard of Oz" and the nasty offer for some kind of auto insurance or other crap was in the HTML section, replete with hidden hash busters behind color backgrounds. These guys are sharp - they must be paying some smart programmers a lot of money, and it's only sad that they've sunk to such levels.
        
      - Re:One of the best things Google/GMail could do (Score:5, Insightful)
        
        by letxa2000 ( 215841 ) writes: on Monday June 14, 2004 @11:56AM (#9420531)
        
        Spammer is trying to do two things: 1. break any Bayesian filter used on that mail server/inbox. Adding noise to the filter will allow more mail through as "questionable". This might still be tagged as spam, but not as readily as it would be without the added noise
        Except that won't work, as anyone that understands Bayesian filtering will tell you. In the case of every message with "random words" I've checked recently, the random words actually increased the spam score of that message. Why? Because it seems the random words aren't so random and either the same spammer is using the same "random words" over and over or various spammers are using sets of the same words. Over time most of the "random words" they use actually become great indicators of spam since my real email doesn't typically contain the random words they use.
        In one recent analysis, 10 random words were inserted by the spammer. He got lucky and 1 of those words actually had a very low score for my Bayesian corpus. Unfortunately (for him), the other 9 words had scores of 99.99%! His use of random words literally nuked any possibility of him getting through my filter.
        Anyway, random words will not help spammers get through Bayesian filters. But it seems that many people (both spammers and non-spammers) think it will. But, hey, that's good for me: as long as "random words" is seen by spammers as a viable solution to Bayesian filters, my Bayesian filter will continue to work and will not have to deal with any innovative way to get around the filter (if any exists).
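
To see why nine high-scoring words swamp one low-scoring one, here is a sketch using the naive Bayesian combining rule from Paul Graham's "A Plan for Spam". That is an assumption: the comment does not say which filter or formula was used, and the 1% score for the "lucky" word is invented for illustration.

```python
from math import prod


def combined_spam_probability(word_scores):
    """Combine per-word spam probabilities with the naive Bayes rule."""
    spam_evidence = prod(word_scores)
    ham_evidence = prod(1.0 - p for p in word_scores)
    return spam_evidence / (spam_evidence + ham_evidence)


# Nine inserted words scoring 99.99% spam, plus one "lucky" word at a hypothetical 1%.
scores = [0.9999] * 9 + [0.01]
print(combined_spam_probability(scores))
```

Under this rule the single low score is multiplied against nine near-certain ones, so the combined value stays very close to 1.
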
        
        
        Re:One of the best things Google/GMail could do (Score:4, Insightful)
        
        by wickidpisa ( 41827 ) writes: on Monday June 14, 2004 @12:52PM (#9421089) Homepage
        
        It may not increase false negatives, but it has decent chances of increasing false positives which is a much greater problem. My best guess is that spammers are hoping that once enough random words are classified as spam words, real emails with those words will start being classified as spam. If they can force enough false positives, people will start turning off bayesian filtering.
        
        
        New spin on the "word salad" strategy (Score:5, Interesting)
        
        by Scott Richter ( 776062 ) writes: on Monday June 14, 2004 @03:03PM (#9422594)
        
        Except that won't work, as anyone that understands Bayesian filtering will tell you. In the case of every message with "random words" I've checked recently, the random words actually increased the spam score of that message. Why? Because it seems the random words aren't so random and either the same spammer is using the same "random words" over and over or various spammers are using sets of the same words. Over time most of the "random words" they use actually become great indicators of spam since my real email doesn't typically contain the random words they use.
        Right, and my Thunderbird Bayesian filter catches all of those word salad approaches. But they've come up with a new one - what I call the "encyclopedia attack."
        What they do is copy an encyclopedia entry and put it at the bottom of their spam. The thing is usually a few paragraphs long, so that textually it dominates the message. The subjects are fairly random, and are occasionally educational ;)
        The problem is that the text of this doesn't trip the "too many strange words" flag that's used for word salads. My Thunderbird filter is really having trouble with these. Anyone else having trouble with these spams?
        
    - Re:One of the best things Google/GMail could do (Score:2, Informative)
      
      by wo1verin3 ( 473094 ) writes:
      
      It's a good thing you're not using Outlook. :)
      
      I get those in Eudora and they don't seem to do much, my friends with Outlook however... not so lucky. :)
    - Re:One of the best things Google/GMail could do (Score:3, Informative)
      
      by xandroid ( 680978 ) writes:
      
      Try looking at the source -- when this happens to me, I see that the random words are plaintext, and the intended advertisement is in HTML (which I've blocked).
    - Re:One of the best things Google/GMail could do (Score:3, Informative)
      
      by ryen ( 684684 ) writes:
      
      those emails could possibly also contain embedded image tags (known as web beacons). when you open an email and attempt to 'download' the image, some server on the net knows it was you who retrieved the image and has just verified that your email address is active and spammable.
    - - Re:One of the best things Google/GMail could do (Score:5, Insightful)
        
        by jefe7777 ( 411081 ) writes: on Monday June 14, 2004 @11:34AM (#9420317) Journal
        
        >> You think they bother?
        
        heh heh...absolutely.
        
        100 known good addresses are worth 10,000 "who the fuck knows" addresses.
        
        >>It's cheaper to just send mail to everyone
        
        no it's not.
        
        let's pretend you are a spammer, and you want to send out spam.
        
        If you target 1 billion questionable addresses, each time a client has a new campaign, then that's 1 billion pieces you have to deliver. every time.
        
        what if you have 1000 clients? that's 1000 billion deliveries.
        
        do you see where this is going? if you don't KNOW WHAT A VALID EMAIL ADDRESS IS, YOU HAVE TO GUESS.
        
        but what if the first time you send out just a "test" to those billion addresses, and then subtract the ones that bounce.
        
        You are left with 50,000 known good addresses.
        
        that's gold. You now have 1/20th of the load,and you are now serving your clients quicker, a helluva lot less load. you are only using an open relay for 1/20th of the time.
        
        overall a smaller footprint by 1/20th.
        
        you tell me. does it make sense to blindly blast out email?
        
        
        Re:One of the best things Google/GMail could do (Score:3, Informative)
        
        by FooAtWFU ( 699187 ) writes:
        
        >>It's cheaper to just send mail to everyone
        >no it's not.
        It doesn't matter how cheap it is when 80% of spam supposedly comes from infected zombie computers. (I'm too lazy to actually LINK to the recent story on this.)
  - Re:One of the best things Google/GMail could do (Score:2, Insightful)
    
    by Anonymous Coward writes:
    
    Anti-Spammers have thought of this, too. Things like the Distributed Checksum Clearinghouses [rhyolite.com] have "fuzzy" matching.
    
    Google also has enough computer power to generate some sort of Bayesian filter to catch the most common spam system wide, and even a personalized filter on each account to catch the rest.
- Spam is always personalized (Score:5, Informative)
  
  by Sulka ( 4250 ) writes: <sulka@[ ].fi ['iki' in gap]> on Monday June 14, 2004 @10:49AM (#9419902) Homepage Journal
  
  Checksums are nearly useless against spam. It only takes one byte to change the checksum value and probably more than 90% of spam contain a personalization code to check which addresses are functional. Different code = different checksum.
  
  This doesn't mean it wouldn't be possible to create a system which would automatically detect individual spam messages based on tagging known spam, you just have to be smarter about the detection than just plain MD5ing the email body.
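
Both halves of that argument can be shown in a short Python sketch. The word-shingle comparison below is one common near-duplicate technique picked for illustration; it is not claimed to be what DCC, Google, or anyone in this thread uses, and the message text and tracking codes are invented.

```python
import hashlib


def md5_hex(text):
    """MD5 digest of a message body, as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def shingles(text, k=3):
    """Set of overlapping k-word sequences in the text."""
    words = text.split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Similarity of two texts as overlap of their shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


base = "Cheap pills delivered overnight to your door click here now id"
msg_a = base + " 48213"   # hypothetical per-recipient tracking codes
msg_b = base + " 90377"

print(md5_hex(msg_a) == md5_hex(msg_b))   # False: the digests differ
print(round(jaccard(msg_a, msg_b), 2))    # 0.82: the texts are still mostly alike
```
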
  
  - Re:Spam is always personalized (Score:2, Interesting)
    
    by GlassUser ( 190787 ) writes:
    
    gzip it and compare the files. a short tracking code will make a negligible difference.
    - Re:Spam is always personalized (Score:5, Informative)
      
      by Thuktun ( 221615 ) writes: on Monday June 14, 2004 @12:04PM (#9420604) Journal
      
      gzip it and compare the files. a short tracking code will make a negligible difference.
      
      Not necessarily.
      
      Lempel-Ziv based algorithms, like the one used by gzip, build a compression dictionary on the fly. Any "personalization" added to the message will affect the dictionary to varying degrees from then onward. If it's near the beginning, the personalization would greatly skew the selected dictionary identifiers. Though probably this would have little effect on the actual compression of the data, it would radically change the representation of the compressed image. The farther this personalization is from the start of the data to be compressed, the less effect it will have.
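
One way to probe this claim is to compress two copies of a message that differ only in a short code and see how many leading bytes of the compressed output still match. The sketch below uses Python's `zlib` module, which implements the same DEFLATE algorithm gzip uses; the message text and codes are made up, and no particular result is claimed here.

```python
import zlib


def common_prefix_len(a, b):
    """Number of leading bytes that two byte strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


body = b"Limited time offer on replica watches, order today. " * 20

# Same body, with a hypothetical tracking code placed first or last.
early_a, early_b = b"id=48213 " + body, b"id=90377 " + body
late_a, late_b = body + b" id=48213", body + b" id=90377"

for label, x, y in [("code at start", early_a, early_b),
                    ("code at end", late_a, late_b)]:
    cx, cy = zlib.compress(x), zlib.compress(y)
    print(label, "shared compressed prefix:", common_prefix_len(cx, cy),
          "of", min(len(cx), len(cy)), "bytes")
```
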
      
  - Image-Based Spam and Checksums (Score:3, Interesting)
    
    by BarefootClown ( 267581 ) writes:
    
    What about vetting at least the image-based spam for checksumming? Scan the e-mail for image links (or images included inline). If there's a link, check it against the known list of spam links. If it's in the list, mark the message as spam. Spammers will quickly figure that trick out, though, so step two would be for Google to follow those links, and retrieve the images. Run a checksum of the image file itself; if there are a lot (say, a thousand) messages including the same image, tag it as spam. Thi
- Re:One of the best things Google/GMail could do (Score:2)
  
  by sugar and acid ( 88555 ) writes:
  
  Except for the many different legit mailing lists that people subscribe to. Any kind of bulk email will be screened by this, thus crippling gmail by preventing mailing lists that people subscribe to from being delivered.
- - Re:One of the best things Google/GMail could do (Score:2)
    
    by Pharmboy ( 216950 ) writes:
    
    Google has some pretty bright minds aboard
    
    Yes they do, this is just one of the articles discussing this, here. [nytimes.com]
    
    They have a much higher ratio of PhDs than Microsoft, or just about anyone short of a hospital. They also give their employees the freedom of spending 20% of their time working on any unrelated subject they choose, apparently in the hopes that the outcome of this research will benefit Google, or at least will make the better PhDs with more than one iron in the fire, WANT to work for them.
    - Re:One of the best things Google/GMail could do (Score:5, Funny)
      
      by ckd ( 72611 ) writes: on Monday June 14, 2004 @11:39AM (#9420375) Homepage
      
      They have a much higher ratio of PhDs than Microsoft, or just about anyone short of a hospital.
      
      Remind me not to go to your hospital. I want MDs treating me, not people who can give me a dissertation on ancient Sumeria or something. (MDs who also know about ancient Sumeria excepted.)
      
He gave out his e-mail address... (Score:5, Funny)

by Anonymous Coward writes: on Monday June 14, 2004 @10:38AM (#9419791)

... to the entire Slashdot community! Now he's going to be flooded with all sorts of spam and shit. LOL!

Oh... right. :)

- Re:He gave out his e-mail address... (Score:5, Funny)
  
  by umrgregg ( 192838 ) writes: on Monday June 14, 2004 @10:58AM (#9419984) Homepage
  
  Notice the reader who submitted the story was anonymous... Gotta love friends who sign you up for spam.
  
- Re:He gave out his e-mail address... (Score:5, Interesting)
  
  by Algan ( 20532 ) writes: on Monday June 14, 2004 @12:31PM (#9420884)
  
  It's not as bad as you think. I posted a dedicated email address to slashdot two times already, just to see what volume of spam I get. Surprisingly, it's only 2-3 messages every other day or so.
  
  Well, I guess I need a booster shot, so here it is: slashdot@hates.ms. Spam away...
  
  - Re:He gave out his e-mail address... (Score:3, Funny)
    
    by istewart ( 463887 ) writes:
    
    I once posted my AIM screenname (IStewart12) to Slashdot and got a total of one message from a concerned individual warning me about the flood of IMs I was now likely to receive. Must not have been a very active thread.
whining? (Score:5, Insightful)

by Gothmolly ( 148874 ) writes: on Monday June 14, 2004 @10:38AM (#9419796)

What's Google going to do to protect its users from mail bombs?

Now you're complaining that your free, 1GB-limit, access-from-anywhere email service could be mailbombed? Live with it. If Google "decides" anything more about our emails, we put on our tinfoil hats and scream. If we broadcast a bogus email address, obtained from gmail for clearly sinister purposes, and it gets mailbombed, we whine that Google doesn't "protect" us. Whats the story, or are we all just schizophrenic?

Don't want that "vulnerability"? Don't use Gmail!

- Re:whining? (Score:5, Insightful)
  
  by supersnail ( 106701 ) writes: on Monday June 14, 2004 @10:47AM (#9419877)
  
  I don't think it's about protection, just practicality. Google offers a SPAM filter; the little pratt tested it and found it wanting.
  
  I think it's more of a problem for Google than the end users. The whole Gmail "get a gigabyte of memory free" business model is predicated on most people using only a small fraction of that gigabyte but feeling good about the capacity being there. If I open up a gmail account, get p*ss*d off with the spam and go elsewhere without closing the account, the 1G will fill up with spam in a couple of months, and Google will end up storing terabytes of spam for customers who no longer use the service.
  
  - Re:whining? (Score:2)
    
    by ovlaski ( 73485 ) writes:
    
    So what? If they have terabytes and terabytes of spam, they have a huge database to teach their filters with.
  - Re:whining? (Score:3, Interesting)
    
    by thogard ( 43403 ) writes:
    
    If I offer 10 of my most leeching customers 1 gig of space, I will need 10 gig of space... or will I? How much of that will be duplicated between at least two users and how much of it will be used by all 10? Remember Google already has copies of almost all the useful stuff on the net. If you grab some random web page and mime attach it to email, that's going to waste space in my mailbox but if google can figure out that they already have all the images, as well as the text, it's going to compress down t
  - Re:whining? (Score:4, Informative)
    
    by cmacb ( 547347 ) writes: on Monday June 14, 2004 @11:27AM (#9420255) Homepage Journal
    
    Actually the TOS for Gmail says that doing things to attract spam is a violation, so they could just close the account on that basis. Also, if you don't sign on for a certain period of time (a few months I think) the account gets deleted. I had a Yahoo ID for years before I ever knew there was an e-mail address associated with it. I never read the mail associated with my AIM id and I probably still have free hotmail and a few other things like that floating around. Failure of these companies to delete idle accounts is what causes all the good names to be taken. I think Google is more on-top of this than many of the others.
    
  - Re:whining? (Score:3, Interesting)
    
    by Valluvan ( 564515 ) writes:
    
    Not many are as gregarious as Pratt. I've been using gmail for some time now. I must say google has done a pretty good job with their spam filters. For not-high-volume users (which most people are), gmail works much better than other email providers (i have yahoo, ureach and hotmail accounts which I use regularly).
    
    Of course, google should improve and filter out the occasional crap I get too. And also offer 1 TB.
  - Re:whining? (Score:5, Informative)
    
    by Beryllium Sphere(tm) ( 193358 ) writes: on Monday June 14, 2004 @11:43AM (#9420411) Journal
    
    >The whole Gmail "get a gigiabyte of memeory free" business model is predicated on most people using only a small fraction of that Gigibayte
    
    Why?
    
    Google uses commodity IDE drives. Those retail for about fifty cents a gigabyte. Google's not paying retail.
    
    I read a quote from a Googleperson that by the time the drive is installed in a system, powered, cooled, backed up and administered Google is paying two dollars for a gigabyte.
    
    Good point about the problem of abandoned accounts, which won't bring Google any ad revenue. Wouldn't be surprised if they start euthanizing inactive accounts.
    
- Re:whining? (Score:5, Insightful)
  
  by Pharmboy ( 216950 ) writes: on Monday June 14, 2004 @10:55AM (#9419956) Journal
  
  Now you're complaining...
  
  That is his JOB, to point out shortcomings of the system. He is a tester, and he is doing it for FREE. Google doesn't want testers who get 3 emails a day, they want people to test the living shit out of the service and point out what is wrong with it. Everyone knows Google will try to fix all the bugs, so all the press, good or bad, is still good press.
  
  If Google barfs when handling 999 messages in 4 minutes during testing, imagine when several million people have gmail accounts. Fortunately, now Google has an event to look at to see what the problem is. When you are trying to harden a system, YOU MUST BREAK IT OVER AND OVER AGAIN, to see where it is weak. This is what is happening.
  
  My impression is that the techs at Google are spending a significant amount of time saying "oh shit, never thought of that, cool." which is the ENTIRE REASON FOR TESTING. They can't think of every situation by themselves. This is also the entire concept behind "open software is more secure". Google's gmail is going to have bugs at this stage and lots of them, period. Google knows this, hell, everyone knows this (this is why it's in testing, and not open to the public yet, duh)
  
  It's not whining, it's stating the facts, which Google obviously WANTS him to gather, as a TESTER. Seems to me that he is going beyond the call of duty to test their servers, since he is spending a fair amount of his own time.
  
  - Re:whining? (Score:3, Funny)
    
    by AKnightCowboy ( 608632 ) writes:
    
    When you are trying to harden a system, YOU MUST BREAK IT OVER AND OVER AGAIN, to see where it is weak.
    Slashdot operated under that philosophy for the first 2-3 years of its existence. ;-)
- Re:whining? (Score:2)
  
  by GeorgeH ( 5469 ) writes:
  
  "Whats the story, or are we all just schizophrenic?"
  
  Yeah, we all have multiple personality disorder. Luckily we also have multiple bodies, so we dole these personalities at around 1 per body.
  
  You're complaining about the lack of consistent thought from a crowd of random web surfers...
If this guy has used 30% of his capacity... (Score:3, Insightful)

by Dagny Taggert ( 785517 ) writes: <hankreardenNO@SPAMgmail.com> on Monday June 14, 2004 @10:38AM (#9419798) Homepage

...how many e-mails has he received in total? I've kept spam for six months before and it totaled less than 100MB...and I get a cubic buttload of crap daily.

- Re:If this guy has used 30% of his capacity... (Score:5, Funny)
  
  by Zeebs ( 577100 ) writes: <rsdrew.gmail@com> on Monday June 14, 2004 @01:19PM (#9421433)
  
  and I get a cubic buttload of crap daily
  
  God damned metric system.
  
gmail still beta (Score:2, Insightful)

by ryen ( 684684 ) writes:

isn't gmail still in 'beta' stages? if so, isn't a review of spam filtering techniques a little premature?
- Re:gmail still beta (Score:5, Funny)
  
  by waddgodd ( 34934 ) writes: on Monday June 14, 2004 @10:42AM (#9419835) Homepage Journal
  
  >isn't gmail still in 'beta' stages? if so, isn't a review of
  >spam filtering techniques a little premature?
  
  What part of Beta TEST escapes you here?
  
- Re:gmail still beta (Score:2)
  
  by AviLazar ( 741826 ) writes:
  
  Testing occurs at all points through the process. This raw data needs to be "processed" & "reviewed" so that viable results can be determined. Once people have these results, they can try and come up with fixes for it.
  The fact that this guy posted, on his website, for the net-world to see is just his way of giving the net-world an update. On a personal note, I think it was nice of him to do such. Especially since he will have to kill that e-mail account after giving it to /. people, who I am sure
Not a fair test (Score:5, Insightful)

by SWroclawski ( 95770 ) writes: <(serge) (at) (wroclawski.org)> on Monday June 14, 2004 @10:40AM (#9419813) Homepage

He's not counting all the mail that Google is rejecting and not even being allowed in for further classification.

- Re:Not a fair test (Score:4, Insightful)
  
  by Plutor ( 2994 ) writes: on Monday June 14, 2004 @12:09PM (#9420653) Homepage
  
  Is there any evidence that Google actually does this? I would think that would be terribly non-transparent. Auto-deleting email that it's "really sure" is spam is still dangerous. Even the best-trained Bayesian filters will have false positives sometimes. Is this just random theorizing, or does GMail really fail to deliver some emails it thinks is spam?
  
  - Re:Not a fair test (Score:4, Informative)
    
    by SWroclawski ( 95770 ) writes: <(serge) (at) (wroclawski.org)> on Monday June 14, 2004 @02:01PM (#9421891) Homepage
    
    Any evidence that they reject mail for various reasons? I'm sure there is. You can go ahead and see which RFCs they're in compliance with and which they aren't.
    
    If you don't have a PTR record associated with your host, try to send mail to them, or malform your EHLO or something else.
    
    You don't need to be "really sure" mail is spam- I'm talking about doing things like standards compliance checking, which will result in mail being rejected at delivery time.
    
    Is this just random theorizing, or does GMail really fail to deliver some emails it thinks is spam?
    
    There's no reason to get insulting. RFC 2821 has a number of requirements for delivery of mail that many services ignore.
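
As a rough sketch of the kind of delivery-time check described above, the following Python function tests whether a connecting IP address has a reverse DNS (PTR) entry. It says nothing about what Gmail actually does; the function name `has_ptr_record` and the `resolver` parameter are assumptions for illustration, and `socket.gethostbyaddr` is the standard-library reverse lookup.

```python
import socket


def has_ptr_record(ip_address, resolver=socket.gethostbyaddr):
    """Return True if a reverse lookup of ip_address yields a host name."""
    try:
        hostname, _aliases, _addresses = resolver(ip_address)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
    return bool(hostname)


# Stand-in resolvers so the example runs without touching the network.
def fake_ok(ip):
    return ("mail.example.org", [], [ip])


def fake_missing(ip):
    raise socket.herror(1, "Unknown host")


print(has_ptr_record("192.0.2.10", resolver=fake_ok))       # True
print(has_ptr_record("192.0.2.11", resolver=fake_missing))  # False
```

A receiving server could refuse mail when a check like this fails, before any content-based spam scoring happens; whether to do so is a policy choice.
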
    
Should be interesting, what filters? (Score:5, Interesting)

by Clinoti ( 696723 ) * writes: on Monday June 14, 2004 @10:41AM (#9419816)

Can anyone provide a link or source to the kind of filters google has working on gmail?

I'll help (Score:5, Funny)

by L. VeGas ( 580015 ) writes: on Monday June 14, 2004 @10:41AM (#9419818) Homepage Journal

Let's all send him an email and ask him how it's working out.

News... (Score:5, Funny)

by somethinghollow ( 530478 ) writes: on Monday June 14, 2004 @10:41AM (#9419821) Homepage Journal

"Here is also an article talking about Aaron's efforts from webpronews.com""

Since we are talking about spam and obtaining more spam, I don't know if I should read the site the article is on as "web pro news dot com" or "web pron ews dot com"...

I guess I'll figure it out sometime.

Not that impressive (Score:5, Informative)

by chrisgeleven ( 514645 ) writes: on Monday June 14, 2004 @10:42AM (#9419829) Homepage

Seems like Gmail only filters approx. 50% of spam. That is not very impressive, since the top anti-spam software and e-mail clients (such as Outlook 2003 and Mozilla Thunderbird) can easily reach 95% accuracy in spam filtering.

I am starting to second guess whether I should transfer everything to my Gmail account.

- Re:Not that impressive (Score:5, Insightful)
  
  by Apiakun ( 589521 ) writes: <tikora AT gmail DOT com> on Monday June 14, 2004 @10:44AM (#9419851)
  
  Don't forget that this is Google's first foray into mail software, and it is still in beta. I have so far gotten very little spam in my gmail inbox.
  
  - Re:Not that impressive (Score:5, Funny)
    
    by peeping_Thomist ( 66678 ) * writes: on Monday June 14, 2004 @10:52AM (#9419931)
    
    I have so far gotten very little spam in my gmail inbox.
    
    What was that address again?
    
    - Re:Not that impressive (Score:2, Informative)
      
      by Kredal ( 566494 ) writes:
      
      tikora@gmail.com, I think.
      
      Mine is kredal@gmail.com, if you're interested. (:
      - I redirected an old address (Score:3, Interesting)
        
        by FooAtWFU ( 699187 ) writes:
        
        I redirected an old manager@(two letters here).net site so Gmail gets a carbon copy of all the spam sent there (it's lots, trust me). At first it seemed that my Thunderbird Bayesian filters were doing better, but the trend seems to have reversed lately.
        No, I'm not keeping proper statistics. =b
      - Re:Not that impressive (Score:4, Funny)
        
        by furball ( 2853 ) writes: on Monday June 14, 2004 @12:22PM (#9420776) Journal
        
        Mine's gdnguyen@gmail.com.
        
        Please only email me if you're barely legal and running a webcam. Thank you.
        
- Re:Not that impressive (Score:5, Informative)
  
  by XO ( 250276 ) writes: <(blade.eric) (at) (gmail.com)> on Monday June 14, 2004 @10:47AM (#9419879) Homepage Journal
  
  Sure, but those will also mark virtually every legitimate email as spam, as WELL. Yeah, you can have 95% accuracy... but then you have to go through your hundreds of messages marked spam just to find your real email!
  
  (example, after two weeks of using spam-assassin, it decided that every e-mail sent to me was spam.. i no longer received anything in my Inbox, everything was transferred to the Spambox. It took me another two weeks tweaking spam-assassin's kill rate down to about a 50% accuracy, and now i actually receive all my emails.)
  
  - Re:Not that impressive (Score:2)
    
    by javatips ( 66293 ) writes:
    
    I'm using a mail service that has SpamAssassin (mailsnare.net). I configured my account so that SpamAssassin marks messages, but I did not create any filters to delete them or move them to a Spam folder.
    
    Lately, I've received legitimate e-mail from someone else with a Yahoo Mail account. SpamAssassin marks them as spam.
    
    However, my mail client, Mozilla Firebird, does not mark them as spam, so they stay in my Inbox (even though they contain the SpamAssassin header and modified title).
    
    Actually, the Firebird Spam filt
  - Re:Not that impressive (Score:3, Informative)
    
    by ravydavygravy ( 230429 ) writes:
    
    Sure, but those will also mark virtually every legitimate email as spam, as WELL. Yeah, you can have 95% accuracy... but then you have to go through your hundreds of messages marked spam just to find your real email!
    
    Rubbish - I've used Thunderbird for many months now, with an account that gets quite a bit of spam. I have yet to see Thunderbird make a wrong guess at what's spam and what's not. If anything, Thunderbird is more likely to go the other way - allowing spam through - than deleting real email.
- Re:Not that impressive (Score:2)
  
  by gmuslera ( 3436 ) writes:
  
  There are other multiplatform solutions (e.g. POPFile, which can easily be plugged into both, though there are plenty of other choices) that reach over 99% accuracy, but active training is one of the biggest components of their success: every time a message is misclassified, you correct the choice. I hope Gmail will have a far better spam detection ratio once it is out of beta and well managed by the user.
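For anyone wondering what "active training" looks like in practice, here is a toy naive Bayes classifier in Python. It is only a sketch of the general technique: the class name `TinyBayes`, the tokenizer, and the add-one smoothing are illustrative choices, not how POPFile, Thunderbird, or Gmail actually work.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Very small naive Bayes spam/ham classifier with add-one smoothing."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lowercase words and digits; a real filter would do far more.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record one message under the given label ('spam' or 'ham')."""
        self.tokens[label].update(self.tokenize(text))
        self.messages[label] += 1

    def classify(self, text):
        """Return the label with the higher log-probability score."""
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"])) or 1
        total_msgs = sum(self.messages.values())
        scores = {}
        for label in ("spam", "ham"):
            total_tokens = sum(self.tokens[label].values())
            # Log prior, smoothed so an untrained label never hits log(0).
            score = math.log((self.messages[label] + 1) / (total_msgs + 2))
            for tok in self.tokenize(text):
                count = self.tokens[label][tok]
                score += math.log((count + 1) / (total_tokens + vocab))
            scores[label] = score
        return max(scores, key=scores.get)


f = TinyBayes()
f.train("cheap pills buy now", "spam")
f.train("win money now cheap", "spam")
f.train("lunch meeting tomorrow at noon", "ham")
f.train("see you at the meeting", "ham")
print(f.classify("buy cheap pills"))
print(f.classify("meeting at noon"))
```

The "correct their choice" step is just another call to `train` with the right label whenever `classify` gets a message wrong, so the word counts drift toward your own mail over time.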
Should gmail be filtering all emails (Score:2, Funny)

by Anonymous Coward writes:

If I understand what he was talking about on his site, some of what he considers spam comes from partially legitimate mailing lists and is not really spam, even if he did not personally sign up for them. (E.g., me signing him up for the gay porn of the month club.) He may not want it, but I signed him up.
I just want.... (Score:2, Funny)

by AviLazar ( 741826 ) writes:

to be able to reserve a name without numbers attached to it.... Damn it's going to be a race :(
What is the big deal? (Score:2, Insightful)

by Zugot ( 17501 ) * writes:

Mozilla Thunderbird or Spamassassin will filter at least as well or even better. Is this just a test to see how quickly we can fill up gmail's disk?
Is this the AventureMail guy? (Score:5, Interesting)

by magefile ( 776388 ) writes: on Monday June 14, 2004 @10:45AM (#9419861)

The guy who got booted off AventureMail (2GB free) for trying to test their spam filters? The story is on Kuro5hin [kuro5hin.org], if anyone wants to see it.

My own gmail testing (Score:5, Informative)

by Twid ( 67847 ) writes: on Monday June 14, 2004 @10:48AM (#9419889) Homepage

I did some testing of my own. I forwarded a ton of spam from my personal account to my gmail account, just to see what would get through and what would be filtered. For me, gmail was really effective, but strangely, one Nigerian e-mail scam mail didn't get tagged.

It was from " Mr Jubril Udeh Manager of Credit and Accounts Department of North Atlantic Securities Sarls Lome-Togo Republic."

Now, the funny part is not that the mail made it through, but that Google also decided to show me contextual ads on that account. Currently, the ads are:
- Payroll Cards a Poor Substitute for Checking Account
- Tips for Tackling Check Fraud
- Sophos hoax description: Ethiopian airline letter
- FAP non-US Investment FAQs

In the past the mail has also shown me ads on how to open an off-shore bank account. I'm glad google is willing to help me with the $10.5 million dollars that I'm about to receive! :)

Re: (Score:2, Interesting)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
- Hmmm.. weird stats... (Score:2, Interesting)
  
  by Mz6 ( 741941 ) * writes:
  
  Hmm.. well let's see...
  His last week stats are:
  
  3778 messages were received, totaling 213 MB. 3917 were spam, and Gmail correctly identified 41.9% of these messages.
  
  Something is off... Unless his spams contain attachments, this says that each of his emails was 17 MB in size.
  I mean 17.73708.. This is /. after all. :)
  - Re:Hmmm.. weird stats... (Score:5, Informative)
    
    by Satai ( 111172 ) * writes: on Monday June 14, 2004 @11:11AM (#9420092)
    
    No, you inverted it. You want MB/message, not message/MB.
    
    3778 messages / 213 MB = 17.74 messages / MB
    213 MB / 3778 messages = 0.0564 MB / message
    
    So that's pretty reasonable.
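The two ratios are easy to check with a few lines of Python, using only the figures quoted from the site (3778 messages, 213 MB). The variable names are just for illustration.

```python
# Figures quoted in the thread for the last week of stats.
messages = 3778
total_mb = 213

# Messages per megabyte: the number the parent computed by mistake.
messages_per_mb = messages / total_mb

# Megabytes per message: the number that actually describes message size.
mb_per_message = total_mb / messages

print(f"{messages_per_mb:.2f} messages/MB")
print(f"{mb_per_message:.4f} MB/message")
```

That works out to a bit under 60 KB per message on average, which is plausible for spam with HTML or the occasional image attached.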
    
About spam and blocking (Score:5, Interesting)

by AviLazar ( 741826 ) writes: on Monday June 14, 2004 @10:53AM (#9419941) Journal

While we cannot block every domain name (e.g. if you get spam from $#(*$#sexphreak@yahoo.com), because it would alienate your legitimate contacts, there are many domain names that we can block (e.g. @spam-your-gmail.com). Yahoo provides email/domain name blocking, but limits this to 100 entries (unless you are paying). Do we know if Gmail will have this limitation?
-A
*Just for those who didn't know, the above domain names and email accounts are random, any resemblance to an actual domain or email account is purely coincidental, and if you choose to do so, you should sue /., not me :)

- Re:About spam and blocking (Score:2, Interesting)
  
  by OiPolloi ( 638427 ) writes:
  
  Gmail allows you to create any number (at least they don't seem to have any limit) of "filters". These are rules that allow you to manage your messages based on sender, recipient, subject, if the message has an attachment, if it has certain words, etc.
  
  So this allows you to block some domains, if you'd like.
- Re:About spam and blocking (Score:3, Insightful)
  
  by stevesliva ( 648202 ) writes:
  
  I've found whitelists, combined with treating everything as junk, to be far more useful than blacklists.
1gb Relieves Spam Concerns (Score:5, Interesting)

by osewa77 ( 603622 ) writes: <naijasms@ g m a i l . com> on Monday June 14, 2004 @10:54AM (#9419945) Homepage

I have subjected my e-mail address, afriguru@gmail.com [mailto], to the same abuse by redirecting all e-mail addresses that receive lots of junk mail to this one and posting the address unprotected to lots of websites and newsgroups. At the initial stage, a lot of 419 scam mails got through, but now I hardly get any spam. No false positives for me so far.
_____________________
Seun Osewa, Abeokuta Nigeria [seunosewa.com]

If you get mailbombed... (Score:2)

by Ieshan ( 409693 ) writes:

Select "create a filter". Do so with the text of the bomb.

Select all the messages that it displays as able to be included that you've already archived (one click).

Select "Move to trash".

Viola.
- Viola (Score:5, Funny)
  
  by doodlelogic ( 773522 ) writes: on Monday June 14, 2004 @11:21AM (#9420187)
  
  If I could stop all the spam I get...I'd feel like a whole string quartet!
  
gmail spelling (Score:5, Funny)

by Anonymous Coward writes: on Monday June 14, 2004 @10:58AM (#9419977)

>legitamate

How about having Slashdot editors/Hemos test the gmail spell checker too?

This won't work for me... (Score:2, Funny)

by Chuck Bucket ( 142633 ) writes:
This won't work for me, how will I get emails like:
- Home loans and refinancing
- Proven techniques help you find a date tonight - guaranteed!
- suuuper streeeeeeetch your coock
- Drive that new car today
- Give the girl what she needs
- STRAIGHT TALK ON HAIR TRANSPLANTS
- SEXUALLY-EXPLICIT: Rise N Shine, there all here
- Your Degree by Fedex shipped
- Make your man hood work right
- Rooooock Haaaaard Ereeeectiooons In 60 Seeeeeecooooooonds
- Sexually Explicit: At home mom's nude on cams
- Free Phone Free Shipping Easy Qualify
- get the p .e.
How is he compiling stats? (Score:2, Interesting)

by HellKnite ( 266374 ) writes:

Anyone know how he's pulling the numbers off the page? Is there some kind of sneaky back-end that we can get stats about our account with? Is he manually entering all this info? Or maybe some kind of "screen-scraping" techniques to pull the data off the page... hmm...

I guess because his stats are about 2-3 weeks behind, it would indicate that things are leaning towards the manual procedure...
0% Spam (Score:5, Interesting)

by yuri ( 22724 ) writes: on Monday June 14, 2004 @11:00AM (#9419995)

Spam is unsolicited, so google should filter none of his mail.

This guy solicited it.

Filtering could use some help (Score:2, Funny)

by csimpkins ( 787236 ) writes:

This guy gets thousands of Spam mails without a problem, yet I can't receive a simple HTML attachment without the mail being rejected (552 Illegal Attachment). Hrmm...
Lack of updates? (Score:5, Interesting)

by Xiadix ( 159305 ) writes: on Monday June 14, 2004 @11:07AM (#9420057) Homepage Journal

Did anybody else notice that his site hasn't been updated in almost a month (May 25)? Seems his project is no longer working. I wonder if Google booted him.

KevG

It's going to get a lot better... (Score:5, Interesting)

by waytoomuchcoffee ( 263275 ) writes: on Monday June 14, 2004 @11:11AM (#9420087)

For those of you that don't have Gmail yet, there is a little "Report Spam" button you can use to, well, report spam. When Gmail gets a few million users, and even 1% use this little button, you are going to see the spam detect rate skyrocket.

Cache? (Score:5, Funny)

by Freon115 ( 672518 ) writes: on Monday June 14, 2004 @11:20AM (#9420180) Journal

Do you really expect the Google servers to go down because of /.? ;)

- Re:Cache? (Score:5, Funny)
  
  by leo_llew ( 697711 ) writes: on Monday June 14, 2004 @11:33AM (#9420306)
  
  Obviously not, they provided a link to the GOOGLE Cache ;)
  
Wow (Score:5, Funny)

by EaterOfDog ( 759681 ) writes: on Monday June 14, 2004 @11:40AM (#9420386)

His wang is going to be huge!

Paid yahoo is better (Score:3, Insightful)

by Avumede ( 111087 ) writes: on Monday June 14, 2004 @12:03PM (#9420592) Homepage

I pay the $20 for extra Yahoo email, and I have to say that their spam filtering is much better than gmail's right now. I have about 10 spams a day to clear out of gmail, where with Yahoo it's more like 1, often 0.

People that don't pay for Yahoo don't seem to get such good spam filtering, though.

Google can definitely do better.

Calculations? (Score:3, Insightful)

by haxor.dk ( 463614 ) writes: on Monday June 14, 2004 @12:08PM (#9420640)

So, in less than a month, he has received in excess of 300 megabytes of useless junk?

I think somebody needs to recalculate exactly how much bandwidth goes to waste because of this SPAM plague. The cost in global comms traffic must be staggering!

Now my friends are spamming me (Score:3, Funny)

by mparaz ( 31980 ) writes: on Monday June 14, 2004 @12:30PM (#9420869) Homepage

"Please invite me to GMail!"

Dumb question about SPAM filters.. (Score:4, Interesting)

by StressGuy ( 472374 ) writes: on Monday June 14, 2004 @12:32PM (#9420889)

I have Mozilla, it has a Bayes SPAM filter. Lately, it's been getting fooled more and more. The messages that make it through have one or more of the following features:

1) Several intentionally mis-spelled words

2) Lots of text in white (so it's invisible or nearly invisible)

3) Message in .GIF form only - no plain text.

Could you add filters that look for, say, more than 10% of the words mis-spelled, text font nearly equal to background color, or no actual text in message? These would take effect in addition to the existing Bayes filter.
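The three checks proposed here are simple enough to sketch in Python. This is only an illustration under loud assumptions: the function names, the tiny word list, and the 30-per-channel color tolerance are all made up, and a real message would need HTML parsing to find the text and background colors instead of being handed them as arguments.

```python
import re


def misspelled_ratio(text, known_words):
    """Fraction of alphabetic words in text that are not in known_words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in known_words)
    return unknown / len(words)


def parse_hex_color(color):
    """Turn '#rrggbb' into an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def colors_nearly_equal(fg, bg, tolerance=30):
    """True if every RGB channel of fg is within tolerance of bg."""
    return all(abs(a - b) <= tolerance
               for a, b in zip(parse_hex_color(fg), parse_hex_color(bg)))


def heuristic_flags(text, fg, bg, known_words, max_misspelled=0.10):
    """Return which of the three proposed heuristics a message trips."""
    flags = []
    if misspelled_ratio(text, known_words) > max_misspelled:
        flags.append("misspelled")    # more than 10% unknown words
    if colors_nearly_equal(fg, bg):
        flags.append("hidden-text")   # text color close to background
    if not text.strip():
        flags.append("no-text")       # e.g. an image-only message
    return flags


known = {"buy", "cheap", "pills", "now", "hello", "see", "you", "at", "lunch"}
print(heuristic_flags("buy che4p p1lls now", "#000000", "#ffffff", known))
print(heuristic_flags("hello see you at lunch", "#fefefe", "#ffffff", known))
print(heuristic_flags("", "#000000", "#ffffff", known))
```

Rules like these would sit in front of or alongside the Bayes filter, as the comment suggests. The catch is the same as with any fixed rule: spammers adapt, and a legitimate mail full of names or jargon can easily trip the misspelling check.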

Aventuremail not as tolerant (Score:4, Interesting)

by dirvish ( 574948 ) writes: <dirvish&foundnews,com> on Monday June 14, 2004 @01:42PM (#9421695) Homepage Journal

I tried to do the same thing with my AventureMail [aventuremail.com] account but AventureMail wasn't cool with it. They deleted my account! You can check out what little data I collected before the account suspension and read the emails to and from AventureMail about the merits of the account suspension at http://3fingersalute.net/aventuremail [3fingersalute.net]

Yale Story (Score:3, Interesting)

by dirvish ( 574948 ) writes: <dirvish&foundnews,com> on Monday June 14, 2004 @02:35PM (#9422283) Homepage Journal

Here is a discussion [yale.edu] from Yale's LawMeme on the legal ramifications of Prattboy's experiment. Does asking others to sign you up for spam count as an opt-in?

- Re:There's Epic Imagery Here Somewhere (Score:2)
  
  by Paulrothrock ( 685079 ) writes:
  
  A digital Thermopylae.
- Re:How to never get spam (Score:5, Funny)
  
  by mumblestheclown ( 569987 ) writes: on Monday June 14, 2004 @11:37AM (#9420351)
  
  Hi! And welcome to the Internet! We're glad to have you aboard.
  Just to get you started, I'll give you a quick hint: virtually every internet discussion on spam includes some high and mighty moron that claims that by not giving out his email address, he never gets spam.
  The problem is that for every one of those, there are plenty more who follow the same precautions and yet get plenty of spam to those accounts for a variety of reasons. Clearly, your solution is not the answer to "how to never get spam."
  A good rule for using the internet is to read a few discussions before you post. This way, you will be less likely to post something that makes you look naive. So sit back, relax, and enjoy a steaming hot cup of STFU while you read and learn!
  
