Microsoft Researchers on Stopping Spam

TheBackBencher writes "Scientific American today has a very interesting article, 'Stopping Spam', by Joshua Goodman, David Hackerman and Robert Rounthwaite from Microsoft Research. They talk about different types of spam -- spam with emails, spam on IMs, spamlinks on web pages and image based spam. They describe, in a very lucid style, different techniques for spam filtering, mainly fingerprint-matching techniques, n-gram models, the naive Bayesian approach, optical character recognition, challenge/response systems and Human Interactive Proofs (HIP). They do not, however, mention the fingerprinting approach of using the Nilsimsa hash to tackle spammers' addition of random words to emails, or the 'hypertextus interruptus' technique used by spammers: splitting words using HTML comments, pairs of zero width tags, or bogus tags. Also, Spam-Research is reporting the SplitFit Technique that Spammers are using to fool Yahoo! Mail SpamGuard."
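The "hypertextus interruptus" trick mentioned above is easy to reverse before a filter tokenizes a message. A minimal Python sketch, assuming the filter has the raw HTML body; visible_text is a hypothetical helper, and regex stripping is only a rough approximation of real HTML parsing:

```python
import re

# Hypothetical helper names; a sketch, not any real filter's code.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
HTML_TAG = re.compile(r"<[^>]*>")


def visible_text(html: str) -> str:
    """Undo word-splitting with comments and empty or bogus tags.

    Comments are removed first because their bodies may contain '>' or
    whole decoy sentences; then every remaining tag is removed, which also
    deletes empty pairs such as <b></b> and made-up tags such as <qwzx>.
    """
    without_comments = HTML_COMMENT.sub("", html)
    return HTML_TAG.sub("", without_comments)


print(visible_text("V<!-- lorem ipsum -->ia<b></b>gr<qwzx>a"))  # Viagra
```

A real filter would likely turn block-level tags such as <br> or <p> into spaces rather than deleting them, since those do separate words. Stripping tags does nothing against padding with random words, which is the case a fuzzy hash like Nilsimsa is meant for.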
  • by Anonymous Coward on Monday April 11, 2005 @08:51PM (#12207797)
    Create your own spamming division, use illegal tactics to undercut your spamming competition, put them out of business, then stop spamming.
    • by vwjeff ( 709903 ) on Monday April 11, 2005 @09:24PM (#12208049)
      They talk about different types of spam -- spam with emails, spam on IMs, spamlinks on web pages and image based spam.

      Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam; spam bacon sausage and spam; spam egg spam spam bacon and spam; spam sausage spam spam bacon spam tomato and spam;

      (punches self in face face hits keyboardadsfjlk;sjdafkldsajflsdak;fjsad;lfkjas;ldk fjas)
  • I don't know... (Score:4, Insightful)

    by Anonymous Coward on Monday April 11, 2005 @08:55PM (#12207839)
    If it was developed, it can be reverse-engineered. Sorry to say, but spam is here to stay, unless of course someday the Internet becomes regulated somehow.
    • Maybe. Just like radio was regulated after the 1929 Depression, perhaps after the next depression the nut cases running the world will get really, really paranoid (their drugs must be running out) and then regulate everything that makes them slightly insane.

      They can try, but they will have a tough battle on their hands, and I bet the hackers would rather fight to the death (a 100% shutdown of the net) just to piss 'em off and send the companies bankrupt.
      The IT techies working for the 'govt' won't be as motivated by desir
    • If it was developed, it can be reverse-engineered. Sorry to say, but spam is here to stay, unless of course someday the Internet becomes regulated somehow.

      That's silly. Encryption has been thoroughly reverse-engineered, but it's still completely secure. The Linux source code is widely available, but it is still considered relatively secure.

      There are several competing proposals for how e-mail could be re-done to take into account the possibility of spamming. Preventing false headers and return addresses
  • by erick99 ( 743982 ) <homerun@gmail.com> on Monday April 11, 2005 @08:58PM (#12207862)
    Spam is like porn: hard to define, but you know it when you see it. That could be hard to program, I would think. But who knows?
    • Spam is like porn: hard to define, but you know it when you see it. That could be hard to program, I would think. But who knows?

      No, spam is easy to define: it is any unwanted email. Here are the elements that make up spam:

      1) It is a form of communication
      2) The communication is unwanted
      3) The source of the communication is hidden
      4) In receiving the communication, you use your bandwidth or incur a cost

      • by ch-chuck ( 9622 ) on Monday April 11, 2005 @09:12PM (#12207969) Homepage
        1) It is a form of communication

        all email is communication

        2) The communication is unwanted

        "wanted" is a subjective property of the recipient - the computer has no programmable decision procedure for wantedness.

        3) The source of the communication is hidden

        There may be some system of authenticating sender ID, and it will be as easy as getting ppl to use PK encryption.

        4) In receiving the communication, you use your bandwidth or incur a cost

        Again, a property of all email.

        • It is not four different definitions of spam; it is four elements that make up one definition. If any one element is missing, the message would not be considered spam for purposes of law enforcement. What I am getting at is: what are all the elements that make up spam, such that if any one of them is missing, it is not a criminal offense?

          Spam is a form of communication. You can't have spam without some kind of communication (I would like a definition that includes the telemarketers and all the shit I get in the mail). That

        • There may be some system of authenticating sender ID, and it will be as easy as getting ppl to use PK encryption.

          You overestimate the average computer user.

          If you're thinking of a transparent solution, then you underestimate how fragmented the e-mail client "industry" is.
        • You can define your willingness to receive a message in programmable terms, so spam could be:

          -Messages whose senders I have not authorized to send me messages (that authorization could take the form of signed emails, white lists, etc)
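          The allow-list rule can be expressed in a few lines. A minimal Python sketch, assuming the allow list is a set of plain addresses; sender_is_authorized is a hypothetical name. Because a From: header is trivially forged, a check like this is only as strong as whatever authenticates the sender, which is why signed email appears in the list above.

```python
from email.utils import parseaddr


def sender_is_authorized(from_header: str, allowed: set[str]) -> bool:
    """Return True if the From: address is on the recipient's allow list.

    Only the address part is compared (display names are ignored), and the
    comparison is case-insensitive. Hypothetical helper for illustration.
    """
    _display_name, address = parseaddr(from_header)
    return address.lower() in {a.lower() for a in allowed}


allow_list = {"alice@example.com"}
print(sender_is_authorized('"Alice" <Alice@Example.com>', allow_list))  # True
print(sender_is_authorized("spammer@example.net", allow_list))          # False
```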

      • I generally consider spam to be unsolicited commercial email:

        1) I didn't ask for it (unsolicited)
        2) it's advertising a product or service (commercial)
        3) I'll assume you know this one ;-)

        The problem with your definition, as others have pointed out, is that 1) and 4) are both true of *all* email, and so are redundant. 2) is highly subjective and may apply to genuine email too: is a mail from, say, a Slashdot user calling me nasty names for something I said here unwanted? Yes. Is it spam? Not by most people's
    • Our offsite spam engine can detect porn by looking at shapes, colours, etc...
      It works surprisingly well most of the time, though it did once pick up a photo of a broken PCB as porn due to its detected "posture"
    • Spam is like porn: hard to define, but you know it when you see it.

      Actually, spam is bulk email deliberately sent to recipients who did not request the mail.
  • by DumbSwede ( 521261 ) <slashdotbin@hotmail.com> on Monday April 11, 2005 @08:59PM (#12207874) Homepage Journal
    Just today I saw a new method in an ebay.com phishing scheme.

    The ebay.com link showed up at the bottom of the browser, but it was actually replaced with some kind of JavaScript mouseover event. This is probably not new.

    Instead of random text to fool Bayesian filters, it had hidden summaries of recent news articles (bracketed by HTML comment tags) that would be similar to what you might send to a friend.

    Spam filters will probably be upgraded to catch this soon, but it was the first time I had seen it. And of course, as mentioned in the article, the eBay specifics were obfuscated by HTML tags between letters.
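    Assuming the trick works by setting the status-bar text from an onmouseover handler (the comment only guesses at the mechanism), a filter could flag links that carry such a handler. A sketch using Python's standard html.parser; the class and function names are hypothetical, and the sample address is a documentation-range IP:

```python
from html.parser import HTMLParser


class MouseoverLinkFinder(HTMLParser):
    """Collect the href of every <a> tag that also has an onmouseover handler."""

    def __init__(self) -> None:
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so onMouseOver matches too
        if tag != "a":
            return
        attributes = dict(attrs)
        if "onmouseover" in attributes:
            self.suspicious.append(attributes.get("href"))


def mouseover_links(html: str) -> list:
    finder = MouseoverLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.suspicious


sample = ('<a href="http://203.0.113.5/signin" '
          'onMouseOver="window.status=\'http://www.ebay.com\'; return true">eBay</a>')
print(mouseover_links(sample))  # ['http://203.0.113.5/signin']
```

    Legitimate HTML mail probably has little need for script handlers on links, so a hit is a reasonable spam signal, though not proof on its own.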

    • Yeah, that won't cause much of a problem for Bayesian filters. Essentially your filter will learn that news goes from good to neutral, and that JavaScript mouseovers go from bad to terrible.
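      The "good to neutral" shift is just per-token counts moving. A toy multinomial naive Bayes filter in Python makes that concrete; this is a textbook formulation chosen for illustration, not the algorithm any particular product uses, and all names here are hypothetical:

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy multinomial naive Bayes spam filter with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Add one labelled message ("spam" or "ham") to the counts."""
        self.token_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        """Return P(spam | text); needs at least one training message per label."""
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.token_counts[label]
            denominator = sum(counts.values()) + len(vocab) + 1  # +1: bucket for unseen tokens
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            for token in text.lower().split():
                score += math.log((counts[token] + 1) / denominator)
            log_score[label] = score
        # Convert the log-score gap to a probability; clamp to keep math.exp finite
        gap = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(gap))


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap viagra click here", "spam")
spam_filter.train("click here to win", "spam")
spam_filter.train("meeting notes attached", "ham")
spam_filter.train("lunch tomorrow with the team", "ham")
print(spam_filter.spam_probability("click here for cheap viagra") > 0.5)  # True
print(spam_filter.spam_probability("team meeting tomorrow") > 0.5)        # False
```

      Padding a spam with news text adds tokens that, after enough training, show up in both spam and ham, so their likelihood ratios drift toward 1 and contribute little to the score; the distinctive tokens still decide it.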

      However, this isn't what Joshua and the rest of MS are working on. His stuff is much more in the area of modifying SMTP so that untrusted clients have to perform some calculations before their email is accepted, or pay a few cents. My guess is that it will fail, since it doesn't account for zombie PCs, but I'm sure he has something plan
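      The "perform some calculations before their email is accepted" idea can be illustrated with a Hashcash-style puzzle. This is a swapped-in technique, not necessarily what Microsoft Research proposed; the stamp format and function names are hypothetical, and SHA-1 with a 12-bit target is chosen only so the example runs quickly:

```python
import hashlib
from itertools import count


def has_leading_zero_bits(stamp: str, bits: int) -> bool:
    """True if SHA-1(stamp) begins with at least `bits` zero bits."""
    digest = int.from_bytes(hashlib.sha1(stamp.encode("utf-8")).digest(), "big")
    return digest >> (160 - bits) == 0


def mint_stamp(recipient: str, bits: int) -> str:
    """Sender side: try counters until the hash is small enough (about 2**bits tries)."""
    for counter in count():
        stamp = f"{recipient}:{counter}"
        if has_leading_zero_bits(stamp, bits):
            return stamp


def verify_stamp(stamp: str, recipient: str, bits: int) -> bool:
    """Receiver side: one hash checks the work and that it was done for this recipient."""
    return stamp.startswith(recipient + ":") and has_leading_zero_bits(stamp, bits)


stamp = mint_stamp("alice@example.com", 12)
print(verify_stamp(stamp, "alice@example.com", 12))  # True
print(verify_stamp(stamp, "bob@example.com", 12))    # False
```

      Each extra bit of difficulty roughly doubles the sender's expected work, while checking a stamp stays at one hash. As the comment notes, a zombie PC pays that cost with someone else's CPU.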
    • I don't think you should be running JavaScript in e-mail. I disable it. I even disable automatic loading of images so that someone can't automatically confirm my address if I view a message.
  • to stop spam, (Score:5, Insightful)

    by havaloc ( 50551 ) * on Monday April 11, 2005 @09:02PM (#12207897) Homepage
    Give spammers a 9-year prison sentence [slashdot.org].
  • by Anonymous Coward on Monday April 11, 2005 @09:03PM (#12207911)
    That'd probably be the best thing M$ could do to help reduce spam.
    • How does a secure OS prevent a user from installing a trojan that turns their machine into a spam zombie as well as telling them the weather forecast, or giving them a "cool" mouse cursor, or whatever?

      If I can install and run software, and that software can make network connections, my machine can be zombified, and there's nothing that MS can do about it (or any Linux distro, or anyone else producing an OS).

      Eliminate all the remote and local exploits, and you'll still be left with one - the user.
  • So, does Microsoft Research plan on combating Spam with a Bob-like approach, or the more refined Clippy approach?

    Or are they going to come up with an entirely new file system to combat it, hype it up for every Windows release, but then delay its release a few more years?

    Oops, pardon me while I reminisce about all the great advances Microsoft Research has given me :).
    • Each year they will announce that This is the Year of No More Spam on the Desktop (of course this never happens).

      Or they will invent a brilliant new way to stop spam but as it requires the user to recompile all their OS and apps every 3 days it never gets used.

      Or they just tell the end users "Why don't YOU code some anti-spam software?"

      Or they produce an anti-spam system, but the user must install 3 desktops and window managers, it requires a 10,000-line config file that must be written by hand, and it comes with ei
      • Ballmer keeps running around insisting everyone call it GNUNWNTD/Spam.

        Their Anti Spam system will reduce your Total Cost of E-mail. Or raise it a little. Or possibly keep it the same.

        Their (K)illSpam system will be promptly forked into five apps known as GnoMoreSpam, OpenSP, Xithergy, WAGIJIG, and Betty.

        It will accept, filter, parse, and sort incoming mail, while additionally precaching any keywords you are likely to look up on google, but it will take twenty minutes to open an ASCII based letter.
        It wil
    • MS research comes up with some very cool stuff. Virtually none of it makes its way into Microsoft's products, I have no idea why :(
      • Actually, a great deal of it makes it into our products; however, it usually doesn't filter down for nearly 10 years. The database technology that was researched 10 years ago, for example, is now appearing in the next release, SQL Server 2005.
    • or the more refined Clippy approach?


      Clippy "It looks like you are trying to send a lot of e-mail. Would you like to send the first one to 1lol56@aol.com?"

      Clippy "It looks like your return address is incorrect. Would you like me to fix it?"

  • by datafr0g ( 831498 ) <datafrog@nOSpam.gmail.com> on Monday April 11, 2005 @09:09PM (#12207949) Homepage
    Don't you mean, Microsoft Mergers & Acquisitions?
  • by Rick Zeman ( 15628 ) on Monday April 11, 2005 @09:16PM (#12207990)
    Also, Spam-Research is reporting the SplitFit Technique that Spammers are using to fool Yahoo! Mail SpamGuard."

    How much credence should we put into an analysis from a guy who goes to the spammer's web site to unsubscribe?
  • by VeryProfessional ( 805174 ) on Monday April 11, 2005 @09:16PM (#12207996)

    I thought the name David Hackerman was a bit too good to be true, and it turns out it was. Following the link shows that his name is David Heckerman [microsoft.com]. Note to /. eds: please proofread your posts. It's not like they're very long...

  • "SplitFit" (Score:3, Interesting)

    by 1000101 ( 584896 ) on Monday April 11, 2005 @09:21PM (#12208027)
    From the SplitFit [spam-research.com] link...

    Dera Blcraays Mbmeer, Thsi eamil was stne by the Barclays serevr to vreify yuor emial adsserd. You mtsu competel thsi pssecor by ccilking on the likn bewol and entireng in the smlal wiodnw yoru Braclays Membership nrebmu, passcedo and meelbarom word. Tsih is doen for yruo proteoitcn - buacese semo of our mrebmes no lonegr haev assecc to theri emlia adserdses and we muts virefy it. To vyfire yruo eiaml arddess and accses yruo bnak anuocct , cilck on the lnik bolew:"

    That email is extremely difficult to filter out because the only 'real' words are no, of, our, and, etc.: simple words that occur so often in legitimate emails that most spam filters practically ignore them. But I have to wonder... who would actually 'cilck on the lnik bolew' anyway? I hate to use the term 'you get what you deserve', but if you are naive enough to click the link, then the problem isn't your spam filter, it's you.
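    The scrambled tokens are not random noise: each is an anagram of an ordinary word, which a filter could detect by comparing sorted letters. A minimal Python sketch, assuming the filter sees the text as displayed and has an English word list (the small vocabulary set here is a hypothetical stand-in):

```python
import re


def letter_key(word: str) -> str:
    """Sorted letters: every anagram of a word shares the same key."""
    return "".join(sorted(word))


def scrambled_ratio(text: str, dictionary: set[str]) -> float:
    """Fraction of tokens that are not words but are anagrams of a dictionary word."""
    words = {w.lower() for w in dictionary}
    keys = {letter_key(w) for w in words}
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    scrambled = [t for t in tokens if t not in words and letter_key(t) in keys]
    return len(scrambled) / len(tokens)


vocabulary = {"dear", "barclays", "member", "this", "email", "was", "sent", "by", "the"}
print(scrambled_ratio("Dera Blcraays Mbmeer", vocabulary))  # 1.0
print(scrambled_ratio("Dear Barclays Member", vocabulary))  # 0.0
```

    Ordinary typos trip this too ("teh" is an anagram of "the"), so it would make more sense as one weighted signal than as a verdict.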

    • Re:"SplitFit" (Score:4, Insightful)

      You haven't RTFA, it seems.

      That garbled text is ungarbled by certain software (e.g., Outlook). That's because there are invisible characters in it that switch on "right-to-left" mode.

      Example:
      De*ra* B*lcra*ays M*bme*er
      translates to:
      Dear Barclays Member

      (I tried copying the text I got in Yahoo and pasting it into MSN Messenger. Amazingly, the text was "ungarbled". That's when I realized how tricky spammers are.)

      Spam software could simply detect directional control characters in such text and ipso facto label it as spam. Unless, of course, you're reading Hebrew, which is obviously NOT the case.
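      That check is short to write. A Python sketch that counts Unicode directional formatting characters; the Subject line of the phishing sample quoted at the end of this thread contains &#8236;, which is U+202C. The function name is hypothetical, and since legitimate Hebrew or Arabic mail may contain these characters, a filter would more plausibly add to a spam score than reject outright:

```python
# Unicode directional formatting characters (invisible when rendered).
BIDI_CONTROLS = {
    "\u200e", "\u200f",                                # LRM, RLM marks
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI isolates
}


def bidi_control_count(text: str) -> int:
    """Count invisible direction-control characters in a message."""
    return sum(1 for ch in text if ch in BIDI_CONTROLS)


subject = "Braclays Emlia\u202c Veacifition"  # &#8236; decodes to U+202C
print(bidi_control_count(subject))            # 1
print(bidi_control_count("Dear Member"))      # 0
```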
  • by shanen ( 462549 ) on Monday April 11, 2005 @09:31PM (#12208101) Homepage Journal
    SMTP is working exactly as designed--but the design is broken. You can't fix a fundamentally economic problem with any number of technical tools. It's like adding more epicycles to the earth-centered "perfect spheres" models of the universe.

    The article barely mentions economics, and only in terms of the real costs of email--which only shows how much room there is for a real economic model with real business, real email, and *NO* spam.

    I really wish one of the major email players would offer an option for prepaid email. That would be an absolutely spam-proof system. It doesn't matter if the postage is two cents, the spammers can't afford it. Two cents against 50,000,000 spams turns out to be *REAL* money. Any email via that address would be at least some kind of real thing.
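    The arithmetic behind "*REAL* money", using the comment's own figures (two cents per message, 50,000,000 messages); the function name is hypothetical:

```python
def campaign_postage_dollars(messages: int, cents_per_message: int) -> float:
    """Total postage for a mailing, computed in whole cents to avoid float rounding."""
    total_cents = messages * cents_per_message
    return total_cents / 100


print(campaign_postage_dollars(50_000_000, 2))  # 1000000.0
print(campaign_postage_dollars(30, 2))          # 0.6
```

    At the same rate, an individual sending an illustrative 30 messages a day would pay 60 cents.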

    • SMTP is working exactly as designed--but the design is broken.

      I would agree with this, to an extent. If I can finger anything that is obviously broken in SMTP, it is that it lacks verification of the sender's domain.

      That makes it way too easy to fake who sent the email in the first place, hence phishing and faked email headers. It also makes it impossible to use reputation as a spam-fighting tool: the spam currently appears to be coming from all over the place, whereas in reality only a tiny,
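      Sender-domain verification could look like this: the receiving server checks the connecting IP against the networks the claimed domain has published. That is the general idea behind proposals such as SPF and Sender ID, but the sketch implements neither specification; the in-memory table stands in for the DNS records a real check would query, and all names and addresses are hypothetical (the addresses come from documentation ranges):

```python
import ipaddress

# Hypothetical stand-in for a DNS lookup: each domain lists the networks
# its owner says are allowed to send mail on its behalf.
PUBLISHED_POLICIES = {
    "example.com": ["192.0.2.0/24", "2001:db8::/32"],
}


def check_sender(envelope_from: str, client_ip: str, policies=PUBLISHED_POLICIES):
    """Return True/False if the sender's domain publishes a policy, else None."""
    domain = envelope_from.rsplit("@", 1)[-1].lower()
    networks = policies.get(domain)
    if networks is None:
        return None  # no policy published: nothing to verify against
    ip = ipaddress.ip_address(client_ip)
    # An IPv4 address is simply "not in" an IPv6 network (and vice versa)
    return any(ip in ipaddress.ip_network(net) for net in networks)


print(check_sender("billing@example.com", "192.0.2.25"))    # True
print(check_sender("billing@example.com", "198.51.100.7"))  # False
print(check_sender("someone@unknown.test", "192.0.2.25"))   # None
```

      Once forged senders fail a check like this, mail that passes can be tied to a domain, which is what makes reputation usable.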

    • It doesn't matter if the postage is two cents, the spammers can't afford it. Two cents against 50,000,000 spams turns out to be *REAL* money.

      This doesn't seem to be a major issue for telemarketers and direct-mail advertisers. All it will do is make the spam glossier.

      Well, that's not entirely true. It will reduce the volume of spam. But in the process it will reduce the volume of email, period. It's a bit like fighting crime by decimating the population.
      • I really can't understand why it is so difficult to get this point across:

        Email is *NOT* free. It has *NEVER* been free. SMTP email pretends to be free and there is no pretense of accounting. That is because the original design of SMTP was the fantasy of fairness and equality and all that wonderful stuff. As long as both of us send and receive about the same amount of email, we can cancel things out without worrying about the exact accounting.

        In reality, there are costs of email, and they are simply di

  • my solution (Score:3, Interesting)

    by ricochet81 ( 707864 ) on Monday April 11, 2005 @09:35PM (#12208117)
    Here's my solution to the greater problem of unwanted communication: Anti-spam paper [adaptx.com], submitted to the Conference on Email and Anti-Spam [www.ceas.cc]
  • by Animats ( 122034 ) on Monday April 11, 2005 @09:38PM (#12208142) Homepage
    A spammer needs certain resources to survive. Most spam-control efforts focus on cutting off the spammer's ability to send spam. Much has been done in that direction. Now more effort needs to be applied in the other direction: cutting off the spammer's payment stream.

    Legally, this is promising. First, there's no free speech issue. Second, in most jurisdictions, it's illegal to operate an anonymous business. So most spammers are criminals. Third, laundering transactions through intermediaries is usually a crime, too.

    The problem for law enforcement is that following the money is difficult. Additional technical support for that would be a big help.

    A good starting point would be to get a credit card issuing bank to cooperate in a scheme where, when one of their credit cards is used, full transaction details, including the payee's full identity, are immediately returned to the cardholder, using encrypted E-mail or some other secure means. That would make "following the money" much easier. This only requires one cooperating bank. That bank's credit cards might become popular with heavy Internet users. Especially if this works for prepaid credit cards, so you can find out who's behind a web site by using some disposable credit card.

    The next step is to crack down on "credit card intermediaries". Non-bank credit card intermediaries that handle spammer transactions should be stuck with the legal liability of the spammer. Legally, they're the "merchant". They shouldn't be allowed to pass the buck to some other party. This will make "cheap merchant accounts" harder to get, which is probably a good thing.

  • Hidden Markov Model (Score:3, Interesting)

    by icejai ( 214906 ) on Monday April 11, 2005 @09:43PM (#12208176)
    Why not use a hidden Markov model to filter spam that uses random digits as filler?

    A very basic filter would work this way:

    Train a network of, say, 30 to 40 units on any English text. The training text doesn't have to be limited to letters and numbers; it can include other ASCII characters as well, because the hidden Markov model will create distributions for them too.

    Now, for each new email that comes in, grab random chunks of text (maybe random 30-character strings) and see how probable the text would be under this hidden Markov model. If it turns out to be unlikely, then scrap it.

    Any thoughts?
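    A sketch of the scoring step in Python. It swaps in the simplest relative of the proposal, a first-order character Markov chain with no hidden states, rather than a true hidden Markov model with 30 to 40 units; the class name, 30-character chunk, and sample count are assumptions, and the cut-off for scrapping a message would have to be tuned on real mail:

```python
import math
import random
from collections import Counter, defaultdict


class CharMarkovModel:
    """First-order character Markov chain with add-one smoothing."""

    def __init__(self, training_text: str) -> None:
        self.transitions = defaultdict(Counter)
        self.alphabet = set(training_text)
        for prev, nxt in zip(training_text, training_text[1:]):
            self.transitions[prev][nxt] += 1

    def avg_log_prob(self, chunk: str) -> float:
        """Mean log P(next char | previous char); higher means more like the training text."""
        v = len(self.alphabet) + 1  # +1 reserves probability mass for unseen characters
        pairs = list(zip(chunk, chunk[1:]))
        if not pairs:
            return 0.0
        total = 0.0
        for prev, nxt in pairs:
            row = self.transitions.get(prev, Counter())
            total += math.log((row[nxt] + 1) / (sum(row.values()) + v))
        return total / len(pairs)

    def worst_chunk_score(self, text: str, chunk_len: int = 30,
                          samples: int = 5, seed: int = 0) -> float:
        """Score a few random chunks of the message and keep the lowest score."""
        if len(text) <= chunk_len:
            return self.avg_log_prob(text)
        rng = random.Random(seed)
        starts = [rng.randint(0, len(text) - chunk_len) for _ in range(samples)]
        return min(self.avg_log_prob(text[s:s + chunk_len]) for s in starts)


model = CharMarkovModel("the cat sat on the mat and the dog sat on the log " * 10)
print(model.avg_log_prob("the cat sat on the log") > model.avg_log_prob("xqzjv wpkfz qxjzv"))  # True
```

    As the replies point out, text produced by a similar model, or pasted from real articles, will score as English too.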

    • by mcc ( 14761 ) <amcclure@purdue.edu> on Monday April 11, 2005 @10:12PM (#12208338) Homepage
      Right now spam is usually filtered using a brownian model. As a result, spammers have begun structuring their emails so as to target brownian models. How many spams have you gotten lately with the subject line ending in confiscate ok wallop yls oblivion?

      If we move to filtering spam using Markov models, spammers will begin structuring their emails so as to target Markov models. Look forward to all your spams ending in 500-word blocks of text from a copy of MegaHAL trained on old grandmothers' email boxes.
      • Look forward to all your spams ending in 500-word blocks of text from a copy of MegaHAL trained on old grandmothers' email boxes

        Some spam is already including the text of random news articles to defeat filters...
  • by Gary Destruction ( 683101 ) * on Monday April 11, 2005 @09:52PM (#12208224) Journal
    Is there really such a thing as a solution to spam? For every new technique that is developed, the spammers will find a way to circumvent it. Spam is a multi-million dollar business. I'd go so far as to say that it's a science. At least, the spammers seem to have it down to a science.

    Trying to find a solution to spam is an idea in the eyes of experts and analysts. But to spammers, it's a road block that they must work around to stay in business.
    Spamming techniques will no doubt end up as signatures in spam filters, not unlike the signatures used by IDS and virus scanners. The experts don't seem to understand that where there's a will, there's a way. And the spam will just keep coming in another form or by some other technique. All that can be done is to keep up with changing techniques and patterns and treat spam for what it truly is: an attack vector.
    • I agree. Stopping spam via code can and will be circumvented.

      We need to stop spam legally and politically. Instead of targeting the spammers, target the companies that utilise it. If it were illegal to advertise your product via spam, there would be no market for it. It can't really be circumvented: somehow there has to be a link to the product/company in the email.
  • by Fox_1 ( 128616 ) on Monday April 11, 2005 @09:59PM (#12208263)
    Well, they weren't really a spam company; they sold software that allowed you to generate spam messages. I was going to do some telephone sales for them, cold-calling their market (I know, it's evil, but I was calling corporations, not individuals, and I needed some cash), but after I got a copy of their software and became familiar with its capabilities, I felt icky, like I'd stepped in something, and I couldn't in good conscience work for them. It had been presented to me as a customer-contact software package, but it had too many sneaky little features that marked it to me as spam software (a built-in SMTP server, throttle control on SMTP activity so your ISP didn't get mad at you, and a bunch of message generation/tracking options), or at least there was nothing stopping customers from using it that way, no matter how the company described their product.
  • I see all this time and money being invested into research to block spam. But we need to rethink our premises: does spam even need to be blocked? Is it actually a problem?

    What you call "spam", I call "emails that help me learn about the latest products, websites, and business models". You want less of it? I want MORE of it. "Spam" keeps me informed about the world. And the fact is, consumers LIKE spam. Why do you think spam is profitable? Because people buy the products advertised! Studies show that 3 in
    • Just to feed the troll some more....

      How is it acceptable to be sitting at work and receive pornographic SPAM selling sex sites, penis enlargement and Viagra?

      You might want junk mail, but it seems that the majority of us don't.
    • I shouldn't feed the troll, but....

      Studies show that 3 in 5 people who dislike "spam" have actually bought something online.

      I fit right into that category. I get lots of spam. I have bought something online. Is this related somehow? I dislike SPAM. I bought a game. The SPAM and the game purchase are unrelated. I have bought nothing from any e-mail I have ever received.

      I did a search to see who had the best deal on the game I wanted. I did not search my e-mail account for the information.

      Please
  • Doesn't Outlook let you filter for only messages that come from addresses in your contact list? That cuts down on most spam - even spoofed address spams don't usually target people in the address owner's contact list. They just harvest addresses from phishing or web pages, which doesn't access contact lists.

    The viral spams, where a virus reads a contact list and sends itself to the contacts from a familiar (and actual) address, are vulnerable to a server-based strategy. Servers could detect identical (or n
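    The server-side idea of spotting identical copies across many recipients can be sketched as below. The class name and threshold are hypothetical, and an exact SHA-256 fingerprint only matches copies that are identical after trivial normalization; near-duplicates padded with random words need a similarity-preserving hash such as the Nilsimsa approach mentioned in the story summary:

```python
import hashlib
import re
from collections import defaultdict


class DuplicateTracker:
    """Flag a message body once it has been seen addressed to `threshold` distinct recipients."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.recipients_by_fingerprint = defaultdict(set)

    @staticmethod
    def fingerprint(body: str) -> str:
        # Exact-match fingerprint after light normalization (case and whitespace)
        normalized = re.sub(r"\s+", " ", body.strip().lower())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def observe(self, body: str, recipient: str) -> bool:
        seen = self.recipients_by_fingerprint[self.fingerprint(body)]
        seen.add(recipient.lower())
        return len(seen) >= self.threshold


tracker = DuplicateTracker(threshold=3)
print(tracker.observe("Open the attachment!", "a@example.org"))   # False
print(tracker.observe("open the  attachment!", "b@example.org"))  # False
print(tracker.observe("OPEN THE ATTACHMENT!", "c@example.org"))   # True
```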
  • Spam still an issue? (Score:3, Informative)

    by groomed ( 202061 ) on Monday April 11, 2005 @11:03PM (#12208681)
    Between SpamAssassin, procmail, and MUA filtering rules, I rarely get to see spam anymore. The spam which does slip through is so absurd and surreal that it's more hilarious than annoying.

    If everybody did this, the volume of spam would quickly dry up. Because when people don't see the spam, they can't respond to it, and when they don't respond to it, the spammer doesn't have a business.

    Educate the people around you and help them reduce the spam that gets to their inbox. Don't support solutions which effectively render nodes at the network periphery to second-class status.
  • Commitment (Score:3, Interesting)

    by Deliveranc3 ( 629997 ) <deliverance.level4@org> on Monday April 11, 2005 @11:11PM (#12208725) Journal
    They blocked the block function for Microsoft messages in Hotmail.
  • I just got out of a six-hour meeting, so I'm a bit senseless. But I see no one has posted this yet, so:

    Your post advocates a

    (X) technical ( ) legislative ( ) market-based ( ) vigilante

    approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

    ( ) Spammers can easily use it to harvest email addresses
    ( ) Mailing lists
  • by mamladm ( 867366 ) on Tuesday April 12, 2005 @02:59AM (#12209805) Homepage
    The overwhelming majority of techniques for deceiving spam filters rely on HTML. If you block messages containing HTML on the mail server, the spam that gets through is nearly 100% identifiable as spam using Bayesian filters.

    So why on earth do people still use HTML in their email? Email should be plain text only anyway.
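    Detecting HTML on the server comes down to asking whether any MIME part is text/html. A sketch using Python's standard email package; has_html_part is a hypothetical name. Much HTML mail also carries a plain-text alternative part, which a server could keep instead of rejecting the whole message:

```python
from email import message_from_string


def has_html_part(raw_message: str) -> bool:
    """True if any MIME part of the message is text/html."""
    message = message_from_string(raw_message)
    return any(part.get_content_type() == "text/html" for part in message.walk())


plain = "Subject: hi\n\nJust text.\n"
alternative = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\nContent-Type: text/plain\n\nhi\n"
    "--XX\nContent-Type: text/html\n\n<b>hi</b>\n"
    "--XX--\n"
)
print(has_html_part(plain))        # False
print(has_html_part(alternative))  # True
```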
  • > ...splitting words using HTML comments, pairs of zero width tags, or bogus tags.

    Personally, I'd like to be able to specify that I simply will not receive HTML mail. If someone does send it to me, then I'd like my mail server (or, even better, my ISP's mail server) to automatically return a "this recipient does not wish to receive HTML-formatted email, please either resend as plain text or don't bother" reply (or is this already possible and I'm just too lazy to work out how to do it? :)
  • I liked the phishing email from the 'split fit' article:

    Date: Mon, 28 Mar 2005 18:41:22 -0800 (PST)
    From: "B&#1983;rclay&#1980;s" <amwmwngkev@yahoo.com>
    Subject: Braclays Emlia&#8236; Veacifition
    MIME-Version: 1.0
    Content-Type: multipart/alternative; boundary="0-1893089315-1112064082=:45059"
    Content -Length: 1338

    Dera Blcraays Mbmeer,

    Thsi eamil was stne by the Barclays serevr to vreify yuor emial adsserd. You mtsu competel thsi pssecor
    by ccilking
    on the likn bewol and entireng in the sml
