Microsoft Researchers on Stopping Spam

Posted by timothy on Mon Apr 11, 2005 08:49 PM
from the slow-treacle dept.
TheBackBencher writes "Scientific American today has a very interesting article about "Stopping Spam" by Joshua Goodman, David Hackerman and Robert Rounthwaite from Microsoft Research. They talk about different types of spam -- spam with emails, spam on IMs, spamlinks on web pages and image based spam. They mention different techniques for spam filtering, mainly fingerprint-matching techniques, n-gram models, the naive Bayesian approach, optical character recognition, challenge/response systems and Human Interactive Proofs (HIP), in a very lucid style. They do not, however, mention the fingerprinting approach of using the Nilsimsa hash to tackle spammers' addition of random words to emails, or the hypertextus interruptus technique spammers use to split words with HTML comments, pairs of zero-width tags, or bogus tags. Also, Spam-Research is reporting the SplitFit Technique that Spammers are using to fool Yahoo! Mail SpamGuard."
  • by Anonymous Coward on Monday April 11 2005, @08:51PM (#12207797)
    Create your own spamming division, use illegal tactics to undercut your spamming competition, put them out of business, then stop spamming.
    • by vwjeff (709903) on Monday April 11 2005, @09:24PM (#12208049)
      They talk about different types of spam -- spam with emails, spam on IMs, spamlinks on web pages and image based spam.

      Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam; spam bacon sausage and spam; spam egg spam spam bacon and spam; spam sausage spam spam bacon spam tomato and spam;

      (punches self in face face hits keyboardadsfjlk;sjdafkldsajflsdak;fjsad;lfkjas;ldk fjas)
  • I don't know... (Score:4, Insightful)

    by Anonymous Coward on Monday April 11 2005, @08:55PM (#12207839)
    If it was developed, it can be reverse engineered. Sorry to say, but spam is here to stay, unless of course the internet someday becomes regulated somehow.
  • Spam is like porn: hard to define, but you know it when you see it. That can be hard to program, I would think. But who knows.
    • Spam is like porn: hard to define, but you know it when you see it. That can be hard to program, I would think. But who knows.

      No, spam is easy to define: it is any unwanted email. Name the elements that make spam:

      1) It is a form of communication
      2) The communication is unwanted
      3) The source of the communication is hidden
      4) In receiving the communication, you use your bandwidth or incur a cost

      • by ch-chuck (9622) on Monday April 11 2005, @09:12PM (#12207969) Homepage
        1) It is a form of communication

        all email is communication

        2) The communication is unwanted

        "wanted" is a subjective property of the recipient - the computer has no programmable decision procedure for wantedness.

        3) The source of the communication is hidden

        There may someday be some system for authenticating sender ID, and deploying it will be about as easy as getting people to use public-key encryption.

        4) In receiving the communication, you use your bandwidth or incur a cost

        again, a property of all email.

        • It is not 4 different definitions of spam, it is 4 elements that make up a definition. If any 1 element is missing, then it would not be considered spam for purposes of law enforcement. What I am getting at is what are all the elements that make up spam, so that if any 1 element is missing, it is not a criminal offense.

          Spam is a form of communication. You can't have spam without some kind of communication (I would like a definition that includes the telemarketers and all the shit I get in the mail). That

    • Our offsite spam engine can detect porn by looking at shapes, colours, etc...
      It works surprisingly well most of the time, though it did once pick up a photo of a broken PCB as porn due to its detected "posture"
  • by DumbSwede (521261) <slashdotbin@hotmail.com> on Monday April 11 2005, @08:59PM (#12207874) Homepage Journal
    Just today I saw a new method in an ebay.com phishing scheme.

    The ebay.com link showed up at the bottom of the browser, but was replaced by some kind of JavaScript mouseover event. This is probably not new.

    Instead of random text to fool Bayesian filters, it had hidden recent news article summaries (bracketed by HTML comment tags) that would be similar to what you might post to a friend.

    Spam filters will probably be upgraded to catch this soon, but it was the first time I had seen it. And of course, as mentioned in the article, the eBay specifics were obfuscated by HTML tags between letters.
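A rough sketch of how a filter might undo both tricks described above (filler hidden in HTML comments, and tags wedged between letters) before it tokenizes anything. The function names and the sample string are made up for illustration, and regex-based stripping is only an approximation of what a real HTML parser would do.

```python
import re

# Hypothetical helpers: approximate "what the reader actually sees".
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")


def visible_text(html: str) -> str:
    """Drop HTML comments first, then any remaining tags."""
    without_comments = COMMENT_RE.sub("", html)
    return TAG_RE.sub("", without_comments)


def hidden_ratio(html: str) -> float:
    """Fraction of the raw message that sits inside HTML comments."""
    hidden = sum(len(m.group(0)) for m in COMMENT_RE.finditer(html))
    return hidden / len(html) if html else 0.0


sample = "e<b></b>B<i></i>ay <!-- Senate passes budget bill today --> account"
print(visible_text(sample).split())
print(round(hidden_ratio(sample), 2))
```

A high `hidden_ratio` could itself be treated as a spam signal, since ordinary mail between friends rarely carries large commented-out blocks.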

    • Yeah, that won't cause much of a problem for Bayesian filters. Essentially your filter will learn that news goes from good to neutral, and that JavaScript mouseovers go from bad to terrible.

      However, this isn't what Joshua and the rest of MS are working on. His stuff is much more in the area of modifying SMTP so that untrusted clients have to perform some calculations before their email is accepted, or pay a few cents. My guess is it will fail since it doesn't account for zombie PCs but I'm sure he has something plan
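For anyone wondering what "perform some calculations before their email is accepted" could look like, here is a minimal proof-of-work sketch in the spirit of Hashcash. It is not the scheme from the article: the stamp format, the choice of SHA-256, and the difficulty values are all assumptions picked for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint_stamp(recipient: str, bits: int = 16) -> str:
    """Sender side: try nonces until the hash of 'recipient:nonce' is hard enough."""
    for nonce in count():
        stamp = f"{recipient}:{nonce}"
        if leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits:
            return stamp


def check_stamp(stamp: str, recipient: str, bits: int = 16) -> bool:
    """Receiver side: a single hash verifies the work was done for this address."""
    if not stamp.startswith(recipient + ":"):
        return False
    return leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits


stamp = mint_stamp("alice@example.com", bits=12)
print(stamp, check_stamp(stamp, "alice@example.com", bits=12))
```

Minting costs the sender thousands of hashes on average while checking costs the receiver one, which is the whole point. As noted above, it does nothing about zombie PCs, since the spammer is burning someone else's CPU.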
    • I don't think you should be running Javascript in e-mail. I disable it. I even disable automatic loading of images so that someone can't automatically confirm my address if I view a message.
  • to stop spam, (Score:5, Insightful)

    by havaloc (50551) * on Monday April 11 2005, @09:02PM (#12207897) Homepage
    give spammers a 9 year prison sentence [slashdot.org].
  • by Anonymous Coward on Monday April 11 2005, @09:03PM (#12207911)
    That'd probably be the best thing M$ could do to help reduce spam.
  • So, does Microsoft Research plan on combating Spam with a Bob-like approach, or the more refined Clippy approach?

    Or are they going to come up with an entirely new file system to combat it, hype it up for every Windows release, but then delay its release a few more years?

    Oops, pardon me while I reminisce about all the great advances Microsoft Research has given me :).
    • Each year they will announce that This is the Year of No More Spam on the Desktop (of course this never happens).

      Or they will invent a brilliant new way to stop spam but as it requires the user to recompile all their OS and apps every 3 days it never gets used.

      Or they just tell the end users "Why don't YOU code some anti-spam software?"

      Or they produce an anti-spam system but the user must install 3 desktops and window managers, requires a 10,000 line config file that must be written by hand, comes with ei
    • or the more refined Clippy approach?


      Clippy "It looks like you are trying to send a lot of e-mail. Would you like to send the first one to 1lol56@aol.com?"

      Clippy "It looks like your return address is incorrect. Would you like me to fix it?"

  • by datafr0g (831498) <datafrog AT gmail DOT com> on Monday April 11 2005, @09:09PM (#12207949) Homepage
    Don't you mean, Microsoft Mergers & Acquisitions?
  • by Rick Zeman (15628) on Monday April 11 2005, @09:16PM (#12207990)
    Also, Spam-Research is reporting the SplitFit Technique that Spammers are using to fool Yahoo! Mail SpamGuard.

    How much credence should we put into an analysis from a guy who goes to the spammer's web site to unsubscribe?
  • by VeryProfessional (805174) on Monday April 11 2005, @09:16PM (#12207996)

    I thought the name David Hackerman was a bit too good to be true, and it turns out it was. Following the link shows that his name is David Heckerman [microsoft.com]. Note to /. eds: please proofread your posts. It's not like they're very long...

  • "SplitFit" (Score:3, Interesting)

    by 1000101 (584896) on Monday April 11 2005, @09:21PM (#12208027)
    From the SplitFit [spam-research.com] link...

    Dera Blcraays Mbmeer, Thsi eamil was stne by the Barclays serevr to vreify yuor emial adsserd. You mtsu competel thsi pssecor by ccilking on the likn bewol and entireng in the smlal wiodnw yoru Braclays Membership nrebmu, passcedo and meelbarom word. Tsih is doen for yruo proteoitcn - buacese semo of our mrebmes no lonegr haev assecc to theri emlia adserdses and we muts virefy it. To vyfire yruo eiaml arddess and accses yruo bnak anuocct , cilck on the lnik bolew:"

    That email is extremely difficult to filter out because the only 'real' words are no, of, our, and, etc.: simple words that occur so many times in legitimate emails that most spam filters practically ignore them. But I have to wonder... who would actually 'cilck on the lnik bolew' anyway? I hate to use the term 'you get what you deserve', but if you are naive enough to click the link, then the problem isn't your spam filter, it's you.
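To make the "filters practically ignore them" point concrete, here is a toy Bayesian-style scorer. The word counts, the 0.4 default for never-seen tokens, and the 15-token cutoff are assumptions for illustration, not any particular product's settings.

```python
from math import prod


def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham, unknown=0.4):
    """Per-token spam probability; tokens never seen in training get a near-neutral default."""
    s = spam_counts.get(token, 0)
    h = ham_counts.get(token, 0)
    if s + h == 0:
        return unknown
    ps, ph = s / n_spam, h / n_ham
    return min(0.99, max(0.01, ps / (ps + ph)))


def message_spam_prob(tokens, spam_counts, ham_counts, n_spam, n_ham):
    """Combine the 15 tokens whose probability is furthest from 0.5."""
    probs = [token_spam_prob(t, spam_counts, ham_counts, n_spam, n_ham)
             for t in set(tokens)]
    top = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:15]
    spam = prod(top)
    ham = prod(1 - p for p in top)
    return spam / (spam + ham)


# Made-up training counts out of 100 spam and 100 legitimate messages.
spam_counts = {"verify": 40, "account": 30, "our": 50}
ham_counts = {"our": 50, "meeting": 40}

print(message_spam_prob(["verify", "your", "account"], spam_counts, ham_counts, 100, 100))
print(message_spam_prob(["vreify", "yuor", "anuocct"], spam_counts, ham_counts, 100, 100))
```

With these numbers the readable phrase scores close to 1, while the scrambled one stays below 0.5 because every token is unknown to the filter.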

    • Re:"SplitFit" (Score:4, Insightful)

      You haven't RTFA it seems.

      That garbled text is ungarbled by certain software (e.g. Outlook). That's because there are invisible chars in there that activate the "right to left" mode.

      Example:
      De*ra* B*lcra*ays M*bme*er
      translates to:
      Dear Barclays Member

      (I tried to copy the text I got in Yahoo, and paste it in MSN messenger. Amazingly, the text was "ungarbled". That's when I realized how tricky spammers were)

      Anti-spam software could simply detect right-to-left control characters in such text and ipso facto label it as spam. Unless, of course, you're reading Hebrew, which is obviously NOT the case here.
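A minimal sketch of that check: flag text that contains Unicode bidirectional control characters but no actual right-to-left letters (so Hebrew or Arabic mail is left alone). The function names are invented for the example, and the sample assumes the invisible characters are the standard override (U+202E) and pop (U+202C) code points.

```python
import unicodedata

# Unicode bidirectional formatting characters and their usual abbreviations.
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u200e": "LRM", "\u200f": "RLM",
}


def bidi_controls(text: str):
    """Return (position, name) for every bidi control character found."""
    return [(i, BIDI_CONTROLS[ch]) for i, ch in enumerate(text) if ch in BIDI_CONTROLS]


def has_rtl_letters(text: str) -> bool:
    """True if the text contains real right-to-left letters (e.g. Hebrew, Arabic)."""
    return any(ch.isalpha() and unicodedata.bidirectional(ch) in ("R", "AL")
               for ch in text)


def looks_like_bidi_trick(text: str) -> bool:
    """Direction overrides in text that has no right-to-left script are suspicious."""
    return bool(bidi_controls(text)) and not has_rtl_letters(text)


garbled = "De\u202era\u202c B\u202elcra\u202cays M\u202ebme\u202cer"
print(looks_like_bidi_trick(garbled))
print(looks_like_bidi_trick("Dear Barclays Member"))
```

The `isalpha()` test matters because the right-to-left mark itself carries the "R" bidi class and would otherwise count as a letter.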
  • by shanen (462549) on Monday April 11 2005, @09:31PM (#12208101) Homepage Journal
    SMTP is working exactly as designed--but the design is broken. You can't fix a fundamentally economic problem with any number of technical tools. It's like adding more epicycles to the earth-centered "perfect spheres" models of the universe.

    The article barely mentions economics, and only in terms of the real costs of email--which only shows how much room there is for a real economic model with real business, real email, and *NO* spam.

    I really wish one of the major email players would offer an option for prepaid email. That would be an absolutely spam-proof system. It doesn't matter if the postage is two cents, the spammers can't afford it. Two cents against 50,000,000 spams turns out to be *REAL* money. Any email via that address would be at least some kind of real thing.
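The arithmetic behind that claim, as a quick sketch. The two-cent postage and the 50,000,000 message count come from the comment above; the $20 profit per sale is a made-up number used only to show a break-even response rate.

```python
def campaign_postage_dollars(messages: int, cents_per_message: int) -> float:
    """Total postage for a mailing, in dollars."""
    return messages * cents_per_message / 100


def break_even_response_rate(cents_per_message: int, profit_per_sale_dollars: float) -> float:
    """Fraction of recipients who must buy for postage alone to be covered."""
    return cents_per_message / (profit_per_sale_dollars * 100)


print(campaign_postage_dollars(50_000_000, 2))   # two cents a message
print(break_even_response_rate(2, 20.0))         # hypothetical $20 profit per sale
```

Two cents times 50,000,000 messages is $1,000,000, and at the assumed $20 per sale the spammer would need roughly one buyer per thousand messages just to cover postage.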

  • my solution (Score:3, Interesting)

    by ricochet81 (707864) on Monday April 11 2005, @09:35PM (#12208117)
    Here's my solution to the greater unwanted communication problem: Anti-spam paper [adaptx.com], submitted to the Conference on Email and Anti-Spam [www.ceas.cc]
  • by Animats (122034) on Monday April 11 2005, @09:38PM (#12208142) Homepage
    A spammer needs certain resources to survive. Most spam control efforts focus on cutting off the spammer's ability to send spam. Much has been done in that direction. Now more effort needs to be applied in the other direction - cutting off the spammer's payment stream.

    Legally, this is promising. First, there's no free speech issue. Second, in most jurisdictions, it's illegal to operate an anonymous business. So most spammers are criminals. Third, laundering transactions through intermediaries is usually a crime, too.

    The problem for law enforcement is that following the money is difficult. Additional technical support for that would be a big help.

    A good starting point would be to get a credit card issuing bank to cooperate in a scheme where, when one of their credit cards is used, full transaction details, including the payee's full identity, are immediately returned to the cardholder, using encrypted E-mail or some other secure means. That would make "following the money" much easier. This only requires one cooperating bank. That bank's credit cards might become popular with heavy Internet users. Especially if this works for prepaid credit cards, so you can find out who's behind a web site by using some disposable credit card.

    The next step is to crack down on "credit card intermediaries". Non-bank credit card intermediaries that handle spammer transactions should be stuck with the legal liability of the spammer. Legally, they're the "merchant". They shouldn't be allowed to pass the buck to some other party. This will make "cheap merchant accounts" harder to get, which is probably a good thing.

    • by killjoe (766577) on Monday April 11 2005, @10:22PM (#12208381)
      In the past the FBI has caught people by simply buying what they were selling and then finding the person who cashed the check.

      Of course the FBI can't arrest people in the lawless places of the world like Croatia and Hungary, so those governments will need to shed their corruption.

      In other words I don't think your scheme will work because so much of the world is out of the reach of law enforcement.
  • Hidden Markov Model (Score:3, Interesting)

    by icejai (214906) on Monday April 11 2005, @09:43PM (#12208176)
    Why not use a hidden Markov model to filter spam that uses random digits as filler?

    A very basic filter would work this way:

    Train a network of, say, 30 to 40 units with any English text. The training text doesn't have to be limited to letters and numbers; it can include other ASCII characters as well, because the hidden Markov model will create distributions for them too.

    Now, for each new email that comes in, grab random chunks of text (maybe random 30-character strings) and see how probable the text would be under this hidden Markov model. If it turns out not very likely, then scrap it.

    Any thoughts?
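A simplified sketch of the chunk-scoring idea. To keep it short it swaps the hidden Markov model for a plain character-bigram Markov chain with add-one smoothing, so there are no hidden states or "units" here. The class name, the alphabet size of 256, and the tiny training text are all assumptions for illustration.

```python
import math
import random
from collections import Counter


class CharBigramModel:
    """Plain (visible) Markov chain over characters, trained on sample text."""

    def __init__(self, training_text: str, alphabet_size: int = 256):
        self.pair_counts = Counter(zip(training_text, training_text[1:]))
        self.first_counts = Counter(training_text[:-1])
        self.alphabet_size = alphabet_size

    def avg_log_prob(self, chunk: str) -> float:
        """Average log-probability per character transition (add-one smoothed)."""
        if len(chunk) < 2:
            return 0.0
        total = 0.0
        for a, b in zip(chunk, chunk[1:]):
            num = self.pair_counts[(a, b)] + 1
            den = self.first_counts[a] + self.alphabet_size
            total += math.log(num / den)
        return total / (len(chunk) - 1)


def score_message(model, body, chunk_len=30, samples=20, rng=None):
    """Average the score of randomly chosen fixed-length chunks of the body."""
    rng = rng or random.Random(0)
    if len(body) <= chunk_len:
        return model.avg_log_prob(body)
    starts = [rng.randrange(len(body) - chunk_len + 1) for _ in range(samples)]
    return sum(model.avg_log_prob(body[s:s + chunk_len]) for s in starts) / samples


corpus = ("the quick brown fox jumps over the lazy dog and then the dog "
          "sleeps in the sun while the fox runs home ") * 50
model = CharBigramModel(corpus)
print(model.avg_log_prob("the dog runs home over the sun"))
print(model.avg_log_prob("xq zvkj qpwz xjq vkzq wxz qjx"))
```

English-looking text should score noticeably higher (less negative) than random filler; where to put the cutoff would have to be tuned on real mail.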

    • by mcc (14761) <amcclure@purdue.edu> on Monday April 11 2005, @10:12PM (#12208338) Homepage
      Right now spam is usually filtered using a Bayesian model. As a result, spammers have begun structuring their emails so as to target Bayesian models. How many spams have you gotten lately with the subject line ending in confiscate ok wallop yls oblivion?

      If we move to filtering spam using Markov models, spammers will begin structuring their emails so as to target Markov models. Look forward to all your spams ending in 500-word blocks of text from a copy of MegaHAL trained on old grandmothers' email boxes.
  • by Gary Destruction (683101) * on Monday April 11 2005, @09:52PM (#12208224) Journal
    Is there really such a thing as a solution to spam? For every new technique that is developed, the spammers will find a way to circumvent it. Spam is a multi-million dollar business. I'd go so far as to say that it's a science. At least, the spammers seem to have it down to a science.

    Trying to find a solution to spam is an idea in the eyes of experts and analysts. But to spammers, it's a road block that they must work around to stay in business.
    Spamming techniques will no doubt end up as signatures in spam filters, not unlike the signatures used by IDS and virus scanners. The experts don't seem to understand that if there's a will, there's a way. And the spam will just keep coming in another form or by some other technique. All that can be done is to keep up with changing techniques and patterns and treat spam for what it truly is -- an attack vector.
  • by Fox_1 (128616) on Monday April 11 2005, @09:59PM (#12208263) Homepage
    Well they weren't really a spam company, they sold software that allowed you to generate spam messages. I was going to do some telephone sales for them, cold call their market (I know, it's evil but I was calling corporations, not individuals, and I needed some cash) but after I got a copy of their software and became familiar with its capabilities I felt icky, like I had stepped in something; I couldn't in good conscience work for them. It had been presented to me as a customer contact software package - but it had too many little sneaky features that marked it to me as spam software (built-in SMTP server, throttle control on SMTP activity so your ISP didn't get mad at you, and a bunch of message generation/tracking options), or at least there was nothing stopping customers from using it in that way, no matter how the company described their product.
  • Spam still an issue? (Score:3, Informative)

    by groomed (202061) on Monday April 11 2005, @11:03PM (#12208681)
    Between SpamAssassin, procmail, and MUA filtering rules, I rarely get to see spam anymore. The spam which does slip through is so absurd and surreal that it's more hilarious than annoying.

    If everybody did this, the volume of spam would quickly dry up. Because when people don't see the spam, they can't respond to it, and when they don't respond to it, the spammer doesn't have a business.

    Educate the people around you and help them reduce the spam that gets to their inbox. Don't support solutions which effectively render nodes at the network periphery to second-class status.
  • Commitment (Score:3, Interesting)

    by Deliveranc3 (629997) on Monday April 11 2005, @11:11PM (#12208725) Journal
    They blocked the block function for microsoft messages in hotmail.
  • by mamladm (867366) on Tuesday April 12 2005, @02:59AM (#12209805) Homepage
    The overwhelming majority of spam filter deceiving techniques rely on HTML. If you block messages containing HTML on the mail server, the spam that gets through is near 100% identifiable as spam using Bayesian filters.

    So why on earth do people still use HTML in their email? Email should be plain text only anyway.
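A minimal sketch of the "block messages containing HTML" test using Python's standard `email` package. How a given mail server would hook this in is not covered, and the function name and sample messages are invented for the example.

```python
from email import message_from_string
from email.message import Message


def contains_html(msg: Message) -> bool:
    """True if the message, or any MIME part inside it, is text/html."""
    return any(part.get_content_type() == "text/html" for part in msg.walk())


plain = message_from_string("Content-Type: text/plain\n\nhello\n")
html = message_from_string("Content-Type: text/html\n\n<p>hello</p>\n")
print(contains_html(plain), contains_html(html))
```

Walking every MIME part matters because most HTML mail arrives as multipart/alternative with a plain-text part alongside, so checking only the top-level Content-Type would miss it.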
    • by Sonar (70854) on Monday April 11 2005, @09:01PM (#12207894)
      Of course, one 200MB update from Microsoft would kill this idea. Or how about a 500MB game demo download? That's legitimately free. Or better yet, what if I need to download a Linux distro or a television episode?

      I would hate to have to explain all my actions to my ISP. Especially with the way media is driving the internet nowadays. 200MB is way too small of a limit.

      Now, you can monitor how many e-mails are sent by a host. That would be a better way. At least there could be a filter on the "to:" line. If that list includes over, say, 1000 users, consistently, then at least there could be some flags raised.
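One way the per-host monitoring above could be sketched: count recipients per host per day and flag a host only when it goes over the limit on several days, which is how "consistently" is interpreted here. The class name, the 1000-recipient limit, and the three-day rule are assumptions for illustration.

```python
from collections import defaultdict


class RecipientMonitor:
    """Track how many recipients each host mails per day."""

    def __init__(self, daily_limit: int = 1000, strikes_needed: int = 3):
        self.daily_limit = daily_limit
        self.strikes_needed = strikes_needed
        # host -> day -> number of recipients addressed that day
        self.daily = defaultdict(lambda: defaultdict(int))

    def record(self, host: str, day: str, recipients: int) -> None:
        self.daily[host][day] += recipients

    def flagged(self, host: str) -> bool:
        """Flag only hosts that exceeded the limit on enough separate days."""
        days = self.daily.get(host, {})
        over = sum(1 for n in days.values() if n > self.daily_limit)
        return over >= self.strikes_needed


monitor = RecipientMonitor()
for day in ("2005-04-09", "2005-04-10", "2005-04-11"):
    monitor.record("10.0.0.5", day, 5000)
monitor.record("10.0.0.9", "2005-04-11", 12)
print(monitor.flagged("10.0.0.5"), monitor.flagged("10.0.0.9"))
```

Requiring several days over the limit keeps a one-off newsletter or mailing-list burst from raising a flag.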
      • I would hate to have to explain all my actions to my ISP. Especially with the way media is driving the internet nowadays. 200MB is way too small of a limit.

        Now, you can monitor how many e-mails are sent by a host. That would be a better way. At least there could be a filter on the "to:" line. If that list includes over, say, 1000 users, consistently, then at least there could be some flags raised.

        That is a good point. With my daily bandwidth threshold test, I was thinking that if someone is uploading

    • Ugh (Score:4, Insightful)

      by Mr. Underbridge (666784) on Monday April 11 2005, @09:46PM (#12208199)
      Legislate against spam. As long as spam is legal, or the penalties against it are too low, or it is too easy to do, people will continue to try and make a quick buck.

      First, I guess you didn't see the guy in VA who just got something like 9 years in jail.

      That said, spam doesn't obey jurisdictional boundaries. Any single country can only solve a small part of the problem, and any spam incident often involves over 3 jurisdictions that may be in separate countries (sender, spambot, recipient, etc). That's a logistical nightmare that isn't soluble outside of a dream world.

      Also, force all ISPs to monitor how much bandwidth a source has. If you get too much usage per day, say 200 megabytes or more, then that person has to explain why they need that much bandwidth. If someone gets the RIAA on board, with their lobbyists, that should pass very quickly.

      That's fantastic. Trade a bad problem for one that's much worse. Get the RIAA to legitimize their practices by using a guise of stopping spam? Let's not.

      Also, force all email to have some element which identifies the source. Not just a header that can be forged, but something that can't be hacked.

      Now by force, what do you do if they don't? Enforcement issues again here.

      Ultimately, legislative solutions for spam DO NOT and CAN NOT work for much but a small part of the problem. It's satisfying when some moron is clumsy enough to get caught (as with the guy in VA), but mostly these days the spammers aren't that stupid. Technological solutions work far better.

      • 1. Most civilized countries are sick and tired of SPAM too. E.g., most European countries. So there is enough scope for a spam free zone, if the USA does want to get its act together and cooperate. It's not like you're alone against the world on the SPAM issue, except for the fact that:

        2. It's mostly your spam that's dumped upon the rest of the world. USA is currently _the_ biggest source of spam, followed by... offshored operations paid for by someone from the USA.

        So on one hand, the USA could halve the
        • No, all routers in the USA can be forced to reject all email, unless it comes on a specific port, with specific identifiers. For example, maybe have an ISP program that you must install on your machine that identifies your email. If the hash made by that program is wrong, they drop the email. Like what Microsoft does when you try to install software: you have to validate that you own the software and it is running on one machine.

          None of those "let's redefine the SMTP standard" crackpot schemes are going to w

    • Legislation ultimately runs into international borders and places where U.S. law cannot go. It can help, but honestly I am not sure how to craft a good law that will keep up with the pace of technology. Also, a law does not guarantee effective enforcement.

      A better strategy, IMO, is to work on the commercial level. It has been said here on /. many times that if there were no money for spammers, there will be no spam. When spam becomes an issue which decides where money goes (who wins and who loses), th
    • Legislate against spam. As long as spam is legal, or the penalties against it are too low, or it is too easy to do, people will continue to try and make a quick buck.

      I don't see that helping. Legislate in what jurisdiction? In which countries can it be enforced? Note that one can simply lease servers in a country immune to such legislation, or outsource to a company in such a country.

      Besides, FAX spam has been illegal for years, yet it continues to happen pretty constantly.

      Also, force all ISP's to mon
      • by John Seminal (698722) on Monday April 11 2005, @09:23PM (#12208047) Journal
        And SPAM is different from junk snail-mail how? (BTW, anyone have any idea as to why bulk E-mail postage costs less than regular snail mail postage?) The main difference is if I want to send you something through the mail, I have to put a stamp on it and pay money to ship it. If I want to spam you, I can write a virus and get 1000 machines to pump out the spam. I can do it so it does not cost me anything but my time.

        Plus, with the postal service, there are 1000's of laws in place. If I send you an offer through the mail designed to rip you off, that is a federal offense. You can't use the US Postal Service for illegal activities, if you do you get caught.

        Remember the movie The Firm? They did not convict the lawyers for tax evasion or any other crime. They convicted them for mail fraud. And if you let the worst spammers know that each and every time they send a message that is spam, each instance will incur a penalty, that might stop them.

    • Interesting idea, however invalid address responses are sent within 5 minutes of the original mail. If the response is sent over a day after the original mail is sent, the spammer could just discard it.

      If we respond instantly to all email with invalid address mails, it will be overused and spammers will ignore ALL of them. This is much like antibiotics, we use them too much, the bacteria accommodates for it, and antibiotics become obsolete.

      So far, Bayesian filtering and/or whitelisting email are the
      • Re:I have an idea (Score:5, Insightful)

        by sfe_software (220870) * on Monday April 11 2005, @11:22PM (#12208781) Homepage
        Interesting idea, however invalid address responses are sent within 5 minutes of the original mail. If the response is sent over a day after the original mail is sent, the spammer could just discard it.

        The thing is, I don't believe spammers ever remove an address due to an error. I had a domain that received a ton of spam, and that domain expired. Two years later (fighting with Network Solutions) I got the domain back, and immediately started receiving a ton of spam. Two years of spammers sending spam to invalid addresses (no DNS on the domain) and they still continued.

        Why?

        Simple: the spammers don't receive bounce messages, and the spam-servers (which could be static servers, or compromised zombie machines) don't provide accurate return information. Much like how telemarketers often show invalid or "Unknown" caller-ID info. It costs nearly nothing to send a spam message to an address, whether that address is valid or not. It costs much more to weed out invalid or unreachable addresses from your list by intercepting bounce messages etc.

        And spammers don't give a shit. Most of the time, they are using someone else's machine (a zombie'd Windows box, or an open relay) so they don't need to care. So this trick simply doesn't work. It's cheaper to just continue sending to invalid addresses. Not to mention, many newbie spammers get their lists from less-than-legit sources who are selling large lists; they don't care (and are usually fully aware) that many of the addresses they are selling are bogus or no longer valid...

        In short, simple tricks like this don't work, when dealing with an "industry" that doesn't give a shit...
    • Re:I have an idea (Score:5, Insightful)

      by xQx (5744) on Monday April 11 2005, @09:15PM (#12207986)
      Here's a more interesting idea...

      Authenticate SMTP with public key signing. -- Then use a trust network to only accept email from trusted companies.

      Why it won't work:
      It involves effort and cost.

      Baah, the internet should be unregulated; if they can get rid of SPAM then what's to stop them getting rid of porn, anti-government information etc.? There's a road we all want to go down.

      Don't buy it and Get over it(tm).
      • I assume you're being facetious. Why does stopping someone from sending you garbage equate to not being able to find something? That argument sounds like those nitwit "censorship" whiners who think Freedom of Speech means Freedom to be Heard and that all content must be available at all times in all places.

        Anarchy never worked in the real world, how could it work in the electronic world?

      • Maybe you didn't quite understand what I was talking about.

        This would be completely done server side. Just like when you send an e-mail to a host, and you get returned mail because you somehow typed in the address improperly. There would be no difference between that message and one that was sent to a user and then flagged as spam. It would be impossible to tell whether the user was a valid address or not.

        That's what I am getting at.
      • by John Seminal (698722) on Monday April 11 2005, @09:13PM (#12207976) Journal
        Also, if you reply, the spammer will know your address is active and send more crap.

        I don't understand this. On one hand, you have the police saying they can't track spammers. Spammers use drones, they remain hidden, they hide their tracks. On the other hand, if you unsubscribe, they know your email is a real one, and you get more spam. That tells me whoever runs the unsubscribe service is in cahoots with the spammer and is just as guilty. They have to know where to send their lists? Just track them as part of the war on spam.

        • Your unsubscribe is executed on a bot (a captured machine) which the bad guys can look at, after taking precautions not to be observed, and harvest what they want from it. The good guys, if they capture the machine, will just get your address (if it isn't encrypted by the bad guys) and a machine that is acting funny (if they don't know how to knock to get into the bot-ware). Since logging cannot be trusted on a compromised machine, what they need is a non-compromised machine beside the compromised one (on the same segment) to watch the traffic go in and out... a honeypot. That is a lot of hard work.
      • People don't send spam from their ISP's account.

        Very true. They use a botnet.

        They send it straight through their computer.

        No they don't. It's the easy way to end up on an RBL.

        Now, you could put outbound filtering on port 25, and require everyone to send mail through the ISP's servers (with authenticated SMTP of some sort), though there will be some legitimate traffic suppressed if that happens...

        The botnet is used to send just a few e-mails from each bot. Get an unfiltered inbox. Check the multiple cop