Security The Internet IT

Researchers Create Highly Predictive Blacklists

Grablets writes "Using a link analysis algorithm similar to Google PageRank, researchers at the SANS Institute and SRI International have created a new Internet network defense service that rethinks the way network blacklists are formulated and distributed. The service, called Highly Predictive Blacklisting, exploits the relationships between networks that have been attacked by similar Internet sources as a means for predicting which attack sources are likely to attack which networks in the future. A free experimental version is currently available."
This discussion has been archived. No new comments can be posted.

  • by khasim ( 1285 ) <brandioch.conner@gmail.com> on Wednesday July 23, 2008 @10:40PM (#24314149)

    They take X firewall logs ...

    Then they look for matches in attacking IP addresses between the logs ...

    And if any IP addresses appear in log A (which is very similar to log B) ... then those IP addresses are "predicted" as being possible to attack the firewall from which log B was obtained.

    Logical - yes.
    Predictive - no.
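A minimal sketch of the log-matching idea as this comment describes it, not the actual HPB implementation. The Jaccard overlap measure, the `min_similarity` threshold, and the sample logs are all assumptions made for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of attacker IPs (0.0 = disjoint, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predicted_sources(logs: dict[str, set[str]], target: str,
                      min_similarity: float = 0.3) -> set[str]:
    """IPs logged by networks similar to `target` that `target` has not seen yet."""
    seen = logs[target]
    candidates: set[str] = set()
    for name, sources in logs.items():
        if name != target and jaccard(seen, sources) >= min_similarity:
            candidates |= sources - seen
    return candidates


# Hypothetical firewall logs, reduced to the set of attacking source IPs.
logs = {
    "A": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "B": {"192.0.2.1", "192.0.2.2"},
    "C": {"198.51.100.9"},
}
print(predicted_sources(logs, "B"))
```

Log B shares two of its sources with log A, so the one address that only A has seen is flagged for B; log C overlaps with nothing and contributes nothing.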

    • by twatter ( 867120 ) on Wednesday July 23, 2008 @10:47PM (#24314191)

      I agree, but the key here is to ensure that there are no false positives, which have been traditionally the biggest problem with blacklists.

      If they figure that out, I don't care what kind of statistical approach they are using, as long as it works.

      I think someone from MIT (maybe three or four years ago during the height of the problems with Spamhaus?) tried this before, but I don't remember if it got anywhere. Maybe this is an offshoot from that.

      In the meantime... SpamAssassin with whitelists, which is the best of worse worlds.

      • Re: (Score:3, Insightful)

        by Mad Hughagi ( 193374 )

        It's pretty easy to get false positives depending on how you configure SpamAssassin.

    • by elnico ( 1290430 ) on Wednesday July 23, 2008 @11:25PM (#24314413)

      Logical - yes.
      Predictive - no.

      So if this isn't predictive, what is? Would you rather they develop an algorithm that identifies blacklist-worthy addresses before they make their first attack?

      The application of this algorithm actually seems pretty clever. It captures the fact that "true" attackers mostly attack "true" (that is, weak or high profile) targets, whereas those targets are mostly attacked by "true" attackers. Thus some isolated attack by a never-before-detected attacker on a never-before-attacked target has very little predictive potential in the eyes of the algorithm, whereas even just a few attacks by a never-before-seen attacker on several oft-attacked targets raises a huge red flag.
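The mutual reinforcement described above ("true" attackers hit "true" targets, and vice versa) can be sketched with a HITS-style iteration. This is an illustration of the general idea only, not the algorithm the researchers published; the function name, the round count, and the sample attack pairs are assumptions.

```python
import math


def mutual_scores(attacks: set[tuple[str, str]], rounds: int = 20) -> dict[str, float]:
    """Score attackers by mutual reinforcement with the targets they hit."""
    attackers = {a for a, _ in attacks}
    targets = {t for _, t in attacks}
    a_score = dict.fromkeys(attackers, 1.0)
    for _ in range(rounds):
        # A target counts for more when high-scoring attackers hit it.
        t_score = dict.fromkeys(targets, 0.0)
        for a, t in attacks:
            t_score[t] += a_score[a]
        # An attacker counts for more when it hits high-scoring targets.
        new_a = dict.fromkeys(attackers, 0.0)
        for a, t in attacks:
            new_a[a] += t_score[t]
        # Normalize so the scores stay bounded across rounds.
        norm = math.sqrt(sum(v * v for v in new_a.values()))
        a_score = {a: v / norm for a, v in new_a.items()}
    return a_score


# Hypothetical (attacker, target) pairs extracted from shared logs.
attacks = {
    ("veteran1", "t1"), ("veteran1", "t2"), ("veteran1", "t3"),
    ("veteran2", "t1"), ("veteran2", "t2"), ("veteran2", "t3"),
    ("new", "t1"), ("new", "t2"),   # never-before-seen source, oft-attacked targets
    ("lone", "quiet"),              # isolated attack on a never-attacked target
}
scores = mutual_scores(attacks)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy data the newcomer that hits two oft-attacked targets ends up scored far above the isolated attacker, which is the behavior the comment describes.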

      • by Zadaz ( 950521 )

        So if this isn't predictive, what is?

        I don't know, but it isn't the service called Highly Predictive Blacklisting that the article is about.

        • by spazdor ( 902907 )

          What I want to know is whether this HPB protocol itself is vulnerable to attack. Can I spoof a few packets at a few common targets and get some sucker blacklisted by half the Internet? Can I run 30,000 virtualized HPB nodes and use them to stack the deck with maliciously generated logs?

      • Re: (Score:1, Redundant)

        by Joebert ( 946227 )

        So if this isn't predictive, what is?

        Failure.

        I like to play it safe and assume nothing works.

      • Not really. (Score:5, Interesting)

        by khasim ( 1285 ) <brandioch.conner@gmail.com> on Wednesday July 23, 2008 @11:40PM (#24314519)

        So if this isn't predictive, what is? Would you rather they develop an algorithm that identifies blacklist-worthy addresses before they make their first attack?

        Ummmm, yes. If you can identify them BEFORE they make their first attack then that would qualify as "predictive".

        It captures the fact that "true" attackers mostly attack "true" (that is, weak or high profile) targets, whereas those targets are mostly attacked by "true" attackers.

        Not in my experience. The attacks are usually automated scripts running on zombies that randomly scan addresses (or search their immediate networks) looking for known vulnerabilities.

        Thus some isolated attack by a never-before-detected attacker on a never-before-attacked target has very little predictive potential in the eyes of the algorithm, whereas even just a few attacks by a never-before-seen attacker on several oft-attacked targets raises a huge red flag.

        That is the opposite of how their system was described. They looked for matches amongst IP addresses and then "predicted" that if your example machine attacked one firewall it should be blacklisted for the other firewalls that closely matched that list.

        Now a real predictive system would look at more factors.

        #1. Who was attacking.

        #2. How did the attacker(s) gain access to the machines used in the attack.

        #3. What other machines are vulnerable to #2 that are available to #1.

        Example - Spam zombies often appear in ranges of home addresses from the large ISPs. So machines in those ranges are given an increased score in SpamAssassin, whether they have ever sent spam before or not. See #1 and #2 and #3.
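A rough sketch of the range-based scoring idea in the example above. This is not how SpamAssassin itself expresses such rules; the `RESIDENTIAL_RANGES` list (documentation prefixes, not real ISP blocks), the function name, and the penalty value are all hypothetical.

```python
import ipaddress

# Hypothetical "home user" ranges. These are reserved documentation prefixes,
# standing in for whatever ranges an operator would actually list.
RESIDENTIAL_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def range_penalty(sender_ip: str, penalty: float = 1.5) -> float:
    """Extra spam score for a sender inside a listed range, regardless of its history."""
    addr = ipaddress.ip_address(sender_ip)
    return penalty if any(addr in net for net in RESIDENTIAL_RANGES) else 0.0


print(range_penalty("198.51.100.7"))  # inside a listed range
print(range_penalty("192.0.2.1"))     # outside every listed range
```

The point of the sketch is that the penalty depends only on where the sender sits, not on anything that sender has done before.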

        • Re:Not really. (Score:5, Interesting)

          by mcrbids ( 148650 ) on Thursday July 24, 2008 @02:17AM (#24315321) Journal

          Ummmm, yes. If you can identify them BEFORE they make their first attack then that would qualify as "predictive".

          Stock analysts make daily predictions based on past behavior. This is not only predictive, but if it wasn't for this past analysis, the predictions would be largely meaningless and highly inaccurate. Or do you want a computer program that can predict what you'll think before you actually think it?

          Not in my experience. The attacks are usually automated scripts running on zombies that randomly scan addresses (or search their immediate networks) looking for known vulnerabilities.

          How many high profile hosts have you overseen? In my experience, the random attacks you mention are found everywhere. But high-profile hosts are their own deal. I've seen very carefully crafted spam attacks directed at one of my client ISPs that would last anywhere from 3-8 hours. (one of the largest regional ISPs in my area) A typical spam attack would entail perhaps 250,000 deliverable messages. It was a constant game of cat and mouse with firewall rules and automated responses.

          I'd implement an anti-spam technology which would work for anywhere from a few days to a few months, while logging the repeated attempts to crack my solution. And then, the measure would be defeated and I'd be back to the drawing board while the mail cluster's load average spiked to 20.0 or so and users complained.

          One of my more successful ideas I called "Double Dribble". I'd identify spam that had been sent to a non-deliverable address, then returned to sender, then bounced with an invalid return address. I'd calculate the success rate of the source IP address and within 5 minutes or so, I'd have a spam source identified and blocked with a dynamic DNS RBL.
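The comment gives no implementation details for "Double Dribble", so the following is only a guess at its core step: track a per-source delivery success rate and block sources that send enough mail with too little success. The event format and both thresholds are assumptions.

```python
from collections import defaultdict


def sources_to_block(events, min_messages: int = 20,
                     max_success_rate: float = 0.5) -> set[str]:
    """Return source IPs whose delivery success rate is suspiciously low.

    `events` is an iterable of (source_ip, delivered) pairs, where `delivered`
    is False for mail that went to a non-deliverable address.
    """
    totals = defaultdict(int)
    delivered = defaultdict(int)
    for ip, ok in events:
        totals[ip] += 1
        delivered[ip] += int(ok)
    return {
        ip for ip, n in totals.items()
        if n >= min_messages and delivered[ip] / n <= max_success_rate
    }


# Hypothetical five-minute window of delivery attempts.
events = (
    [("192.0.2.1", False)] * 30    # high volume, nothing deliverable
    + [("192.0.2.2", True)] * 30   # high volume, all deliverable
    + [("192.0.2.3", False)] * 5   # failing, but below the volume threshold
)
print(sources_to_block(events))
```

The third source shows the weakness mentioned in the next paragraph of the comment: a sender that stays under the volume threshold is never evaluated.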

          That solution held off the spammer for almost a full year, until he/she/it began randomizing sending addresses so well that each IP address would send only maybe 10 emails every 24 hours, well below the threshold of Double Dribble. The address pool was insane - well over 100,000 unique IP addresses logged over a 24 hour period.

          Then greylisting was implemented, which stopped the spam dead in its tracks, and completely nullified the spam that Double Dribble couldn't stop. That's when I turned over the account to another party. I still use greylisting personally with great success.
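For readers unfamiliar with greylisting: the usual approach is to temporarily reject mail from an unknown (client IP, sender, recipient) triplet and accept it only when the same triplet retries after a delay, which legitimate mail servers do and many spambots do not. A minimal in-memory sketch follows; the class name, the 300-second delay, and the return values are assumptions, and a real deployment would persist and expire entries.

```python
class Greylist:
    """Temporarily reject unknown triplets; accept retries after a minimum delay."""

    def __init__(self, min_delay: float = 300.0):
        self.min_delay = min_delay
        self.first_seen: dict[tuple[str, str, str], float] = {}

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        key = (client_ip, sender, recipient)
        # Remember when this triplet was first seen; later calls reuse that time.
        first = self.first_seen.setdefault(key, now)
        return "accept" if now - first >= self.min_delay else "tempfail"


g = Greylist(min_delay=300.0)
triplet = ("192.0.2.1", "alice@example.org", "bob@example.com")
print(g.check(*triplet, now=0.0))    # first attempt
print(g.check(*triplet, now=60.0))   # retried too soon
print(g.check(*triplet, now=400.0))  # retried after the delay
```

Timestamps are passed in explicitly so the logic can be exercised without waiting; in practice `now` would come from the system clock.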

          Now a real predictive system would look at more factors.

          #1. Who was attacking.

          #2. How did the attacker(s) gain access to the machines used in the attack.

          #3. What other machines are vulnerable to #2 that are available to #1.

          No. A Real system would find out:

          1) Who was attacking.

          2) Send out the Russian Mafia after them to bust a few kneecaps.

          3) What other machines are attacking that haven't been attacked by the Russian Mafia.

          4) Send Chuck Norris after any attackers who are part of the Russian Mafia.

          5) Scan for Natalie Portman donkey porn and send a copy to you.

          6) ???

          7) Profit!

          • Re: (Score:2, Interesting)

            by nabsltd ( 1313397 )

            Then greylisting was implemented, which stopped the spam dead in its tracks, and completely nullified the spam that Double Dribble couldn't stop. That's when I turned over the account to another party. I still use greylisting personally with great success.

            For me, between greylisting and requiring strict RFC compliance for the "HELO" parameter, pretty much no spam gets through to even be looked at by SpamAssassin.

            For the "HELO" parameter, almost every spambot uses one of:

            • something that isn't a fully qualified domain name ("laptop", "Notebook", and "PC-200806211153" are some recent examples)
            • an IP address

            Neither of these is acceptable (according to section 2.3.5 of the SMTP RFC [ietf.org]) as the "HELO" parameter.
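The two rejection rules above can be sketched as a simple check. This follows the commenter's stated policy (reject anything that is not a dotted domain name, and reject IP addresses); it is not a full RFC syntax validator, and the function name and label pattern are assumptions.

```python
import ipaddress
import re

# One DNS label: letters, digits, and interior hyphens, up to 63 characters.
_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def helo_looks_valid(helo: str) -> bool:
    """True only if the HELO argument looks like a fully qualified domain name."""
    name = helo.strip().rstrip(".")
    try:
        ipaddress.ip_address(name)
        return False  # an IP address, which this policy rejects
    except ValueError:
        pass
    labels = name.split(".")
    # A single label such as "laptop" is not fully qualified.
    return len(labels) >= 2 and all(_LABEL.match(label) for label in labels)


for helo in ("laptop", "PC-200806211153", "192.0.2.10", "mail.example.com"):
    print(helo, helo_looks_valid(helo))
```

Only the last of the four sample values passes; the first three are the kinds of HELO strings the comment lists as typical of spambots.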

            Then, I throw out a few more bogus things, like:

            • my hos
          • Stock analysts make daily predictions based on past behavior. This is not only predictive, but if it wasn't for this past analysis, the predictions would be largely meaningless and highly inaccurate.

            But I thought those analysts' predictions were largely meaningless and highly inaccurate? It's my understanding that the index funds (which I believe are managed by computers and not people) do much better over the long term than any analyst.

        • by kv9 ( 697238 )

          Ummmm, yes. If you can identify them BEFORE they make their first attack then that would qualify as "predictive".

          if I can identify them BEFORE they make their first attack ON MY SERVER then that would qualify as "predictive". I predicted their attack. drop all the fucking semantic hairsplitting please.

      • Re: (Score:3, Insightful)

        by kamochan ( 883582 )

        So if this isn't predictive, what is? Would you rather they develop an algorithm that identifies blacklist-worthy addresses before they make their first attack?

        I invented just such a thing. I blocked the entire comcast network and a couple of big Chinese ISPs in my DSL firewall. Reduced ssh login attempts and spam significantly.

        Predictive - very.
        Collateral victims - nobody I'd care about.

      • by rtb61 ( 674572 )
        You want effective predictive. How about ISP-supplied white lists of good addresses? With IPv6 coming on stream, attempting to blacklist billions of addresses will be a pain. Of course, when an ISP supplies contaminated white lists, range-block the ISP until they behave.
    • by LostCluster ( 625375 ) * on Wednesday July 23, 2008 @11:34PM (#24314483)

      That worked back in the day when you could say "Syracuse University's gotten hit with the latest worm. So, don't trust any mail that comes from 128.230.x.x." But these days mail comes from one address per organization or household. Most corporations expose only one mail server IP address to the world, and some smaller companies have hundred-user systems and only one IP to show for it. So, who you're next to doesn't hold much water in predicting whether the message is spam.

      • So, who you're next to doesn't hold much water in predicting whether the message is spam.

        Yes, it does. Look at the spam zombies on the major ISP networks.

        Most corporations expose only one mail server IP address to the world, and some smaller companies have hundred-user systems and only one IP to show for it.

        Now do the math about whether there are more home users on the big ISP networks or whether there are more companies running their own email servers.

        If you're getting spam, 99.9%+ of the time it will be f

    • to have someone poison the 'predictive' list, and suddenly everyone behind such a system would lose access to google, the pirate bay, or demonoid? (c'mon, those are like, the only 3 sites other than slashdot I use!)
    • Re: (Score:2, Interesting)

      What the heck does "highly" predictive mean?

      "Honey, the weatherman is on and he is highly predicting some storms in the evening."

      Maybe "highly effective" prediction?
  • Hmm... (Score:4, Insightful)

    by FlyingSquidStudios ( 1031284 ) on Wednesday July 23, 2008 @10:50PM (#24314215)
    This sounds ripe for abuse. For example, a heavy censorship nation like China could use this to block critical sites that they claim are 'attacking' them far more efficiently than their current human-based censoring.
    • Re: (Score:2, Interesting)

      by elnico ( 1290430 )

      Somehow, I doubt identifying "troubling" sites is the limiting factor in Chinese internet censorship. More likely, the things holding back the censors are international pressure/attention, circumvention by their people, and the censors' own sense of decency, if that exists.

    • Re: (Score:2, Insightful)

      by tukang ( 1209392 )

      This sounds ripe for abuse. For example, a heavy censorship nation like China could use this to block critical sites that they claim are 'attacking' them far more efficiently than their current human-based censoring.

      How is it more efficient for China to tell this software that a particular site is 'attacking' them than to block the site at their great firewall and be done with it?

      • by Joebert ( 946227 )
        They could poison the destination lists so they think China is attacking them, thus blocking China. Remember, China's firewall works ass-backwards.
    • Just as abusive as "We've noticed a spammer keeps registering with ServerFarm.net, let's block their entire network space!" but human blacklists do that already today. Sounds like this is just automating the process.
  • by Jane Q. Public ( 1010737 ) on Wednesday July 23, 2008 @11:12PM (#24314329)
    The problem with ANY "predictive" statistics (like racial profiling, for one glaring example) is that even when they become accurate enough to produce useful information, they tend to produce too many false positives.

    And often (again using racial profiling as a good example), even a few false positives are too many.
    • by elnico ( 1290430 )

      And that's the point! This algorithm is clearly designed to avoid false positives. And it's obviously much more complicated than your standard racial profiling algorithms ("If he's black, he's got crack in the back!"). Just look at the page rank algorithm. It can "predict" that a page will be relevant with minimal input from a human, and it's often very accurate.

    • Re: (Score:3, Insightful)

      The problem with ANY "predictive" statistics (like racial profiling, for one glaring example) is that even when they become accurate enough to produce useful information, they tend to produce too many false positives.

      That is an overly general statement. If that were the case, we wouldn't have any reliable spam filters. There are many statistical methodologies (including ensembles of methodologies) applied carefully to different types of domains that produce excellent and usable false positive rates. Indi

      • I should not have written "any", much less emphasized it. Nevertheless, there is a strong tendency, and it behooves designers to take this into account. Probably I have become cynical, because so many otherwise intelligent people do not quite grasp the subtleties of predictive statistics and refuse to acknowledge this problem, even though it can be demonstrated with nothing more than simple middle-school-level math.
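The "simple middle-school-level math" mentioned above is the base-rate effect: when bad actors are rare, even an accurate test flags mostly innocent cases. A small worked example follows; the rates are made up purely for illustration and do not come from the thread or from the HPB service.

```python
def precision(base_rate: float, true_positive_rate: float,
              false_positive_rate: float) -> float:
    """Fraction of flagged items that are actually bad."""
    flagged_bad = base_rate * true_positive_rate
    flagged_good = (1.0 - base_rate) * false_positive_rate
    return flagged_bad / (flagged_bad + flagged_good)


# Hypothetical numbers: 1 in 1,000 sources is an attacker, the predictor
# catches 99% of attackers, and wrongly flags 1% of innocent sources.
p = precision(base_rate=0.001, true_positive_rate=0.99, false_positive_rate=0.01)
print(f"{p:.1%} of flagged sources are real attackers")
```

With these numbers only about 9% of flagged sources are real attackers, so roughly ten innocent sources are flagged for every real one, even though the predictor is "99% accurate" in both directions.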
  • by LostCluster ( 625375 ) * on Wednesday July 23, 2008 @11:22PM (#24314385)

    This isn't going to work in the real world. Too many users you want to hear from at an ISP won't like it when the virus-victim spammers get their whole network preventatively banned.

    Stop fixing the mail protocols we have today. It's time to replace them with some form of sender authentication.

    • by Jurily ( 900488 )

      Stop fixing the mail protocols we have today. It's time to replace them with some form of sender authentication.

      That still doesn't fix the zombie issue.

      • That still doesn't fix the zombie issue.

        And neither does our current system. There is nothing stopping us from using multiple security solutions and heuristics to stop spam.

      • It does somewhat. Zombies spoof the From field. If that's not possible, then we know exactly who to shut down without any risk of a false positive.
    • by Eighty7 ( 1130057 ) on Thursday July 24, 2008 @12:23AM (#24314727)
      Your post advocates a

      (x) technical ( ) legislative ( ) market-based ( ) vigilante

      approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

      ( ) Spammers can easily use it to harvest email addresses
      ( ) Mailing lists and other legitimate email uses would be affected
      ( ) No one will be able to find the guy or collect the money
      ( ) It is defenseless against brute force attacks
      (x) It will stop spam for two weeks and then we'll be stuck with it
      (x) Users of email will not put up with it
      ( ) Microsoft will not put up with it
      ( ) The police will not put up with it
      (x) Requires too much cooperation from spammers
      (x) Requires immediate total cooperation from everybody at once
      ( ) Many email users cannot afford to lose business or alienate potential employers
      ( ) Spammers don't care about invalid addresses in their lists
      ( ) Anyone could anonymously destroy anyone else's career or business

      Specifically, your plan fails to account for

      ( ) Laws expressly prohibiting it
      (x) Lack of centrally controlling authority for email
      ( ) Open relays in foreign countries
      ( ) Ease of searching tiny alphanumeric address space of all email addresses
      ( ) Asshats
      ( ) Jurisdictional problems
      ( ) Unpopularity of weird new taxes
      ( ) Public reluctance to accept weird new forms of money
      (x) Huge existing software investment in SMTP
      (x) Susceptibility of protocols other than SMTP to attack
      (x) Willingness of users to install OS patches received by email
      (x) Armies of worm riddled broadband-connected Windows boxes
      ( ) Eternal arms race involved in all filtering approaches
      ( ) Extreme profitability of spam
      (x) Joe jobs and/or identity theft
      ( ) Technically illiterate politicians
      ( ) Extreme stupidity on the part of people who do business with spammers
      ( ) Dishonesty on the part of spammers themselves
      ( ) Bandwidth costs that are unaffected by client filtering
      ( ) Outlook

      and the following philosophical objections may also apply:

      (x) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
      ( ) Any scheme based on opt-out is unacceptable
      ( ) SMTP headers should not be the subject of legislation
      ( ) Blacklists suck
      ( ) Whitelists suck
      ( ) We should be able to talk about Viagra without being censored
      ( ) Countermeasures should not involve wire fraud or credit card fraud
      ( ) Countermeasures should not involve sabotage of public networks
      (x) Countermeasures must work if phased in gradually
      ( ) Sending email should be free
      ( ) Why should we have to trust you and your servers?
      ( ) Incompatibility with open source or open source licenses
      ( ) Feel-good measures do nothing to solve the problem
      ( ) Temporary/one-time email addresses are cumbersome
      ( ) I don't want the government reading my email
      ( ) Killing them that way is not slow and painful enough

      Furthermore, this is what I think about you:

      (x) Sorry dude, but I don't think it would work.
      ( ) This is a stupid idea, and you're a stupid person for suggesting it.
      ( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
      • by Heembo ( 916647 )

        You are the wind beneath my wings. That was my most favorite Slashdot post, ever.

      • I think you missed a couple, since this is specifically about a client side blacklist.

        Your post advocates a

        (x) technical ( ) legislative ( ) market-based ( ) vigilante

        approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

        ( ) Spammers can easily use it to harvest email addresses
        (x) Mailing lists and other legitimate

    • Rough sketch of what I have been working on:

      (I hope this is going to be formatted correctly. It looks ok in w3m...)

      Confidentiality

      Only the intended recipient is able to read the message. No government
      spying.

      Use public key cryptography. Encrypt the message with the public key of
      the recipient. Only the holder of the corresponding private key can
      decrypt the message.

      Integrity

      The message you send is the message they receive. No monkeying in the
      middle.

      Use message authentication codes. Encrypt a digest of the mess

    • by initialE ( 758110 ) on Thursday July 24, 2008 @05:34AM (#24316111)
      Half of us here are for sender authentication, or at least verification. And half of us are for privacy and anonymity. These, to me, are conflicting goals. The sad thing is that there is overlap, that people want their privacy, not realizing that spam is exactly what that privacy brings. It surprises me that people can laugh at the implementations of DRM (But Bob and Eve are the same person! Hilarity ensues...) and not know that this is a very similar issue right here, (Bob wants his rights protected, but he doesn't want any riff raff Eve out there to contact him. But Bob and Eve are the same person! Not so funny now?) and it, like DRM, could very well be unsolvable.
      • Re: (Score:3, Insightful)

        I don't think it's privacy that needs to be sacrificed, but ease of access. All the popular instant messaging systems, forum and blog software etc. are subject to spam. If it was harder to obtain an address on these services, it would be much harder for spammers to abuse them.

        On the other hand, ease of access is one of their primary benefits. An additional hurdle for SMTP is the lack of centralised controls, which is an important thing for any de facto standard communication tool to have.

        • If it was harder to obtain an address on these services, it would be much harder for spammers to abuse them.

          That's right, if we found a more complex form of DRM, surely the pirates won't be able to crack it!
          The truth is, every time you raise the bar of entry, someone who is determined to cross that bar will be able to do so. A more complex captcha? A more secure forum? All we are doing is raising the ante in a game of one-upmanship.

          • The difference between spam and DRM is that spam is received from people who you don't want any kind of contact with, and don't even want or need to have them on the network. DRM tries to prevent people from accessing an unencrypted bitstream in order to copy it or convert it to another format, while also requiring that they're able to access the unencrypted bitstream in order to view / run it.

            DRM is therefore an unsolvable problem, and the best you can do is raise the bar enough that it becomes too difficu

  • Enumerating Badness (Score:5, Interesting)

    by giminy ( 94188 ) on Thursday July 24, 2008 @12:03AM (#24314645) Homepage Journal

    Every time I read some new whiz-bang security tool, I look back to Marcus Ranum's terrific The Six Dumbest Ideas in Computer Security [ranum.com] article.

    This idea meets three of the 'dumb' criteria:

    1) Default Permit. Use of firewalls (even 'intelligent' firewalls) allows all traffic through, except that traffic that looks somehow bad.
    2) Enumerating Badness. Kind of like #1, you're blacklisting the bad stuff. There's a helpful chart in the article to show why this is dumb.
    6) Action is Better than Inaction. 'Nuff said.

    Reid

    • by RAMMS+EIN ( 578166 ) on Thursday July 24, 2008 @01:50AM (#24315221) Homepage Journal

      Well, two points here.

      First of all, security and spam are not the same. If one security threat makes it through to you, your security has been compromised. If one spam message makes it through to you, it's a little annoying, but no disaster. If, on the other hand, your "spam filtering" causes a legitimate message not to reach you, this is much worse. For spam, you err on the safe side by letting the message through. In security, you err on the safe side by blocking it.

      Secondly, while mjr's 6 "dumb ideas" aren't going to give you perfect security, it's not obvious how you _would_ get that, nor that you should not implement any of those ideas. For example: enumerating badness is certainly not going to allow you to recognize and stop all badness. However, it isn't clear how you _would_ do that. How do you determine if something should or shouldn't be allowed to enter your system? Perhaps having a list of things you _don't_ want on your system could be helpful.

      Enumerating badness certainly seems to work pretty well for email. With software, you can (really!) get away with making a list of what _is_ allowed on your system, and refuse everything else. With email, you actually _want_ messages you have never seen before from people you have never seen before, about things you have never talked about before. At least, most people do. On the other hand, spammers will often send lots of somehow similar messages. My spam filter, which I train based on lists of good and bad messages, correctly recognizes all good messages and something like 99% (it varies a bit) of bad messages. It doesn't keep the spam out, but it reduces it by a factor of 100, without losing me any good messages. Is this a Dumb Idea?

      • "Well, two points here .. First of all, security and spam are not the same"

        Identifying spam is actually 'enumerating badness', which does lead to losing legitimate messages.
    • by mjensen ( 118105 )

      From that link:
      There's an old saying, "You cannot make a silk purse out of a sow's ear." It's pretty much true, unless you wind up using so much silk to patch the sow's ear that eventually the sow's ear is completely replaced with silk.

      Read that.....He's talking about making a sow's ear from a silk purse and has his idea backwards.
      There's a lot of (sometimes technical) statements on that site that show the creator made it without thinking very much about it.

  • by skaet ( 841938 ) on Thursday July 24, 2008 @12:33AM (#24314793) Homepage

    ... then they could warn the poor bastard she's going to attack next.

  • "the new HPB service will employ a link analysis algorithm to cross-compare firewall logs"

    Snoooze ..
  • by Colin Smith ( 2679 ) on Thursday July 24, 2008 @04:13AM (#24315763)

    That's what we really need... (baggsy on the acronym BTW)

    A network of mathematical values which define reputation relative to one another. We have a number of attempts at this in place just now, not the least of which are Slashdot Karma, Google Pagerank, Stumbleupon etc. The thing is that what may be a good reputation to one person may well be the antithesis to another, so simple averaging is inappropriate. Richard Dawkins for example is someone who will have a very high reputation among certain groups and very low among others.

    I should be able to see a relative reputation of someone/thing based on those other things which I hold in esteem and the things/people which they hold in esteem.

    Decidedly non-trivial. We haven't actually worked it out in The Real World (tm) either, relying on branding instead.
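One possible reading of "relative reputation" is to propagate esteem outward from the viewer instead of averaging globally. The sketch below is a toy under that assumption; the `esteem` data, the depth limit, and the decay factor are all invented for illustration and do not describe any existing system.

```python
def relative_reputation(esteem: dict[str, dict[str, float]], me: str,
                        depth: int = 2, decay: float = 0.5) -> dict[str, float]:
    """Reputation of others as seen from `me`, following chains of esteem."""
    scores: dict[str, float] = {}
    frontier = {me: 1.0}
    for _ in range(depth):
        reached: dict[str, float] = {}
        for person, weight in frontier.items():
            for other, rating in esteem.get(person, {}).items():
                # Each hop away from the viewer counts for less.
                reached[other] = reached.get(other, 0.0) + weight * rating * decay
        for other, value in reached.items():
            scores[other] = scores.get(other, 0.0) + value
        frontier = reached
    scores.pop(me, None)
    return scores


# Hypothetical ratings in [-1, 1]: who holds whom in esteem.
esteem = {
    "alice": {"bob": 1.0},
    "carol": {"dave": 1.0},
    "bob": {"author_x": 1.0},
    "dave": {"author_x": -1.0},
}
print(relative_reputation(esteem, "alice")["author_x"])
print(relative_reputation(esteem, "carol")["author_x"])
```

The same person comes out positive for one viewer and negative for another, which is the property the comment says simple averaging cannot provide.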

     

  • Wait until somebody spoofs somebody else's IP address and throws "attacks" with it at a few of the networks that submit logs. That would effectively block the IP from the spoofed address as the system would predict that the host is an attacker. Since TCP allows us to spoof almost any IP we want, we could get creative and spoof the addresses of the submitting members or even dshield itself.
