Researchers Create Highly Predictive Blacklists
Grablets writes "Using a link analysis algorithm similar to Google PageRank, researchers at the SANS Institute and SRI International have created a new Internet network defense service that rethinks the way network blacklists are formulated and distributed. The service, called Highly Predictive Blacklisting, exploits the relationships between networks that have been attacked by similar Internet sources as a means for predicting which attack sources are likely to attack which networks in the future. A free experimental version is currently available."
Re: (Score:2)
Suck my double-precision floats, AC!
Not really that "predictive". (Score:5, Informative)
They take X firewall logs ...
Then they look for matches in attacking IP addresses between the logs ...
And if any IP addresses appear in log A (which is very similar to log B) ... then those IP addresses are "predicted" as being possible to attack the firewall from which log B was obtained.
Logical - yes.
Predictive - no.
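The log-matching idea described above can be sketched in a few lines. This is only an illustration of "find similar logs, then borrow their attackers", not the actual HPB code; the function names, the Jaccard similarity measure, and the 0.3 cutoff are all assumptions made for the example.

```python
from collections import defaultdict

def jaccard(a, b):
    """Overlap between two sets of attacker IPs (0.0 = disjoint, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_attackers(logs, target, min_similarity=0.3):
    """Rank IPs the target has not seen yet, weighted by how similar the reporting log is.

    logs maps a firewall name to the set of source IPs found in its log.
    """
    seen = logs[target]
    scores = defaultdict(float)
    for name, attackers in logs.items():
        if name == target:
            continue
        sim = jaccard(seen, attackers)
        if sim < min_similarity:
            continue  # dissimilar logs say little about this target
        for ip in attackers - seen:
            scores[ip] += sim
    # Highest score first; ties broken by IP string for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Made-up logs using documentation address ranges.
logs = {
    "A": {"198.51.100.7", "203.0.113.9", "192.0.2.44"},
    "B": {"198.51.100.7", "203.0.113.9"},
    "C": {"192.0.2.200"},
}
print(predict_attackers(logs, "B"))
```

Here log A shares two of its three sources with log B, so the one address B has not seen yet becomes a candidate for B's blacklist, while the unrelated log C contributes nothing.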
Re:Not really that "predictive". (Score:4, Insightful)
I agree, but the key here is to ensure that there are no false positives, which have been traditionally the biggest problem with blacklists.
If they figure that out, I don't care what kind of statistical approach they are using, as long as it works.
I think someone from MIT (maybe three or four years ago during the height of the problems with Spamhaus?) tried this before, but I don't remember if it got anywhere. Maybe this is an offshoot from that.
In the meantime... SpamAssassin with whitelists, which is the best of worse worlds.
Re: (Score:3, Insightful)
It's pretty easy to get false positives depending on how you configure SpamAssassin.
Re: (Score:2)
First thing is to upgrade the version of SA you're using, then configure it better (install good rule-sets), train bayes, in that order.
I have accounts on servers that have different policies/versions, and have experienced no (important!) false positives on one and had to whitelist on the other.
Convincing the people sending SPAM-ish looking mail to do otherwise could also help, rather than just accepting it:
http://wiki.apache.org/spamassassin/AvoidingFpsForSenders [apache.org]
Re:Not really that "predictive". (Score:5, Informative)
Logical - yes.
Predictive - no.
So if this isn't predictive, what is? Would you rather they develop an algorithm that identifies blacklist-worthy addresses before they make their first attack?
The application of this algorithm actually seems pretty clever. It captures the fact that "true" attackers mostly attack "true" (that is, weak or high profile) targets, whereas those targets are mostly attacked by "true" attackers. Thus some isolated attack by a never-before-detected attacker on a never-before-attacked target has very little predictive potential in the eyes of the algorithm, whereas even just a few attacks by a never-before-seen attacker on several oft-attacked targets raises a huge red flag.
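To make the "true attackers hit true targets" point concrete, here is a toy mutual-reinforcement ranking in the spirit of link analysis. It is a sketch under my own assumptions (the function name, the max-normalization, and the 20 rounds are made up), not the algorithm the researchers published.

```python
def rank_attackers(reports, rounds=20):
    """Score attackers from (attacker, target) pairs taken from firewall logs.

    A target counts for more when high-scoring attackers hit it, and an
    attacker counts for more when it hits high-scoring targets.
    """
    pairs = set(reports)  # ignore duplicate reports
    if not pairs:
        return {}
    a_score = {a: 1.0 for a, _ in pairs}
    for _ in range(rounds):
        # Target score: sum of the scores of the attackers that hit it.
        t_score = {}
        for a, t in pairs:
            t_score[t] = t_score.get(t, 0.0) + a_score[a]
        # Attacker score: sum of the scores of the targets it hit.
        new_a = {}
        for a, t in pairs:
            new_a[a] = new_a.get(a, 0.0) + t_score[t]
        # Rescale so the top attacker has score 1.0.
        top = max(new_a.values())
        a_score = {a: s / top for a, s in new_a.items()}
    return a_score

reports = [
    ("x1", "T1"), ("x1", "T2"),
    ("x2", "T1"), ("x2", "T2"),
    ("x3", "T1"),
    ("new", "T1"), ("new", "T2"),   # never seen before, but hits oft-attacked targets
    ("loner", "quiet"),             # isolated attacker on a never-attacked target
]
scores = rank_attackers(reports)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy data the never-before-seen source "new" ends up tied with the established attackers because it hits the same busy targets, while "loner" sinks toward zero.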
Re: (Score:2)
I don't know, but it isn't the service called Highly Predictive Blacklisting that the article is about.
Re: (Score:2)
What I want to know is whether this HPB protocol itself is vulnerable to attack. Can I spoof a few packets at a few common targets and get some sucker blacklisted by half the Internet? Can I run 30,000 virtualized HPB nodes and use them to stack the deck with maliciously generated logs?
Re: (Score:1, Redundant)
Failure.
I like to play it safe and assume nothing works.
Not really. (Score:5, Interesting)
Ummmm, yes. If you can identify them BEFORE they make their first attack then that would qualify as "predictive".
Not in my experience. The attacks are usually automated scripts running on zombies that randomly scan addresses (or search their immediate networks) looking for known vulnerabilities.
That is the opposite of how their system was described. They looked for matches amongst IP addresses and then "predicted" that if your example machine attacked one firewall, it should be blacklisted for the other firewalls that closely matched that list.
Now a real predictive system would look at more factors.
#1. Who was attacking.
#2. How did the attacker(s) gain access to the machines used in the attack.
#3. What other machines are vulnerable to #2 that are available to #1.
Example - Spam zombies often appear in ranges of home addresses from the large ISPs. So machines in those ranges are given an increased score in SpamAssassin, whether they have ever sent spam before or not. See #1 and #2 and #3.
Re:Not really. (Score:5, Interesting)
Ummmm, yes. If you can identify them BEFORE they make their first attack then that would qualify as "predictive".
Stock analysts make daily predictions based on past behavior. This is not only predictive, but if it wasn't for this past analysis, the predictions would be largely meaningless and highly inaccurate. Or do you want a computer program that can predict what you'll think before you actually think it?
Not in my experience. The attacks are usually automated scripts running on zombies that randomly scan addresses (or search their immediate networks) looking for known vulnerabilities.
How many high profile hosts have you overseen? In my experience, the random attacks you mention are found everywhere. But high-profile hosts are their own deal. I've seen very carefully crafted spam attacks directed at one of my client ISPs that would last anywhere from 3-8 hours. (one of the largest regional ISPs in my area) A typical spam attack would entail perhaps 250,000 deliverable messages. It was a constant game of cat and mouse with firewall rules and automated responses.
I'd implement an anti-spam technology which would work for anywhere from a few days to a few months, while logging the repeated attempts to crack my solution. And then, the measure would be defeated and I'd be back to the drawing board while the mail cluster's load average spiked to 20.0 or so and users complained.
One of my more successful ideas I called "Double Dribble". I'd identify spam that had been sent to a non-deliverable address, then returned to sender, then bounced with an invalid return address. I'd calculate the success rate of the source IP address and within 5 minutes or so, I'd have a spam source identified and blocked with a dynamic DNS RBL.
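For anyone curious, the bookkeeping behind an idea like this can be sketched as below. It is a simplified reconstruction, not the original tool: the class name, the thresholds, and the single "delivered or not" flag (standing in for the double-bounce detection) are assumptions for illustration.

```python
from collections import defaultdict

class DeliveryTracker:
    """Track per-source delivery success and flag sources that mostly bounce."""

    def __init__(self, min_attempts=20, min_success_rate=0.5):
        self.min_attempts = min_attempts          # do not judge on too little data
        self.min_success_rate = min_success_rate  # below this, treat as a spam source
        self.stats = defaultdict(lambda: [0, 0])  # ip -> [delivered, undeliverable]

    def record(self, ip, delivered):
        """Record one message; delivered=False stands for a double bounce."""
        self.stats[ip][0 if delivered else 1] += 1

    def should_block(self, ip):
        ok, bad = self.stats.get(ip, (0, 0))
        total = ok + bad
        if total < self.min_attempts:
            return False
        return ok / total < self.min_success_rate

tracker = DeliveryTracker(min_attempts=10)
for _ in range(2):
    tracker.record("203.0.113.5", delivered=True)
for _ in range(8):
    tracker.record("203.0.113.5", delivered=False)
print(tracker.should_block("203.0.113.5"))
```

Blocked addresses would then be published through whatever mechanism you use, such as a DNS blocklist zone. Note that a source sending fewer than min_attempts messages is never evaluated, which is the obvious weak spot of any threshold scheme.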
That solution held off the spammer for almost a full year, until he/she/it began randomizing sending addresses so well that each IP address would send only maybe 10 emails every 24 hours, well below the threshold of Double Dribble. The address pool was insane - well over 100,000 unique IP addresses logged over a 24 hour period.
Then greylisting was implemented, which stopped the spam dead in its tracks, and completely nullified the spam that Double Dribble couldn't stop. That's when I turned over the account to another party. I still use greylisting personally with great success.
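In case greylisting is unfamiliar: the server temporarily rejects the first delivery attempt from an unknown (client IP, sender, recipient) triplet and only accepts a retry after some delay, on the theory that real mail servers retry and most spam senders do not. A bare-bones sketch, with a made-up class name and a 300-second delay chosen purely as an example:

```python
import time

class Greylist:
    """Defer mail from unseen (client IP, sender, recipient) triplets."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay  # seconds a new triplet must wait before acceptance
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}

    def check(self, client_ip, sender, recipient):
        """Return "defer" (temporary rejection) or "accept"."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            return "defer"
        return "accept"

greylist = Greylist()
# First contact from an unknown triplet is always deferred.
print(greylist.check("198.51.100.20", "someone@example.org", "me@example.net"))
```

A real deployment would also expire old entries and whitelist known-good senders; both are left out here to keep the sketch short.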
Now a real predictive system would look at more factors.
#1. Who was attacking.
#2. How did the attacker(s) gain access to the machines used in the attack.
#3. What other machines are vulnerable to #2 that are available to #1.
No. A Real system would find out:
1) Who was attacking.
2) Send out the Russian Mafia after them to bust a few kneecaps.
3) What other machines are attacking that haven't been attacked by the Russian Mafia.
4) Send Chuck Norris after any attackers who are part of the Russian Mafia.
5) Scan for Natalie Portman donkey porn and send a copy to you.
6) ???
7) Profit!
Re: (Score:2, Interesting)
Then greylisting was implemented, which stopped the spam dead in its tracks, and completely nullified the spam that Double Dribble couldn't stop. That's when I turned over the account to another party. I still use greylisting personally with great success.
For me, between greylisting and requiring strict RFC compliance for the "HELO" parameter, pretty much no spam gets through to even be looked at by SpamAssassin.
For the "HELO" parameter, almost every spambot uses one of:
Neither of these are acceptable (according to section 2.3.5 of the SMTP RFC [ietf.org]) as the "HELO" parameter.
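A rough sketch of a strict "HELO" check along these lines, assuming the rule being enforced is that the argument must be a fully qualified domain name or a bracketed address literal. The function name and the exact label regex are my own choices for illustration, not anything taken from a particular mail server.

```python
import ipaddress
import re

# One DNS label: letters, digits, and inner hyphens, at most 63 characters.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def helo_looks_valid(arg):
    """Return True if a HELO argument is an FQDN or a bracketed address literal."""
    arg = arg.strip()
    if arg.startswith("[") and arg.endswith("]"):
        literal = arg[1:-1]
        if literal.lower().startswith("ipv6:"):
            literal = literal[5:]
        try:
            ipaddress.ip_address(literal)
            return True
        except ValueError:
            return False
    # A bare IP address without brackets is not a domain name.
    try:
        ipaddress.ip_address(arg)
        return False
    except ValueError:
        pass
    labels = arg.rstrip(".").split(".")
    # Require at least two labels so single words like "localhost" fail.
    return len(labels) >= 2 and all(LABEL.fullmatch(label) for label in labels)

for candidate in ["mail.example.com", "[192.0.2.1]", "192.0.2.1", "localhost"]:
    print(candidate, helo_looks_valid(candidate))
```

This only checks syntax. Whether the name actually resolves, or matches the connecting IP, is a separate and stricter test.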
Then, I throw out a few more bogus things, like:
Re: (Score:2)
But I thought those analysts' predictions were largely meaningless and highly inaccurate? It's my understanding that the index funds (which I believe are managed by computers and not people) do much better over the long term than any analyst.
Re: (Score:2)
Ummmm, yes. If you can identify them BEFORE they make their first attack then that would qualify as "predictive".
if I can identify them BEFORE they make their first attack ON MY SERVER then that would qualify as "predictive". I predicted their attack. drop all the fucking semantic hairsplitting please.
Re: (Score:3, Insightful)
So if this isn't predictive, what is? Would you rather they develop an algorithm that identifies blacklist-worthy addresses before they make their first attack?
I invented just such a thing. I blocked the entire Comcast network and a couple of big Chinese ISPs in my DSL firewall. Reduced ssh login attempts and spam significantly.
Predictive - very.
Collateral victims - nobody I'd care about.
Re:Not really that "predictive". (Score:5, Insightful)
That worked back in the day when you could say "Syracuse University's gotten hit with the latest worm. So, don't trust any mail that comes from 128.230.x.x." but these days mail comes from one address per organization or household. Most corporations expose only one mail server IP address to the world, and some smaller companies have hundred-user systems and only one IP to show for it. So, who you're next to doesn't hold much water in predicting whether the message is spam.
Yes, it does. (Score:2)
Yes, it does. Look at the spam zombies on the major ISP networks.
Now do the math about whether there are more home users on the big ISP networks or whether there are more companies running their own email servers.
If you're getting spam, 99.9%+ of the time it will be f
Wouldn't it be funny... (Score:2)
Re: (Score:2, Interesting)
"Honey, the weatherman is on and he is highly predicting some storms in the evening."
Maybe "highly effective" prediction?
Hmm... (Score:4, Insightful)
Re: (Score:2, Interesting)
Somehow, I doubt identifying "troubling" sites is the limiting factor in Chinese internet censorship. More likely, the things holding back the censors are international pressure/attention, circumvention by their people, and the censors' own sense of decency, if that exists.
Re: (Score:2, Insightful)
This sounds ripe for abuse. For example, a heavy censorship nation like China could use this to block critical sites that they claim are 'attacking' them far more efficiently than their current human-based censoring.
How is it more efficient for China to tell this software that a particular site is 'attacking' them than to block the site at their great firewall and be done with it?
Probably a bad idea. (Score:5, Insightful)
And often (again using racial profiling as a good example), even a few false positives are too many.
Re: (Score:2)
insurance companies to pay equal retirement benefits to women as they do to men
I don't get what this has to do with the grandparent post. There is no such thing as false positives in insurance payouts.
Re: (Score:3)
>There is no such thing as false positives in insurance payouts.
Fake your death - bag the life-insurance ?
And there could be false negatives too:
"We regret to inform you that we will not be paying out your husband's life insurance as people of 25 have a high likelihood of still being alive"
Joking aside: what about a man whose insurance payout is not made because the insurance company (incorrectly) believes he committed suicide ? Is that not a false negative ?
Re: (Score:2)
Is that not a false negative ?
True. The suicide example is a good one. I agree that my statement was too broad. Still, the original poster's statement that I was replying to was giving an even worse argument. Only a slight excuse though.
Thanks for coming up with a good example of why I was wrong. :)
Haha (Score:2)
Re: (Score:1)
And that's the point! This algorithm is clearly designed to avoid false positives. And it's obviously much more complicated than your standard racial profiling algorithms ("If he's black, he's got crack in the back!"). Just look at the PageRank algorithm. It can "predict" that a page will be relevant with minimal input from a human, and it's often very accurate.
Re: (Score:3, Insightful)
That is an overly general statement. If that were the case, we wouldn't have any reliable spam filters. There are many statistical methodologies (including ensembles of methodologies) applied carefully to different types of domains that produce excellent and usable false positive rates. Indi
I stand corrected (Score:2)
Babies out with the bath water. (Score:5, Insightful)
This isn't going to work in the real world. Too many users you want to hear from at an ISP won't like it when the virus-victim spammers get their whole network preventatively banned.
Stop fixing the mail protocols we have today. It's time to replace them with some form of sender authentication.
Re: (Score:2)
Stop fixing the mail protocols we have today. It's time to replace them with some form of sender authentication.
That still doesn't fix the zombie issue.
Re: (Score:2)
That still doesn't fix the zombie issue.
And neither does our current system. There is nothing stopping us from using multiple security solutions and heuristics to stop spam.
Standard form (Score:5, Funny)
Your post advocates a
(x) technical ( ) legislative ( ) market-based ( ) vigilante
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
(x) It will stop spam for two weeks and then we'll be stuck with it
(x) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
(x) Requires too much cooperation from spammers
(x) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
(x) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
(x) Huge existing software investment in SMTP
(x) Susceptibility of protocols other than SMTP to attack
(x) Willingness of users to install OS patches received by email
(x) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
(x) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
(x) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
(x) Countermeasures must work if phased in gradually
( ) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(x) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
Re: (Score:3)
You are the wind beneath my wings. That was my most favorite Slashdot post, ever.
Re: (Score:1)
I think you missed a couple, since this is specifically about a client side blacklist.
Your post advocates a
(x) technical ( ) legislative ( ) market-based ( ) vigilante
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
(x) Mailing lists and other legitimate
Re: (Score:3)
Rough sketch of what I have been working on:
(I hope this is going to be formatted correctly. It looks ok in w3m...)
Confidentiality
Only the intended recipient is able to read the message. No government
spying.
Use public key cryptography. Encrypt the message with the public key of
the recipient. Only the holder of the corresponding private key can
decrypt the message.
Integrity
The message you send is the message they receive. No monkeying in the
middle.
Use message authentication codes. Encrypt a digest of the mess
Re:Babies out with the bath water. (Score:4, Interesting)
Re: (Score:3, Insightful)
I don't think it's privacy that needs to be sacrificed, but ease of access. All the popular instant messaging systems, forum and blog software etc. are subject to spam. If it was harder to obtain an address on these services, it would be much harder for spammers to abuse them.
On the other hand, ease of access is one of their primary benefits. An additional hurdle for SMTP is the lack of centralised controls, which is an important thing for any de facto standard communication tool to have.
Re: (Score:2)
If it was harder to obtain an address on these services, it would be much harder for spammers to abuse them.
That's right, if we found a more complex form of DRM, surely the pirates won't be able to crack it!
The truth is, every time you raise the bar of entry, someone who is determined to cross that bar will be able to do so. A more complex captcha? A more secure forum? All we are doing is raising the ante in a game of one-upmanship.
Re: (Score:3)
The difference between spam and DRM is that spam is received from people who you don't want any kind of contact with, and don't even want or need to have them on the network. DRM tries to prevent people from accessing an unencrypted bitstream in order to copy it or convert it to another format, while also requiring that they're able to access the unencrypted bitstream in order to view / run it.
DRM is therefore an unsolvable problem, and the best you can do is raise the bar enough that it becomes too difficu
Enumerating Badness (Score:5, Interesting)
Every time I read about some new whiz-bang security tool, I look back to Marcus Ranum's terrific The Six Dumbest Ideas in Computer Security [ranum.com] article.
This idea meets three of the 'dumb' criteria:
1) Default Permit. Use of firewalls (even 'intelligent' firewalls) allows all traffic through, except that traffic that looks somehow bad.
2) Enumerating Badness. Kind of like #1, you're blacklisting the bad stuff. There's a helpful chart in the article to show why this is dumb.
6) Action is Better than Inaction. 'Nuff said.
Reid
Re:Enumerating Badness (Score:5, Interesting)
Well, two points here.
First of all, security and spam are not the same. If one security threat makes it through to you, your security has been compromised. If one spam message makes it through to you, it's a little annoying, but no disaster. If, on the other hand, your "spam filtering" causes a legitimate message not to reach you, this is much worse. For spam, you err on the safe side by letting the message through. In security, you err on the safe side by blocking it.
Secondly, while mjr's 6 "dumb ideas" aren't going to give you perfect security, it's not obvious how you _would_ get that, nor that you should not implement any of those ideas. For example: enumerating badness is certainly not going to allow you to recognize and stop all badness. However, it isn't clear how you _would_ do that. How do you determine if something should or shouldn't be allowed to enter your system? Perhaps having a list of things you _don't_ want on your system could be helpful.
Enumerating badness certainly seems to work pretty well for email. With software, you can (really!) get away with making a list of what _is_ allowed on your system, and refuse everything else. With email, you actually _want_ messages you have never seen before from people you have never seen before, about things you have never talked about before. At least, most people do. On the other hand, spammers will often send lots of somehow similar messages. My spam filter, which I train based on lists of good and bad messages, correctly recognizes all good messages and something like 99% (it varies a bit) of bad messages. It doesn't keep the spam out, but it reduces it by a factor of 100, without losing me any good messages. Is this a Dumb Idea?
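To show what "training on lists of good and bad messages" can look like, here is a tiny naive Bayes classifier. It is a generic textbook sketch, not the filter I actually run: the class name, whitespace tokenization, and add-one smoothing are all simplifying assumptions.

```python
import math
from collections import Counter

class TinyBayes:
    """Word-count naive Bayes over two labels, "spam" and "ham"."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.docs[label] += 1
        self.words[label].update(text.lower().split())

    def spam_log_odds(self, text):
        """Positive means "more like the spam list", negative "more like the good list"."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        totals = {label: sum(c.values()) for label, c in self.words.items()}
        # Prior from message counts, smoothed so an untrained filter returns 0.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out a probability.
            p_spam = (self.words["spam"][word] + 1) / (totals["spam"] + vocab + 1)
            p_ham = (self.words["ham"][word] + 1) / (totals["ham"] + vocab + 1)
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, text, threshold=0.0):
        return self.spam_log_odds(text) > threshold

f = TinyBayes()
f.train("spam", "cheap pills buy now")
f.train("spam", "buy cheap watches now")
f.train("ham", "meeting notes attached")
f.train("ham", "lunch at noon tomorrow")
print(f.is_spam("buy cheap pills"), f.is_spam("notes from the meeting"))
```

Raising the threshold trades missed spam for fewer lost good messages, which is the direction you want to err in for mail.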
re: enumerating logic .. (Score:2)
Identifying spam is actually 'enumerating badness', which does lead to losing legitimate messages.
Re: (Score:1)
From that link:
There's an old saying, "You cannot make a silk purse out of a sow's ear." It's pretty much true, unless you wind up using so much silk to patch the sow's ear that eventually the sow's ear is completely replaced with silk.
Read that.....He's talking about making a sow's ear from a silk purse and has his idea backwards.
There are a lot of (sometimes technical) statements on that site that show the creator made it without thinking very much about it.
Wish they'd add my ex-gf to the list... (Score:4, Funny)
... then they could warn the poor bastard she's going to attack next.
comparing firewall logs .. (Score:2)
Snoooze
Distributed Universal Reputation System (Score:4)
That's what we really need... (baggsy on the acronym BTW)
A network of mathematical values which define reputation relative to one another. We have a number of attempts at this in place just now, not the least of which are Slashdot Karma, Google PageRank, StumbleUpon etc. The thing is that what may be a good reputation to one person may well be the antithesis to another, so simple averaging is inappropriate. Richard Dawkins for example is someone who will have a very high reputation among certain groups and very low among others.
I should be able to see a relative reputation of someone/thing based on those other things which I hold in esteem and the things/people which they hold in esteem.
Decidedly non trivial. We haven't actually worked it out in The Real World (tm) either, relying on branding instead.
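As a toy illustration of reputation seen "through" the people you hold in esteem, here is one possible sketch. Everything in it is an assumption made for the example: the function name, the -1 to 1 esteem scale, the depth limit, and the decay factor applied per hop.

```python
def relative_reputation(esteem, viewer, subject, depth=2, decay=0.5):
    """Estimate how viewer would regard subject.

    esteem[a][b] is a number in [-1, 1] saying how highly a regards b.
    A direct opinion wins; otherwise the opinions of positively regarded
    parties are averaged, weighted by regard and damped by decay per hop.
    """
    direct = esteem.get(viewer, {})
    if subject in direct:
        return direct[subject]
    if depth == 0:
        return 0.0  # too far away to say anything
    total = 0.0
    weight = 0.0
    for other, regard in direct.items():
        if regard <= 0:
            continue  # only listen to parties the viewer holds in esteem
        total += regard * relative_reputation(esteem, other, subject, depth - 1, decay)
        weight += regard
    return decay * total / weight if weight else 0.0

esteem = {
    "alice": {"carol": 0.9},
    "bob": {"dave": 0.8},
    "carol": {"dawkins": 1.0},
    "dave": {"dawkins": -1.0},
}
print(relative_reputation(esteem, "alice", "dawkins"))
print(relative_reputation(esteem, "bob", "dawkins"))
```

The same subject comes out positive for one viewer and negative for the other, which is exactly what simple averaging would hide. It says nothing about resisting fake identities, which is where the non-trivial part starts.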
Spoof City, here we come! (Score:1)