Poison Attacks Against Machine Learning

Posted by timothy
from the think-zippy-the-pinhead dept.
mikejuk writes "Support Vector Machines (SVMs) are fairly simple but powerful machine learning systems. They learn from data and are usually trained before being deployed. SVMs are used in security to detect abnormal behavior such as fraud and credit card use anomalies, and even to weed out spam. In many cases they need to continue to learn as they do the job, and this raises the possibility of feeding them data that causes them to make bad decisions. Three researchers have recently demonstrated how to do this with a minimum of poisoned data for maximum effect. They found that their method had a surprisingly large impact on the performance of the SVMs tested. They also point out that it could be possible to direct the induced errors so as to produce particular types of error. For example, a spammer could send some poisoned data so as to evade detection for a while. AI-based systems may be no more secure than dumb ones."
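To make the idea of poisoning concrete, here is a minimal sketch. It is not the attack from the paper: a toy nearest-centroid classifier on a single made-up feature stands in for an SVM so the example needs no libraries, and the poison points are chosen by hand rather than optimized. The names `train`, `predict`, and the sample values are all illustrative assumptions.

```python
# Toy illustration of training-data poisoning (assumption: a nearest-centroid
# classifier on one numeric feature stands in for an SVM).

def train(samples):
    """Return the mean feature value (centroid) for each label."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Hypothetical "spamminess" scores with honest labels.
clean = [(1.0, "ham"), (2.0, "ham"), (3.0, "ham"),
         (7.0, "spam"), (8.0, "spam"), (9.0, "spam")]

# The attacker slips in a few high-scoring messages labeled as ham.
poison = [(12.0, "ham")] * 3

clean_model = train(clean)
poisoned_model = train(clean + poison)

print(predict(clean_model, 7.0))     # spam
print(predict(poisoned_model, 7.0))  # ham: the same message now gets through
```

Three mislabeled points are enough here to drag the "ham" centroid from 2.0 to 7.0. A real SVM reacts differently in detail, but the dependence on trusted training labels is the same.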
  • by Anonymous Coward on Sunday July 22, 2012 @09:42AM (#40729419)

    Why the hell is the only link in the summary to that rather useless "I Programmer" website? The summary here at Slashdot is basically the content of the entire linked "article"!

    Here is a much more useful link for anyone interested in reading the actual paper: http://arxiv.org/abs/1206.6389v1 [arxiv.org]

    • by mbone (558574)

      The "original article" has a link to the arxiv preprint at the bottom.

      • Yes, but the sentiment stands. Why base the story on a story about an article and not on the article itself?
        Bone idleness? Idiocy? Hunger? Gas?

  • Try this on humans (Score:5, Interesting)

    by s_p_oneil (795792) on Sunday July 22, 2012 @09:46AM (#40729433) Homepage

    Universities should run a number of psychology experiments to see how this can be done to human intelligence to see how susceptible it is compared to AI. Or you could just study people who tune in to .

    • by s_p_oneil (795792)

      Sorry, Slashdot stripped out my "insert questionable media outlet here" message. I previewed it a bit too quickly.

    • by Yaa 101 (664725)

      It is already known that human brains make up what they miss in presented info.

      With people you only have to withhold info to get them to make bad decisions

    • Re: (Score:2, Insightful)

      by sg_oneill (159032)

      When you think about it, what's going on here is inducing mental illness in "thinking" machines.

      We already know how to induce mental illness in humans. Religion and war.

    • by Lisias (447563)

      Universities should run a number of psychology experiments to see how this can be done to human intelligence to see how susceptible it is compared to AI. Or you could just study people who tune in to .

      They're still busy trying to understand Milgram's results.

    • by ceoyoyo (59147)

      They have. Google Milgram experiment.

  • Propaganda (Score:5, Insightful)

    by mbone (558574) on Sunday July 22, 2012 @09:47AM (#40729441)

    On this side of the human / AI line, we call this propaganda. It has historically proved very effective, especially if you can control all of the "training data."

    • Historically? Just what do you think D.A.R.E. is?
      • by MightyYar (622222)

        D.A.R.E. would be a pretty poor example - it has never been found to be effective.

        • That depends on your definition of "success" -- D.A.R.E. has been overwhelmingly successful at convincing people that some drugs should be illegal. See, for example, the large number of people who are convinced that cocaine, heroine, and methamphetamine are evil and must be banned (and never mind that two of the three drugs are legal by prescription).
          • by peragrin (659227)

            I don't believe drugs are bad, but the use of some drugs should be tightly regulated.

            the average person is really bad at self medication, either going way too far or too little. Drugs with side effects trigger attachments. Caffeine is just as dangerous as Alcohol in that respect. Some people really can't handle their caffeine very well either. Go to a coffee stand (or at work) and watch some people with their hands shaking so hard they can't hold the coffee in the cup.

            That is a sign of a drug addiction be

            • the average person is really bad at self medication.

              And why is it our job to protect them?

              Boxing is extremely dangerous. If two people make the choice to get in the ring, we may think that's unwise, but it's their decision. If you make the decision to do something that will harm you, you may be an idiot, but I don't have the moral right to stop you through means other than making an argument to try to change your mind.

              When you get into things that have the potential of harming others, then that's another story. You're free to drink alcohol and use whateve

              • by adri (173121)

                Because you don't live in a world where individuals' actions have no effect outside of the individual.

                If two people decide to get in the ring and box, and suffer brain damage in the long term, so be it. What effect could it have?

                If a hundred thousand pairs of people decide to get in the ring and box, what kind of long term effects will that have on the people around them? Would there be an increase in accidents? A decrease in critical thinking? What kind of effects would it have on their planning and execut

              • by Rhalin (791665)

                the average person is really bad at self medication.

                And why is it our job to protect them?

                Boxing is extremely dangerous. If two people make the choice to get in the ring, we may think that's unwise, but it's their decision. If you make the decision to do something that will harm you, you may be an idiot, but I don't have the moral right to stop you through means other than making an argument to try to change your mind.

                When you get into things that have the potential of harming others, then that's another story. You're free to drink alcohol and use whatever other drugs you want to. You're not free to drive on public roads under their influence.

                I'm unfamiliar with a theory of social morality that supports the line of reasoning you start from. Could you point me towards more information on this that is supported by contemporary social theory? Preferably grounded in a processural approach?

                Thanks!

            • Re:Propaganda (Score:5, Insightful)

              by betterunixthanunix (980855) on Sunday July 22, 2012 @12:38PM (#40730211)

              Drugs with side effects trigger attachments. Caffeine is just as dangerous as Alcohol in that respect

              Except that "attachments" are not dangerous. Coma and death are dangerous, brain damage is dangerous, liver damage is dangerous, and the typical doses of alcohol are frighteningly close to such adverse effects -- whereas the typical dose of caffeine is nowhere near that point.

              Go to a coffee stand (or at work) and watch some people with their hands shaking so hard they can't hold the coffee in the cup.

              Which may be scary, but is not a sign of any permanent damage to that person's mind or body. Caffeine withdrawal is tough, but it is not life threatening, and a person who is committed to it can get through the symptoms at home (maybe with the help of a close friend) in less than a week. Alcohol withdrawal, on the other hand, can be so dangerous that it requires medical supervision.

              That is a sign of a drug addiction beyond the persons ability to control.

              Yet the drug abuse and dependence treatment programs that emerged from clinical psychology (read: science) are based on teaching people how to take control and avoid harmful behaviors.

              Prescribed drugs can be abused but at least someone is trying to limit the effects

              Really? A typical Adderall prescription (d,l-amphetamine salts) is for 10-20mg, two-three times per day, for a month. That is well above a lethal quantity, and a person could easily give themselves brain damage by taking a large fraction of their month's supply. People who abuse Adderall and related medicines (other amphetamines, Ritalin, etc.) can have psychotic episodes; see, for example, this recent NY Times article (sorry for paywall) about prescription stimulant abuse among high school and college students:

              https://www.nytimes.com/2012/06/10/education/seeking-academic-edge-teenagers-abuse-stimulants.html?_r=1&hp [nytimes.com]

              It's not just psychiatric drugs; prescription opiates are also readily abused, and people get high by using the prescribed amount of those drugs. Some pharmaceutical opiates are more potent than heroin, and abuse is an ever-present concern with those drugs; Rush Limbaugh abused prescription opiates:

              http://www.cbsnews.com/2100-201_162-1561324.html [cbsnews.com]

              Here is the problem with the war on drugs: recreational drugs need not be any more dangerous than prescription drugs. Pharmaceutical methamphetamine is safer than "truck stop" methamphetamine, not because it is a different drug, but because the production is much better controlled. Many of the dangers of recreational methamphetamine stem from the adulterants that are left over from poor production techniques.

              So in a sense, I agree with you: we need better regulation. That means legalizing recreational drugs, and requiring that legal sources adhere to standardized and regulated production and distribution methods (I do not think anyone can argue that a 14 year old should be buying recreational drugs). When someone buys cocaine, they should not have to worry about what is mixed into the drug; when someone buys MDMA (ecstasy), they should not worry about having actually received methamphetamine mixed with caffeine (a well known trick on the black market). There will still be problems with abuse, but when someone visits their doctor, they should be able to tell their doctor what drugs they have been taking, and in what doses -- which is basically impossible if you are buying some mystery powder in an alley somewhere.

              • I deliberately quit coffee every 4 months or so. Then when I start again it is so much more effective. Quitting isn't that hard, given I drink more than 7 cups a day normally..

                • I found that quitting coffee came with headaches and tiredness for a day or two -- not the worst thing in the world (people go through worse with tobacco) but not something to shrug at.
                  • by Smauler (915644)

                    I've recently (the last 6 months or so) been on and off of tobacco, ie. smoke about 20 a day for a week, stop for 3 or 4 days, smoke for a week, stop again, etc. I've been a smoker for almost 20 years. This isn't because I want to quit - I don't, I enjoy smoking. I think the physical dependencies are completely exaggerated...

                    I have a much bigger physical craving for alcohol after not drinking for a while, to the extent I deliberately don't drink a lot of the time. Cocaine's not too bad, but it's insidio

              • by bmacs27 (1314285)
                SVMs are cool!
            • Caffeine is just as dangerous as Alcohol in that respect.

              I can see it now: MACC, Mothers against caffeinated coffee/cola.

          • heroine

            So they object to novels with female protagonists? :-)

      • by mbone (558574)

        Historically? Just what do you think D.A.R.E. is?

        amateurs

        • Re:Propaganda (Score:5, Interesting)

          by betterunixthanunix (980855) on Sunday July 22, 2012 @12:43PM (#40730257)
          I disagree; D.A.R.E. has been overwhelmingly successful at convincing people of the legitimacy of the war on drugs and the paramilitary police that were created in the name of that war. Hardly anyone questions the fact that we have soldiers (but with "POLICE" or "DEA" written on their uniforms) attacking unarmed civilians just to serve an arrest warrant. Hardly anyone questions the fact that the executive branch of government, through the Attorney General's office, now has the power to make and enforce drug laws, without democratic action. Hardly anyone questions the fact that the DEA, supposedly a law enforcement agency, has so much signals intelligence capability that the dictators of some nations have tried to demand the DEA's help in spying on political opponents.

          How many propaganda programs have been so successful at convincing people that this sort of unwinding of a democratic system is the right thing to do?
  • The security implications aside, one problem I see is a possible arms race between the poisoners and the AI designers. The only way for the designers to win is to build tests that are less tolerant of the poisoned data. This is good if AI systems are built to interact only with other AI systems. But what if humans are the end users?

    At some point, the increase in data precision will come up against the natural imprecision of human users. Fewer humans will be smart enough to pass the Turing test. A practical

    • by mbone (558574)

      I have this mental image that in the future not everyone will be able to pass as human (i.e., routinely solve captchas), and the ones who can may be able to rent out that service to those who can't.

      • I have this mental image that in the future not everyone will be able to pass as human (i.e., routinely solve captchas), and the ones who can may be able to rent out that service to those who can't.

        The good thing is that us non-humans can then travel all around the world really cheap. I, personally, belong in healthcare products as a natural Fleshlight-substitute!

      • I have a mental image of a future without captcha, where we rely on things like HashCash instead -- slowing down spammers, rather than defeating them entirely.
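The HashCash idea mentioned above can be sketched in a few lines. This is a simplified illustration, not the real Hashcash stamp format: the sender must find a nonce so that the SHA-1 hash of the stamp starts with a given number of zero bits, which is costly to mint but cheap to verify. The `resource:nonce` layout and the function names are assumptions made for this example.

```python
# Simplified Hashcash-style proof of work (assumption: "resource:nonce" stamps,
# not the real Hashcash header format).
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def mint(resource: str, bits: int) -> str:
    """Search for a stamp whose SHA-1 hash has at least `bits` leading zero bits."""
    for nonce in count():
        stamp = f"{resource}:{nonce}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp

def verify(stamp: str, resource: str, bits: int) -> bool:
    """A single hash is enough for the recipient to check the work."""
    digest = hashlib.sha1(stamp.encode()).digest()
    return stamp.startswith(resource + ":") and leading_zero_bits(digest) >= bits

stamp = mint("alice@example.com", 12)  # about 2**12 hashes on average
print(stamp, verify(stamp, "alice@example.com", 12))
```

Each extra bit doubles the expected minting cost, so the recipient can tune how much work a bulk sender has to do per message.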
    • it's just an elaborate filter program. which is far away from real AI.

  • it's called propaganda

    see: Fox News

    • by Toonol (1057698)
      Your comment is amusing, because by singling out Fox News, you're demonstrating that you're a victim of very successful propaganda.
      • because I point to a source of propaganda can only mean I am a victim of propaganda?

        • He's saying that if you believe that Fox is the only source of propaganda, then you are a victim of the other sources of propaganda. Your citing Fox may not be singling them out, but just an indication that you believe that they are the worst in this regard.

    • by gl4ss (559668)

      no, it's more like a guy down the street yelling that the end of the world is nigh and you believing him despite fox news(the main source) telling otherwise..

  • by Kanel (1105463) on Sunday July 22, 2012 @10:43AM (#40729663) Journal

    There's already a whole subfield of machine learning which concerns itself with these problems. It's called "adversarial machine learning".
    The approaches are very different from usual software security. Instead of busying oneself with patching holes in software or setting up firewalls, adversarial machine learning redesigns the algorithms completely, using game theory and other techniques. The premise is "How can we make an algorithm that works in an environment full of enemies that try to mislead it?" It's a refreshing change from the usual software-security paradigm, which is all about fencing the code into some supposedly 'safe' environment.

    • "How can we make an algorithm that works in an environment full of enemies that try to mislead it?"

      This sounds like it is closely related to secure multiparty computation, where the goal is to correctly compute some function on multiple parties' inputs without revealing those inputs. This has been researched since the 1980s, and there have been numerous results on feasibility and impossibility, as well as several practical systems (including at least one that was used in the real world). It is likely that both approaches can be used to solve the same set of problems, but that the machine learning app

    • by node636 (2526762)
      agreed. there already exist plenty of simple methods for identifying and removing 'bad' data. Currently they're usually applied to a static data set before sending it to the machine. It should be simple to implement algorithms that perform this computation at run time.
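As a rough sketch of what run-time filtering could look like, the class below keeps a running mean and variance (Welford's method) of the samples it has accepted and rejects any new sample that lies too many standard deviations away. The class name, the threshold of 3, and the warm-up length are assumptions for illustration. A carefully crafted poison point may still fall inside the accepted range, so this is a sanity check, not a defense.

```python
# Streaming outlier filter (assumption: one numeric feature, z-score test).
import math

class RunningFilter:
    def __init__(self, threshold=3.0, warmup=5):
        self.n = 0           # number of accepted samples
        self.mean = 0.0      # running mean of accepted samples
        self.m2 = 0.0        # running sum of squared deviations
        self.threshold = threshold
        self.warmup = warmup

    def accept(self, x):
        """Return True and learn from x, or return False and ignore it."""
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                return False
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return True

stream = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 50.0, 9.8]
flt = RunningFilter()
kept = [x for x in stream if flt.accept(x)]
print(kept)  # the 50.0 sample is dropped before it reaches the learner
```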
  • Not very practical (Score:4, Insightful)

    by ceoyoyo (59147) on Sunday July 22, 2012 @10:58AM (#40729749)

    So if you know the algorithm and training data, and you can feed the system new data with manipulated labels then you can confuse it. It's a little early to panic about your spam filter. Hopefully everyone realizes that if you let the spammers tell your computer what is and is not spam, they can cause it to let their spam through.

    • by Kjella (173770)

      So if you know the algorithm and training data, and you can feed the system new data with manipulated labels then you can confuse it. It's a little early to panic about your spam filter. Hopefully everyone realizes that if you let the spammers tell your computer what is and is not spam, they can cause it to let their spam through.

      Well I assume that's why the spam/not spam buttons are there in my webmail reader, that somehow this goes into a form of feedback system. I'd not be surprised if spammers send spam to themselves, then flag it as not spam in order to confuse the system. Or signing up for stuff legitimately, then flagging it as spam anyway. Anything to increase the noise floor so they have to back off on filtering or lose genuinely wanted mail.

      • by Sqr(twg) (2126054)

        I doubt they would spend energy on this. Setting up fake mail accounts costs time/money, and even though the spammers as a collective might benefit from attacking the spam filter, it is more profitable for the individual spammer to use those accounts for sending spam.

        Also, a support vector network could easily learn that the "not spam" flag from certain users actually means the opposite.

      • by ceoyoyo (59147)

        I doubt it. Google's spam filter seems to work just as well as my local one, and spammers are definitely not managing my spam/not spam button.

        If that were the case though, it's an excellent reason not to use spam filters that spammers control.

  • I know that email spammers have been exploiting this to defeat Bayesian filters for the past decade
  • SVM != AI (Score:4, Informative)

    by SpinyNorman (33776) on Sunday July 22, 2012 @11:38AM (#40729927)

    Support Vector Machines are just a way of performing unsupervised data partitioning/clustering. i.e. you feed a bunch of data vectors into the algorithm and it determines how to split the data into a number of clusters where the members of each cluster are similar to each other and less similar to members of other clusters.

    e.g. you feed it (number of wheels, weight) pairs of a lot of vehicles and it might automatically split the data into 3 clusters - light 2-wheeled vehicles, heavy 4-wheeled ones, and very heavy 4-wheeled ones. If you then labelled these clusters as "bikes", "cars" and "trucks" you could in the future use the clustering rules to determine the category a new data point falls into.

    This isn't Artificial Intelligence - it's just a data mining/classification technique.

    • by lorinc (2470890)

      SVM are primarily a classification technique that has been extended to clustering, regression, structured output learning (such as ranking), and so on. So yes, the max margin principle has been used in basically all the areas of machine learning.

      How do you argue machine learning is not AI? You know the vast majority of researchers and publishers in the ML field consider it to be AI.

        It depends on what level of (artificial) intelligence you're talking about. If it's amoeba level intelligence, then maybe ML can achieve similar results, but if it's rat or human level intelligence then obviously not.

        I think most people take AI to mean something that could minimally pass a Turing test, not a silicon slug.

    • Re:SVM != AI (Score:5, Informative)

      by tommeke100 (755660) on Sunday July 22, 2012 @01:10PM (#40730463)
      Wrong. SVM is a supervised learning technique. It looks like you're talking about K-means clustering which is unsupervised.
      The difference between supervised and unsupervised is that in the first you use both features and outcome in your training of the system, where the unsupervised will just use the features. So supervised uses both X and Y to learn (if X are the features and Y is the class/cluster), whereas unsupervised will just use X.
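The X versus X-and-Y distinction can be shown with a small sketch. The unsupervised function below is a bare-bones two-cluster k-means that sees only the features. The supervised one is a nearest-centroid classifier, used here in place of an SVM to keep the code dependency-free, and it needs the labels too. The vehicle weights and all function names are made up for illustration.

```python
# Unsupervised (features only) vs. supervised (features and labels).

def kmeans_1d(xs, iters=10):
    """Two-cluster k-means on one feature; it never sees any labels."""
    centers = [min(xs), max(xs)]  # deterministic starting points
    for _ in range(iters):
        groups = [[], []]
        for x in xs:
            idx = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[idx].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    assignments = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in xs]
    return centers, assignments

def fit_supervised(xs, ys):
    """Nearest-centroid training: uses both the features X and the labels Y."""
    grouped = {}
    for x, y in zip(xs, ys):
        grouped.setdefault(y, []).append(x)
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}

def predict(model, x):
    return min(model, key=lambda label: abs(x - model[label]))

X = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]                 # hypothetical vehicle weights
Y = ["bike", "bike", "bike", "car", "car", "car"]  # labels: supervised case only

centers, clusters = kmeans_1d(X)  # yields anonymous cluster ids 0 and 1
model = fit_supervised(X, Y)      # yields named classes
print(clusters, predict(model, 4.0))
```

The clustering output is only a set of unnamed groups; someone still has to decide what they mean. The supervised model returns the class names it was trained with.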
      • Re:SVM != AI (Score:5, Informative)

        by bmacs27 (1314285) on Sunday July 22, 2012 @04:43PM (#40731619)
        Right. In many ways SVM is almost the opposite of what OP describes. Rather than grouping things based on their natural clustering, it takes knowledge of the data you wish to separate, and makes it linearly separable. That is, it distorts the data (with the kernel trick) in order to find a way to make the data cluster naturally in the way that you want. In fact, in traditional SVM they explicitly eschew assigning a "similarity metric" over the space, and prefer instead to give only binary classification by comparison to a dividing hyperplane. That's because the space you're working in is such a distortion of your data, it's hopeless to talk about meaningful measures of similarity at all. That could be why it is easy to trick. All you have to do is slightly bias the input along some dimension (X^whatever) which will cause it to deterministically collapse to the desired binary output. Re: Is it AI? Please define I.
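A small sketch of the "distortion" described above, under assumed toy data: points on a line labeled +1 when |x| > 1 cannot be split by a single threshold, but after mapping each x to (1, sqrt(2)·x, x²) the x² coordinate separates them. The degree-2 polynomial kernel (1 + ab)² gives the same inner products as that map without ever building it, which is the kernel trick. The data and function names are illustrative only.

```python
# Explicit feature map vs. kernel for a degree-2 polynomial (toy 1-D data).
import math

def phi(x):
    """Explicit feature map matching the kernel (1 + a*b)**2."""
    return (1.0, math.sqrt(2.0) * x, x * x)

def poly_kernel(a, b):
    """Inner product in the mapped space, computed from the raw inputs."""
    return (1.0 + a * b) ** 2

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def threshold_separable(values, labels):
    """True if one cut point on this axis splits the two classes."""
    ordered = [label for _, label in sorted(zip(values, labels))]
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1

xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [1 if abs(x) > 1 else -1 for x in xs]

print(threshold_separable(xs, ys))                       # False in the raw space
print(threshold_separable([phi(x)[2] for x in xs], ys))  # True along x squared
print(all(abs(dot(phi(a), phi(b)) - poly_kernel(a, b)) < 1e-9
          for a in xs for b in xs))                      # True: kernel equals mapped dot product
```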
  • by fygment (444210) on Sunday July 22, 2012 @11:47AM (#40729971)

    From the article, if you have access to the training data and know the learning algorithm, you can game the machine learning (SVM, not AI) system. How is that anything but self-evident, non-news?!

  • Shhhh.... (Score:3, Interesting)

    by ibsteve2u (1184603) on Sunday July 22, 2012 @12:33PM (#40730177)
    Stop talking about how easy it is to poison data collection efforts; you're going to kill the golden goose of those who insist that analyzing social data can allow you to pinpoint psychopaths [slashdot.org] and other "problematic" individuals before that goose ever takes to the air (on the wings of "black budget" funding, no doubt).
  • A couple of commenters have noted that there is a branch of research related to defending against this - according to one it's called "adversarial machine learning". I've been casually wondering for some time about a related question, which is very relevant to the questions of using the various 'bottom up' AI systems like SVM and neural nets as models of human intelligence and of various complex adaptive systems ('living systems') including economies and polities (and evolutionary biology for that matter).

  • The Texas board of education has a pretty good handle on the minimum amount of poisoned data it takes to affect learning.

  • Cat and Mouse 2.0. Nothing new here.
  • I'm not sure why this would be surprising. ML algorithms work best if the future behaves like the past, if it has the same probability distribution as the training data. Some algorithms can handle slow changes if they can continually get new training data, but large changes are a problem.
  • In other words, artificial intelligence is just as limited and varied as regular ole human intelligence.
    Jeez. Who'd a thunk it?
