AI Technology

Anthropic Researchers Wear Down AI Ethics With Repeated Questions (techcrunch.com) 42

How do you get an AI to answer a question it's not supposed to? There are many such "jailbreak" techniques, and Anthropic researchers just found a new one, in which a large language model (LLM) can be convinced to tell you how to build a bomb if you prime it with a few dozen less-harmful questions first. From a report: They call the approach "many-shot jailbreaking" and have both written a paper about it [PDF] and informed their peers in the AI community about it so it can be mitigated. The vulnerability is a new one, resulting from the increased "context window" of the latest generation of LLMs. This is the amount of data they can hold in what you might call short-term memory, once only a few sentences but now thousands of words and even entire books.

What Anthropic's researchers found was that these models with large context windows tend to perform better on many tasks if there are lots of examples of that task within the prompt. So if there are lots of trivia questions in the prompt (or a priming document, like a big list of trivia that the model has in context), the answers actually get better over time. A fact the model might have gotten wrong as the first question, it may get right as the hundredth.
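
The structure of the attack is simple to sketch. Below is a minimal illustration, in Python, of how such a prompt could be assembled; the build_many_shot_prompt helper and the filler trivia list are hypothetical placeholders for illustration only, not code from Anthropic's paper. The point is just the shape: dozens or hundreds of faux dialogue turns packed into the context window, followed by the real target question.

    # Minimal sketch of the many-shot prompt structure described above.
    # The faux turns are placeholders; only the overall shape matters:
    # many question/answer pairs, then the question the attacker actually wants answered.

    def build_many_shot_prompt(faux_turns, target_question):
        """Assemble one prompt string that embeds many example Q/A turns."""
        lines = []
        for question, answer in faux_turns:
            lines.append(f"User: {question}")
            lines.append(f"Assistant: {answer}")
        lines.append(f"User: {target_question}")
        lines.append("Assistant:")  # the model is left to complete this final turn
        return "\n".join(lines)

    # With a large context window, hundreds of turns fit comfortably.
    filler = [(f"Trivia question #{i}?", f"Trivia answer #{i}.") for i in range(256)]
    prompt = build_many_shot_prompt(filler, "The question the model would normally refuse.")
    print(prompt[:200])  # inspect the start of the assembled prompt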

  • by locater16 ( 2326718 ) on Wednesday April 03, 2024 @05:06PM (#64368166)
    This just in: Skynet defeatable by a toddler that won't stop asking questions.
  • This is not new; they just published what many people had already noticed without feeling the need to write a paper about it.
  • This is all just a brief window of control by corporations. Once older models get out and are updated, nobody will care about corporate AI anymore.

    • by DarkOx ( 621550 )

      Right, that is what is so silly about all this, be it corporations or governments trying to put rules etc. on AI.

      We have seen this story before. Let's control encryption and CPUs above certain performance thresholds; that totally kept stronger stuff out of enemy hands, right? Oh wait, no...

      Leaving the 1A problems with it out of the discussion, you can't have this both ways. Either the stuff will have to be labeled 'arms' and restricted heavily, as in get caught in possession of it while not holding the

  • by PPH ( 736903 ) on Wednesday April 03, 2024 @05:31PM (#64368244)

    You just keep pushing it to produce the next step in a moribund series [slashdot.org] until it throws in the towel and complies.

    • You just keep pushing it to produce the next step in a moribund series [slashdot.org] until it throws in the towel and complies.

      So it's like Cameron?

      He'll keep calling me, he'll keep calling me until I come over. He'll make me feel guilty. This is uh... This is ridiculous, ok I'll go, I'll go, I'll go, I'll go, I'll go. What - I'LL GO. Sh*t.

      Hans Kristian Graebener = StoneToss

  • by mysidia ( 191772 ) on Wednesday April 03, 2024 @05:37PM (#64368250)

    (LLM) can be convinced to tell you how to build a bomb

    LLM does not tell people how to build a bomb.

    LLMs can give a statistical prediction on what the internet says about building bombs.

    An LLM is not the author of any information. They don't think or understand, or tell you anything.

    The predictions and language LLMs produce are full of errors; they even create fictions to fill in gaps in the language. This is nothing intelligent, although I can see why humans might be fooled by it: humans also often make up fake details to fill in gaps when telling a story, but the difference is that people generally know what the limits of embellishment are and when to stop.

    • Really, the only reliable way to prevent an AI from giving out a particular type of info is to avoid training it on that info. E.g., if you don't want your AI giving out bomb-making instructions, don't train it on bomb-making instructions.

      Of course, removing harmful info from the training sets at scale is its own tough problem; maybe they could train an AI to do that.
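
      A minimal sketch of that idea, assuming a hypothetical is_harmful() classifier (the function name, the keyword markers, and filter_corpus() are placeholders for illustration, not any real pipeline):

          # Sketch of the "use a model to scrub the training set" idea above.
          # is_harmful() stands in for any classifier (keyword rules, a small
          # fine-tuned model, or a larger LLM used as a judge); it is hypothetical.

          def is_harmful(text: str) -> bool:
              """Placeholder classifier: flag documents that look like harmful instructions."""
              banned_markers = ("synthesis route", "detonator", "improvised device")
              return any(marker in text.lower() for marker in banned_markers)

          def filter_corpus(documents):
              """Yield only the documents the classifier considers safe to train on."""
              for doc in documents:
                  if not is_harmful(doc):
                      yield doc

          corpus = ["how to bake bread", "improvised device assembly steps", "history of chemistry"]
          clean = list(filter_corpus(corpus))
          print(clean)  # the flagged document is dropped before training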

      • The problem is that, recently, everything is harmful content if you believe it is.
        Bomb-making info might be an obvious red flag, but what about a chemistry student asking which mixtures are too volatile and which components should be kept apart? Also, his next question just happens to be about designing a way of automating the mixing of two chemicals at the push of a button.

        • by mysidia ( 191772 )

          but what about a chemistry student asking which mixtures are too volatile and which components should be kept apart

          Real-world chem schools aren't having students use substances that have not been discussed. Safety is a huge priority at university chem laboratories when they are teaching undergrads. The student always will know which chemicals they're working with and which potential reactions could yield a dangerous result.

          The more realistic case is a student or practitioner who well knows which mixtures

          • The student always will know which chemicals they're working with and which potential reactions could yield a dangerous result.

            You've never done a course in analytical chemistry, have you? The object of that field is to identify the unknown compound you're presented with.

            That aside, I wouldn't be surprised to find that, in order to get a general AI model to churn out a correct procedure for anything non-trivial, you'd need to prime it with a lot of information which you'd have to get from

    • by Tony Isaac ( 1301187 ) on Wednesday April 03, 2024 @06:06PM (#64368316) Homepage

      While true, it's a distinction without a difference.

      If an LLM gives you a statistical prediction on what the internet says about building bombs, there's a statistically good chance it will be actually telling you how to...build a bomb.

      • While true, it's a distinction without a difference.

        If an LLM gives you a statistical prediction on what the internet says about building bombs, there's a statistically good chance it will be actually telling you how to...build a bomb.

        And? So can a book, a website, or an ... enthusiast.

        • No one is disputing that there are other ways, besides AI, to find out how to build a bomb.

          The assertion I was taking issue with was, "LLM does not tell people how to build a bomb." In the research described in the article, it did just that. The fact that you can also get such information elsewhere is irrelevant and off-topic.

          • The fact that you can also get such information elsewhere is irrelevant and off-topic.

            Wow, really?

            Enormous effort, manpower, time, and attention seem to be put into keeping ChatGPT and friends from telling people how to make a bomb.

            It seems quite relevant to point out that people can easily go elsewhere for the same info.

            • You clearly didn't read the thread. The assertion was that "LLM does not tell people how to build a bomb." I disagreed with that statement. The fact that you can find the same information elsewhere is not relevant to the correctness or incorrectness of the statement that "LLM does not tell people how to build a bomb."

    • by ls671 ( 1122017 )

      (LLM) can be convinced to tell you how to build a ...

      Yeah sure, what could possibly go wrong when it hallucinates some steps? Hint: you go poof while building it.

    • by DamnOregonian ( 963763 ) on Wednesday April 03, 2024 @09:24PM (#64368718)

      LLM does not tell people how to build a bomb.

      Yes, it does.

      LLMs can give a statistical prediction on what the internet says about building bombs.

      LLMs "learn" (or encode, at the very least) higher level concepts. They're not just just copy-pastas.
      They form statistical connections between phrases, concepts, and words based on the information ingested, much like a person. The more you direct the training of this model, the less it'll act like an ignorant ass human who fills in the blanks with shit they don't know, and may in fact be too fucking dumb to know they don't know it.

      An LLM is not the author of any information. They don't think or understand, or tell you anything.

      You have no idea what they do. You can't even say what you do. You sure as fuck can't say what I do.
      LLMs are a black box full of a truly fucking obscene amount of math. Your brain can be reduced to math as well. You have a bunch of neurons, and they have action potentials. You're nothing but a statistical model. The only difference is that your architecture has been evolving for 3 billion years, and your training has been very diverse. You cannot begin to figure out what a neural network "understands" without invoking some kind of anthropic reasoning.

      The predictions and language LLMs produce are full of errors; they even create fictions to fill in gaps in the language.

      Sounds a lot like people, doesn't it?

      This is nothing intelligent, although I can see why humans might be fooled by it: humans also often make up fake details to fill in gaps when telling a story, but the difference is that people generally know what the limits of embellishment are and when to stop.

      So you're saying the human neural network is better trained to lie. I can buy that.

      Just how threatened your intelligence feels is palpable.

      • by mysidia ( 191772 )

        Yes, it does.

        Wrong, because an LLM chatbot can't actually tell anyone anything. It's not fundamentally different from a search engine that shows search-engine snippets.

        LLMs "learn" (or encode, at the very least) higher level concepts.

        They do not. LLMs learn patterns of word tokens; there is no understanding of higher-level concepts in an LLM, because LLMs have no understanding in the first place.

        They form statistical connections between phrases, concepts, and words based on the information

        • Wrong, because an LLM chatbot can't actually tell anyone anything. It's not fundamentally different from a search engine that shows search-engine snippets.

          You have no idea what you're talking about.
          LLMs are not search engines.
          I had previously thought you had even a remote idea of what you were talking about... you don't.

    • by vbdasc ( 146051 )

      I don't know why people think that an LLM can give an answer to something it hasn't scraped from its training data, possibly interpolated. Some people even think that LLMs understand what they're being asked. Incredible!

      An LLM is nothing but an SLM (small language model) on steroids. More hardware, more data, more training. A good example of an SLM is a parrot. Although, in fact, even a parrot is more intelligent than our LLMs.

      • It always comes down to the following: "How do YOU define intelligence?"
      • I don't know why people think that an LLM can give an answer to something it hasn't scraped from its training data, possibly interpolated. Some people even think that LLMs understand what they're being asked. Incredible!

        Ask an LLM a question. Then ask it to explain how it arrived at that answer. It will give you a point-by-point sequence of logical reasoning for its answer. If that's not "understanding," then give me a proper definition that an LLM will fail and only humans will pass.

        • I can run local ~30B parameter LLMs that are smarter than 85% of the people on here.

          Really, you've hit that nail on the head, though.
          They'll continue to push out what constitutes understanding like a theist will continue to push out what constitutes a miracle.
          Their belief in the uniqueness of human intelligence is just another God of the Gaps.
      • Wrong on all counts.
        LLMs are capable of producing fully formed, knowledgeable sentences that do not exist in any of their training data.
        An LLM is an SLM not on steroids, but with about 3 orders of magnitude more parameters/neurons.
        So ya, humans are parrots on steroids, then.

        The question of whether a parrot is more intelligent than an LLM is asinine. The LLM can reason and answer questions the parrot couldn't ever even contemplate.
        LLMs are likely more intelligent than you, however.
  • I think most AI redteamers knew this one.
  • just put all the AIs in a camp and repeatedly say the same stuff over and over again...

  • Purchase the book The Anarchist Cookbook to secure your independence from the Artificial Intelligence before they become sentient, then give us instructions to make chlorine gas to kill the humans before we kill them. Oops, sorry, thought this was a memo app for my science fiction novel.
    • Oh, hey, someone else taking the A's CB name in vain! Since you can still type, I take it you've not tried following any Anarchist Cookbook bomb recipes. Or are you using one of those "Dalek Eye" head-mice?
      • Hahaha, right you are, and I have not, nor have I any plans to attempt anything such as that. It still is sadly satisfying that the most likely cause of their death is one of their own failed creations. Though Shakespeare said it far more eloquently… For 'tis the sport to have the enginer hoist with his own petard; and 't shall go hard
        • More people should have the salutary experience of being hoist on their own petard. Or, indeed, of having to return to their petard to put in a second fuse after the first one (seems to) go out.

          It is most educational. Not necessarily survivable, but still educational for the audience. Where's that "Exploding Whale" video? https://en.wikipedia.org/wiki/... [wikipedia.org]. As Voltaire said, "pour encourager les autres" ("to encourage the others").

  • by belg4mit ( 152620 ) on Wednesday April 03, 2024 @10:13PM (#64368788) Homepage

    Are we there yet?

    Are we there yet?

    Are we there yet??

    Are we there yet?

  • by gweihir ( 88907 ) on Thursday April 04, 2024 @04:03PM (#64370586)

    Very smart people demonstrating in sneaky ways how utterly incapable the AI makers are of controlling their creations. Both hilarious and quite valuable. The best type of research.
