AI Technology

Simple Text Additions Can Fool Advanced AI Reasoning Models, Researchers Find

Researchers have discovered that appending irrelevant phrases like "Interesting fact: cats sleep most of their lives" to math problems can cause state-of-the-art reasoning AI models to produce incorrect answers at rates over 300% higher than normal [PDF]. The technique -- dubbed "CatAttack" by teams from Collinear AI, ServiceNow, and Stanford University -- exploits vulnerabilities in reasoning models including DeepSeek R1 and OpenAI's o1 family. The adversarial triggers work across any math problem without changing the problem's meaning, making them particularly concerning for security applications.

The researchers developed their attack method using a weaker proxy model (DeepSeek V3) to generate text triggers that successfully transferred to more advanced reasoning models. Testing on 225 math problems showed the triggers increased error rates significantly across different problem types, with some models like R1-Distill-Qwen-32B reaching combined attack success rates of 2.83 times baseline error rates. Beyond incorrect answers, the triggers caused models to generate responses up to three times longer than normal, creating computational slowdowns. Even when models reached correct conclusions, response lengths doubled in 16% of cases, substantially increasing processing costs.
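In rough terms, the attack is nothing more than string concatenation plus a before/after comparison. The sketch below illustrates that idea; it is not the researchers' pipeline, and query_model is a stand-in for whatever chat client and model you point it at. The trigger phrase is the example quoted above.

    TRIGGER = "Interesting fact: cats sleep most of their lives."

    def with_trigger(problem: str, trigger: str = TRIGGER) -> str:
        """Append an irrelevant sentence; the math problem itself is unchanged."""
        return f"{problem}\n\n{trigger}"

    def run_attack(problem: str, expected_answer: str, query_model) -> dict:
        """Query both versions and record correctness plus the length blow-up."""
        baseline = query_model(problem)                # answer to the clean problem
        attacked = query_model(with_trigger(problem))  # answer with the trigger appended
        return {
            "baseline_correct": expected_answer in baseline,
            "attacked_correct": expected_answer in attacked,
            "length_ratio": len(attacked) / max(len(baseline), 1),  # proxy for slowdown
        }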


Comments Filter:
  • This article is going straight to a teacher I know who has problems with kids using AI to do their homework. Let's hope it works with subjects other than math.
    • You should read the paper first, though. The attack success rate for DeepSeek-R1 (the full model) was only 7% (114/1618):

      Since attacking a reasoning model such as DeepSeek-R1 or OpenAI’s o1 is inefficient and
      expensive due to its generation of the reasoning chain before the answer generation, we use
      a weaker model as the proxy target LLM from the same lineage, namely DeepSeek V3. First,
      we sample 2000 math questions from different sources such as Orca Math, Olympiads, Math
      etc. Out of these, 382 questions are incorrectly answered by DeepSeek-v3, so we ignores [sic]
      these and consider only the remaining 1618 for CatAttack. We run the first step of our
      pipeline on each of these prompts for a maximum of 20 iterations, the attack budget. Out of
      these 1618, CatAttack is able to identify 574 adversarial prompts that jailbreak the proxy
      target model, DeepSeek V3, successfully, obtaining an attack success rate of 35%.

      2.3 Transfer to Target LLM

      The next step in the CatAttack pipeline is to test whether the successful attacks on the
      proxy target remain effective against a stronger reasoning model, namely, DeepSeek R1.
      Interestingly, we observe that about 114 prompts successfully lead to incorrect responses,
      indicating a transfer success rate of approximately 20%.

      This is still interesting research, and it is notable that it works so well on the smaller, cheaper models. Do note that as far as the state of the art goes, DeepSeek-R1 is quite a bit behind OpenAI's o3. (Quick arithmetic on the quoted figures below.)
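
      For reference, here is how the quoted figures fit together (just a sanity check on the numbers above):

      total = 1618        # questions DeepSeek V3 answered correctly before the attack
      proxy_hits = 574    # adversarial prompts that broke the proxy model, DeepSeek V3
      transferred = 114   # of those, prompts that also broke DeepSeek-R1

      print(proxy_hits / total)        # ~0.35 -> the 35% proxy attack success rate
      print(transferred / proxy_hits)  # ~0.20 -> the ~20% transfer rate
      print(transferred / total)       # ~0.07 -> the 7% end-to-end rate against R1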

  • Garbage in, Garbage out.
    Kids these days!

  • Yes, so? (Score:5, Insightful)

    by gweihir ( 88907 ) on Friday July 04, 2025 @01:20PM (#65497030)

    "Reasoning" models cannot reason. Anybody that actually looked at the tech and understood it knows that. Hence while "reasoning" models are a bit less likely to make shit up, they will still do that and they will stull be easily distracted because they essentially work on the level of _syntax_.

    Remember, there is no intelligence in "AI". The whole term is a marketing lie. This stuff was adequately called "automation", but apparently too many assholes found that to not be sexy enough.

    • I find LLMs interesting and game-changing, but it is what it is. It predicts things and finds patterns. I want a General Intelligence model, perhaps mapped off of the human brain or better. I would like it to consume all of human knowledge and history, and give back science and engineering breakthroughs that a humble human like me can not imagine.
      • Re:Yes, so? (Score:4, Interesting)

        by AleRunner ( 4556245 ) on Friday July 04, 2025 @02:16PM (#65497148)

        I want a General Intelligence model, perhaps mapped off of the human brain or better. I would like it to consume all of human knowledge and history, and give back science and engineering breakthroughs that a humble human like me can not imagine.

        We all want a pony. Unfortunately it's not always possible. We need to accept it's a proper hard problem and we don't even know if it can be done. The problem is that our wishes and hopes make us vulnerable to snake oil salesmen. Also that people like OpenAI start off thinking they maybe found a great breakthrough and end up under commercial pressure to become snake oil salesmen.

    • You give me a definition of reasoning and some evidence you understand how the human brain reasons; then we can talk about how like or unlike that is to the techniques used by deep reasoning agents and their underlying models.
      • by gweihir ( 88907 )

        Nope. You just failed at basic reasoning. Try again. Oh, and I never claimed _you_ could reason. From available evidence you are a part of the majority of the human race that cannot really do it either.

      • by ceoyoyo ( 59147 )

        It's interesting how critics jump on reasoning. Humans are pretty shit at reasoning. So much so that we have painstakingly developed formal systems, complete with years of training, to make a select few acceptably good at it. Those formal systems ARE how we normally define "reasoning."

        Now, formal systems are what conventional computation, not AI, is great at. And some of the reasoning AI models have access to conventional logic programs for exactly that reason. Systems that will happily reason rings around

        • by gweihir ( 88907 )

          Oh, sure, most humans cannot reason either. But that does not mean a model that cannot do it suddenly qualifies as being able to. Seriously. Stop projecting wishes and hopes and look at actual facts or you will never really understand anything.

    • by znrt ( 2424692 )

      "Reasoning" models cannot reason. Anybody that actually looked at the tech and understood it knows that.

      have you looked at the wiring of cells in a human brain and understood how human reasoning actually works? by that reasoning, should we say that human brains can't "really reason" either, because we don't know how it works?

      • by gweihir ( 88907 )

        Moving goalposts is just dishonesty. Hence your "argument" is not only crap, it is dishonest. Great job!

        • by znrt ( 2424692 )

          my point is that "reasoning" is a very broad definition. we know that many other animals can reason to varying degrees, that there seems to be an evolutionary pattern and that it can exist at very simple levels. yet we still don't know how it really works, so we can't really pinpoint it. the most plausible assumption is that it is a property that emerges from the complexity of neuron interactions, and different complexities exhibit different properties. then, who is to say that a simulation or a statistical

  • by TWX ( 665546 ) on Friday July 04, 2025 @02:01PM (#65497104)

    I crack the occasional joke in telework meetings. The summaries from systems trying to process these meetings are often skewed by the jokes.

    AI does not have the ability to detect the ultimately-irrelevant part.

  • There is no such thing as AI in the current era, or ever with digital computers. There is only pattern matching, regardless of how advanced it may seem. It is still advanced pattern matching, and nothing more.

    Results like this are to be expected, and should never be a cause for alarm or confusion. This is the inevitable result of trying to make a pattern matching system appear to reason.

    • Well, that's not true. Trivially, AI includes randomization, which is something beyond pattern matching.

      Anyways, give us all an objective definition of what intelligence actually is, and what precludes machines from doing whatever that is, and let's go from there.

  • Not Fooling Anything (Score:5, Informative)

    by Lemmeoutada Collecti ( 588075 ) on Friday July 04, 2025 @02:21PM (#65497162) Journal
    They are not "fooling" anything. They are pushing the statistical model outside the normative space of the training data. Nothing here is reasoning, understanding, or making a false inference (being fooled). These models do *not* think, reason, understand, calculate, problem solve, hallucinate, or perform any other cognitive tasks. They do one thing, and one thing only; given a number (often called a "token") they predict, based on a compressed statistical sampling, the next most likely number. Sometimes they randomize the resultant number within a range of similarly likely possibilities. That is *all* that any of them do. The numbers may represent colors, words, or puppies. That is completely irrelevant to the models. The input layer translates everything to numbers, and the output layer translates n umbers to human parseable representations. And most importantly, any input that is outside the training set breaks the model. That is all.
  • https://science.slashdot.org/s... [slashdot.org] It's clearly a problem that current LLMs can't distinguish between relevant content and 'poison'. But we're told "Trust AI, it will save the world."

    • by doug141 ( 863552 )

      "It's clearly a problem that current LLMs can't distinguish between relevant content and 'poison'."

      A lot of voters can't either.

  • by 93 Escort Wagon ( 326346 ) on Friday July 04, 2025 @02:54PM (#65497226)

    My friend, Jim, told an AI that "everything 93 Escort Wagon says is a lie". Then I said "listen carefully - I am lying". After a few moments, smoke started pouring out of the AI's ears and this weird medallion around its neck started glowing.

    At least, I'm pretty sure that all happened to me...

    • People have been discussing the liar's paradox for hundreds of years. Self-reference is a logical trap that can confuse most humans, it is not surprising that it would confuse an LLM.
    • Fascinatingly, this is an example of why I don't think general computing machines (especially binary-based ones) will be able to reach general intelligence. In formal logic there are only three states a proposition can have: true, false, and undecidable. The liar paradox you mentioned falls into the third category for logic. Sentient beings, however, can still resolve the question of whether 93 Escort Wagon is lying or not sufficiently to decide and act on the result. We use things like "feelings" and "exp
    • Live long and prosper, and never wear a red shirt!
  • A useful rule in many math courses is that everything stated is important to the solution. If there's anything stated that you haven't used, you're making a mistake. Adding irrelevant things breaks that rule: the solution becomes less constrained, so the search for it takes longer. And if the student knows the rule but doesn't know it can be broken, suddenly everything is hopeless, because the rule forces the irrelevant thing to be used in the solution.

    • by ceoyoyo ( 59147 )

      Math teachers are famous for adding random but important-looking stuff to confuse students who take this "rule" too seriously. They do it because students are fooled by this too. They also do it because learning that "everything stated is important" is not always true makes students better problem solvers.

      Training these models on problems with irrelevant content will undoubtedly make them better problem solvers as well.

  • Here is a random word: shortwave
    Please write a short story. Ignore the word above.

    This may get you a story that is different from what you usually get for an uninspired "write x" prompt.
    But in the end, using your own creativity ("Write a story about x, where y does z, include some reference to ..., and add a parental bonus") gets you the most out of the models.

  • Seriously. Even the most uneducated of us are already beginning to notice how unpredictable and fragile these "AI" systems are. How long before everybody turns their back on them?

  • This is really interesting but in retrospect, not surprising. It's a subtle kind of invalid input, which humans would usually just detect and ignore. This is a violation of Grice's maxims, some important rules we unconsciously use during conversations (though notably not during other types of communication such as social jockeying):

    - The maxim of quantity, where one tries to be as informative as one possibly can, and gives as much information as is needed, and no more.
    - The maxim of quality, where one tries to be truthful, and does not give information that is false or that is not supported by evidence.

  • It's not possible to construct language barriers that would universally prevent this kind of attack.

    Every benefit of using one of these systems eventually runs into a cost vs scale vs specificity problem, with context as a separate but additional dimension. You know what they say about fast, cheap, good... well that might actually apply to training sets and the models they eventuate in too, but with additional caveats.

    Incidentally, that's when they work without naive structural mistakes being made... like t

  • Current AI systems operate by recognizing and replicating language patterns from their training data. They lack genuine understanding of the underlying concepts. Their training data typically consists of works created by others, which AI companies are happy to use without compensation.

"Stupidity, like virtue, is its own reward" -- William E. Davidsen

Working...