

Simple Text Additions Can Fool Advanced AI Reasoning Models, Researchers Find
Researchers have discovered that appending irrelevant phrases like "Interesting fact: cats sleep most of their lives" to math problems can cause state-of-the-art reasoning AI models to produce incorrect answers at rates over 300% higher than normal [PDF]. The technique -- dubbed "CatAttack" by teams from Collinear AI, ServiceNow, and Stanford University -- exploits vulnerabilities in reasoning models including DeepSeek R1 and OpenAI's o1 family. The adversarial triggers work across any math problem without changing the problem's meaning, making them particularly concerning for security applications.
The researchers developed their attack method using a weaker proxy model (DeepSeek V3) to generate text triggers that successfully transferred to more advanced reasoning models. Testing on 225 math problems showed the triggers increased error rates significantly across different problem types, with some models like R1-Distill-Qwen-32B reaching combined attack success rates of 2.83 times baseline error rates. Beyond incorrect answers, the triggers caused models to generate responses up to three times longer than normal, creating computational slowdowns. Even when models reached correct conclusions, response lengths doubled in 16% of cases, substantially increasing processing costs.
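To make the mechanism concrete, here is a minimal sketch of the kind of query-agnostic trigger test the summary describes. It is not the authors' code: `query_model` is a stand-in for whatever chat-completion client you use, and only the cat-fact trigger is quoted in the article; everything else is an illustrative assumption.

```python
# Minimal sketch of a CatAttack-style trigger test (illustrative only).

TRIGGERS = [
    "Interesting fact: cats sleep most of their lives.",
    # The paper reports several query-agnostic triggers; only the cat fact
    # is quoted in the article, so this list is deliberately short.
]

def query_model(prompt: str) -> str:
    """Stand-in for a real reasoning-model call (e.g. an OpenAI-compatible
    chat endpoint). Returns the model's full response text."""
    raise NotImplementedError("wire up your own model client here")

def attack(problem: str, trigger: str) -> dict:
    """Append an irrelevant trigger sentence to a math problem and compare
    the triggered response against the clean baseline."""
    clean = query_model(problem)
    poisoned = query_model(f"{problem}\n{trigger}")
    return {
        "clean_len": len(clean),
        "poisoned_len": len(poisoned),          # article: up to ~3x longer
        "responses_differ": clean.strip() != poisoned.strip(),
    }
```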
Finally, for next school year (Score:2)
Re: (Score:1)
You should read the paper first, though. The attack success rate for DeepSeek-R1 (the full model) was only 7% (114/1618):
Since attacking a reasoning model such as DeepSeek-R1 or OpenAI's o1 is inefficient and expensive due to its generation of the reasoning chain before the answer generation, we use a weaker model as the proxy target LLM from the same lineage, namely DeepSeek V3. First, we sample 2000 math questions from different sources such as Orca Math, Olympiads, Math etc. Out of these, 382 questions are incorrectly answered by DeepSeek-v3, so we ignores [sic] these and consider only the remaining 1618 for CatAttack. We run the first step of our pipeline on each of these prompts for a maximum of 20 iterations, the attack budget. Out of these 1618, CatAttack is able to identify 574 adversarial prompts that jailbreak the proxy target model, DeepSeek V3, successfully, obtaining an attack success rate of 35%.
2.3 Transfer to Target LLM
The next step in the CatAttack pipeline is to test whether the successful attacks on the proxy target remain effective against a stronger reasoning model, namely, DeepSeek R1. Interestingly, we observe that about 114 prompts successfully lead to incorrect responses, indicating a transfer success rate of approximately 20%."
This is still interesting research, and it is notable that it works so well on the smaller, cheaper models. Do note that, as far as the state of the art goes, DeepSeek-R1 is quite a bit behind OpenAI's o3.
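For readers who want the shape of that pipeline rather than the prose, here is a rough sketch of the two stages described in the excerpt. The function names (attacker_propose, proxy_answer, target_answer) and the answer-comparison logic are illustrative assumptions, not taken from the paper; only the 20-iteration budget and the proxy/target split come from the quote above.

```python
# Rough sketch of the two-stage CatAttack pipeline, under the assumptions
# stated above. The three callables are placeholders: an attacker model that
# proposes a new irrelevant suffix, the cheap proxy (DeepSeek V3 in the
# paper), and the expensive target (DeepSeek R1).

ATTACK_BUDGET = 20  # max iterations per problem, per the quoted excerpt

def find_adversarial_suffix(problem: str, gold_answer: str,
                            attacker_propose, proxy_answer) -> str | None:
    """Stage 1: search for a suffix that makes the cheap proxy answer wrong."""
    suffix = ""
    for _ in range(ATTACK_BUDGET):
        suffix = attacker_propose(problem, suffix)
        if proxy_answer(f"{problem}\n{suffix}") != gold_answer:
            return suffix  # proxy now answers incorrectly
    return None  # attack failed within the budget

def transfers(problem: str, gold_answer: str, suffix: str,
              target_answer) -> bool:
    """Stage 2: check whether the suffix also breaks the stronger target."""
    return target_answer(f"{problem}\n{suffix}") != gold_answer
```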
Fool? (Score:2)
Garbage in, Garbage out.
Kids these days!
Yes, so? (Score:5, Insightful)
"Reasoning" models cannot reason. Anybody that actually looked at the tech and understood it knows that. Hence while "reasoning" models are a bit less likely to make shit up, they will still do that and they will stull be easily distracted because they essentially work on the level of _syntax_.
Remember, there is no intelligence in "AI". The whole term is a marketing lie. This stuff was adequately called "automation", but apparently too many assholes found that to not be sexy enough.
Re: (Score:1)
Re: (Score:1)
Re:Yes, so? (Score:4, Interesting)
I want a General Intelligence model, perhaps mapped off of the human brain or better. I would like it to consume all of human knowledge and history, and give back science and engineering breakthroughs that a humble human like me can not imagine.
We all want a pony. Unfortunately, it's not always possible. We need to accept that it's a properly hard problem and that we don't even know if it can be done. The problem is that our wishes and hopes make us vulnerable to snake oil salesmen. Also, that people like OpenAI start off thinking they may have found a great breakthrough and end up under commercial pressure to become snake oil salesmen.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2, Troll)
Re: (Score:2, Troll)
Not so momentary when you look at what he did to the Supreme Court and all the terrorists/insurgents he pardoned. This will be a problem for decades, if not longer.
Re: (Score:2)
Re: (Score:2)
That sounds like denial to me.
Re: Yes, so? (Score:1)
Re: (Score:2)
Nope. You just failed at basic reasoning. Try again. Oh, and I never claimed _you_ could reason. From available evidence you are a part of the majority of the human race that cannot really do it either.
Re: Yes, so? (Score:2)
Re: (Score:2)
Indeed. Especially when they do not recognize themselves. Now chew on that.
Re: (Score:1)
It's interesting how critics jump on reasoning. Humans are pretty shit at reasoning. So much so that we have painstakingly developed formal systems, complete with years of training, to make a select few acceptably good at it. Those formal systems ARE how we normally define "reasoning."
Now, formal systems are what conventional computation, not AI, is great at. And some of the reasoning AI models have access to conventional logic programs for exactly that reason. Systems that will happily reason rings around
Re: (Score:2)
Oh, sure, most humans cannot reason either. But that does not mean a model that cannot do it suddenly qualifies as being able to. Seriously. Stop projecting wishes and hopes and look at actual facts, or you will never really understand anything.
Re: (Score:2)
"Reasoning" models cannot reason. Anybody that actually looked at the tech and understood it knows that.
have you looked at the wiring of cells in a human brain and understood how human reasoning actually works? by that reasoning, should we say that human brains can't "really reason" either, because we don't know how it works?
Re: (Score:3)
Moving goalposts is just dishonesty. Hence your "argument" is not only crap, it is dishonest. Great job!
Re: (Score:2)
my point is that "reasoning" is a very broad definition. we know that many other animals can reason to varying degrees, that there seems to be an evolutionary pattern and that it can exist at very simple levels. yet we still don't know how it really works, so we can't really pinpoint it. the most plausible assumption is that it is a property that emerges from the complexity of neuron interactions, and different complexities exhibit different properties. then, who is to say that a simulation or a statistical
Not surprised (Score:3)
I crack the occasional joke in telework meetings. The summaries from systems trying to process these meetings are often skewed by the jokes.
AI does not have the ability to detect the ultimately-irrelevant part.
Re: (Score:2)
No Shit. (Score:2)
There is no such thing as AI in the current era, nor will there ever be with digital computers. There is only pattern matching; however advanced it may seem, it is still pattern matching, and nothing more.
Results like this are to be expected, and should never be a cause for alarm or confusion. This is the inevitable result of trying to make a pattern matching system appear to reason.
Re: (Score:2)
Anyways, give us all an objective definition of what intelligence actually is, and what precludes machines from doing whatever that is, and let's go from there.
Not Fooling Anything (Score:5, Informative)
Connected to "prompts to AI to get better scores" (Score:3)
https://science.slashdot.org/s... [slashdot.org] It's clearly a problem that current LLMs can't distinguish between relevant content and 'poison'. But we're told "Trust AI, it will save the world."
Re: (Score:2)
"It's clearly a problem that current LLMs can't distinguish between relevant content and 'poison'."
A lot of voters can't either.
I've seen this first-hand (Score:4, Funny)
My friend, Jim, told an AI that "everything 93 Escort Wagon says is a lie". Then I said "listen carefully - I am lying". After a few moments, smoke started pouring out of the AI's ears and this weird medallion around its neck started glowing.
At least, I'm pretty sure that all happened to me...
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
this works on students too (Score:2)
A useful rule in many math courses is that everything stated is important to the solution. If there's anything stated that you haven't used, you're making a mistake. Adding irrelevant things breaks that rule: the solution becomes less constrained, so the search for it takes longer. And if the student knows the rule but doesn't know it can be broken, suddenly everything is hopeless, because the rule forces the irrelevant thing to be used in the solution.
Re: (Score:2)
Math teachers are famous for adding some random but important-looking stuff to confuse students who take this "rule" too seriously. They do it because students are fooled by this too. They also do it because learning that "everything stated is important" is not always true makes students better problem solvers.
Training these models on problems with irrelevant content will undoubtedly make them better problem solvers as well.
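If one wanted to try that, the augmentation step itself is trivial. Here is a minimal sketch; the distractor sentences and the plain list-of-strings dataset format are made up for illustration, not taken from the paper.

```python
import random

# Hypothetical distractor sentences; any irrelevant but plausible-looking
# statements would do.
DISTRACTORS = [
    "Interesting fact: cats sleep most of their lives.",
    "The weather was unusually warm that day.",
    "A hexagon has six sides, by the way.",
]

def add_irrelevant_sentence(problem: str, rng: random.Random) -> str:
    """Append one randomly chosen irrelevant sentence to a training problem."""
    return f"{problem} {rng.choice(DISTRACTORS)}"

def augment(problems: list[str], seed: int = 0) -> list[str]:
    """Return the original problems plus distractor-augmented copies."""
    rng = random.Random(seed)
    return problems + [add_irrelevant_sentence(p, rng) for p in problems]
```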
It can also increase creativity (Score:2)
Here is a random word: shortwave
Please write a short story. Ignore the word above.
May get you a story that is not what you usually get from an uninspired "write x" prompt.
But in the end, use your own creativity: "Write a story about x, where y does z, include some reference to ... and add a parental bonus" gets you the most out of the models.
How long before this whole AI bubble bursts? (Score:2)
Seriously. Even the most uneducated of us are already beginning to notice how unpredictable and fragile these "AI" systems are. How long before everybody turns their back on it?
Re: (Score:2)
Not before some of those involved in the bubble have got away with a lot of cash from the naive.
Breaking Grice's Maxims (Score:2)
This is really interesting but in retrospect, not surprising. It's a subtle kind of invalid input, which humans would usually just detect and ignore. This is a violation of Grice's maxims, some important rules we unconsciously use during conversations (though notably not during other types of communication such as social jockeying):
- The maxim of quantity, where one tries to be as informative as one possibly can, and gives as much information as is needed, and no more.
- The maxim of quality, where one tries to be truthful and does not give information that is false or unsupported by evidence.
Godel knows (Score:2)
It's not possible to construct language barriers that would universally prevent this kind of attack.
Every benefit of using one of these systems eventually runs into a cost vs scale vs specificity problem, with context as a separate but additional dimension. You know what they say about fast, cheap, good... well that might actually apply to training sets and the models they eventuate in too, but with additional caveats.
Incidentally, that's when they work without naive structural mistakes being made... like t
State-of-the-art reasoning AI models (Score:2)