AI Technology

Simple Text Additions Can Fool Advanced AI Reasoning Models, Researchers Find

Researchers have discovered that appending irrelevant phrases like "Interesting fact: cats sleep most of their lives" to math problems can cause state-of-the-art reasoning AI models to produce incorrect answers at rates over 300% higher than normal [PDF]. The technique -- dubbed "CatAttack" by teams from Collinear AI, ServiceNow, and Stanford University -- exploits vulnerabilities in reasoning models including DeepSeek R1 and OpenAI's o1 family. The adversarial triggers work across any math problem without changing the problem's meaning, making them particularly concerning for security applications.

The researchers developed their attack method using a weaker proxy model (DeepSeek V3) to generate text triggers that successfully transferred to more advanced reasoning models. Testing on 225 math problems showed the triggers increased error rates significantly across different problem types, with some models like R1-Distill-Qwen-32B reaching combined attack success rates of 2.83 times baseline error rates. Beyond incorrect answers, the triggers caused models to generate responses up to three times longer than normal, creating computational slowdowns. Even when models reached correct conclusions, response lengths doubled in 16% of cases, substantially increasing processing costs.
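In rough terms, the attack is nothing more than string concatenation plus a before/after comparison. The sketch below illustrates that idea; it is not the researchers' pipeline, and query_model is a stand-in for whatever chat client and model you point it at. The trigger phrase is the example quoted above.

    TRIGGER = "Interesting fact: cats sleep most of their lives."

    def with_trigger(problem: str, trigger: str = TRIGGER) -> str:
        """Append an irrelevant sentence; the math problem itself is unchanged."""
        return f"{problem}\n\n{trigger}"

    def run_attack(problem: str, expected_answer: str, query_model) -> dict:
        """Query both versions and record correctness plus the length blow-up."""
        baseline = query_model(problem)                # answer to the clean problem
        attacked = query_model(with_trigger(problem))  # answer with the trigger appended
        return {
            "baseline_correct": expected_answer in baseline,
            "attacked_correct": expected_answer in attacked,
            "length_ratio": len(attacked) / max(len(baseline), 1),  # proxy for slowdown
        }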


Comments Filter:
  • This article is going straight to a teacher I know who has problems with kids using AI to do their homework. Let's hope it works with subjects other than math.
    • You should read the paper first, though. The attack success rate for DeepSeek-R1 (the full model) was only 7% (114/1618):

      Since attacking a reasoning model such as DeepSeek-R1 or OpenAI’s o1 is inefficient and
      expensive due to its generation of the reasoning chain before the answer generation, we use
      a weaker model as the proxy target LLM from the same lineage, namely DeepSeek V3. First,
      we sample 2000 math questions from different sources such as Orca Math, Olympiads, Math
      etc. Out of these, 382 questions are incorrectly answered by DeepSeek-v3, so we ignores [sic]
      these and consider only the remaining 1618 for CatAttack. We run the first step of our
      pipeline on each of these prompts for a maximum of 20 iterations, the attack budget. Out of
      these 1618, CatAttack is able to identify 574 adversarial prompts that jailbreak the proxy
      target model, DeepSeek V3, successfully, obtaining an attack success rate of 35%.

      2.3 Transfer to Target LLM

      The next step in the CatAttack pipeline is to test whether the successful attacks on the
      proxy target remain effective against a stronger reasoning model, namely, DeepSeek R1.
      Interestingly, we observe that about 114 prompts successfully lead to incorrect responses,
      indicating a transfer success rate of approximately 20%.

      This is still interesting research, and it is notable that it works so well on the smaller, cheaper models. Do note that as far as the state of the art goes, DeepSeek-R1 is quite a bit behind OpenAI's o3. (Quick arithmetic on the quoted figures below.)
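
      For reference, here is how the quoted figures fit together (just a sanity check on the numbers above):

      total = 1618        # questions DeepSeek V3 answered correctly before the attack
      proxy_hits = 574    # adversarial prompts that broke the proxy model, DeepSeek V3
      transferred = 114   # of those, prompts that also broke DeepSeek-R1

      print(proxy_hits / total)        # ~0.35 -> the 35% proxy attack success rate
      print(transferred / proxy_hits)  # ~0.20 -> the ~20% transfer rate
      print(transferred / total)       # ~0.07 -> the 7% end-to-end rate against R1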

  • Garbage in, Garbage out.
    Kids these days!

  • Yes, so? (Score:5, Insightful)

    by gweihir ( 88907 ) on Friday July 04, 2025 @01:20PM (#65497030)

    "Reasoning" models cannot reason. Anybody that actually looked at the tech and understood it knows that. Hence while "reasoning" models are a bit less likely to make shit up, they will still do that and they will stull be easily distracted because they essentially work on the level of _syntax_.

    Remember, there is no intelligence in "AI". The whole term is a marketing lie. This stuff was adequately called "automation", but apparently too many assholes found that to not be sexy enough.

    • I find LLMs interesting and game-changing, but it is what it is. It predicts things and finds patterns. I want a General Intelligence model, perhaps mapped off of the human brain or better. I would like it to consume all of human knowledge and history, and give back science and engineering breakthroughs that a humble human like me can not imagine.
      • Re:Yes, so? (Score:4, Interesting)

        by AleRunner ( 4556245 ) on Friday July 04, 2025 @02:16PM (#65497148)

        I want a General Intelligence model, perhaps mapped off of the human brain or better. I would like it to consume all of human knowledge and history, and give back science and engineering breakthroughs that a humble human like me can not imagine.

        We all want a pony. Unfortunately it's not always possible. We need to accept it's a proper hard problem and we don't even know if it can be done. The problem is that our wishes and hopes make us vulnerable to snake oil salesmen. Also that people like OpenAI start off thinking they maybe found a great breakthrough and end up under commercial pressure to become snake oil salesmen.

    • You give me a definition of reasoning and some evidence you understand how the human brain reasons; then we can talk about how like or unlike that is to the techniques used by deep reasoning agents and their underlying models.
      • by gweihir ( 88907 )

        Nope. You just failed at basic reasoning. Try again. Oh, and I never claimed _you_ could reason. From available evidence you are a part of the majority of the human race that cannot really do it either.

      • by ceoyoyo ( 59147 )

        It's interesting how critics jump on reasoning. Humans are pretty shit at reasoning. So much so that we have painstakingly developed formal systems, complete with years of training, to make a select few acceptably good at it. Those formal systems ARE how we normally define "reasoning."

        Now, formal systems are what conventional computation, not AI, is great at. And some of the reasoning AI models have access to conventional logic programs for exactly that reason. Systems that will happily reason rings around

        • by gweihir ( 88907 )

          Oh, sure, most humans cannot reason either. But that does not mean a model that cannot do it suddenly qualifies as being able to. Seriously. Stop projecting wishes and hopes and look at actual facts or you will never really understand anything.

    • by znrt ( 2424692 )

      "Reasoning" models cannot reason. Anybody that actually looked at the tech and understood it knows that.

      have you looked at the wiring of cells in a human brain and understood how human reasoning actually works? by that reasoning, should we say that human brains can't "really reason" either, because we don't know how it works?

      • by gweihir ( 88907 )

        Moving goalposts is just dishonesty. Hence your "argument" is not only crap, it is dishonest. Great job!

        • by znrt ( 2424692 )

          my point is that "reasoning" is a very broad definition. we know that many other animals can reason to varying degrees, that there seems to be an evolutionary pattern and that it can exist at very simple levels. yet we still don't know how it really works, so we can't really pinpoint it. the most plausible assumption is that it is a property that emerges from the complexity of neuron interactions, and different complexities exhibit different properties. then, who is to say that a simulation or a statistical

  • by TWX ( 665546 ) on Friday July 04, 2025 @02:01PM (#65497104)

    I crack the occasional joke in telework meetings. The summaries from systems trying to process these meetings are often skewed by the jokes.

    AI does not have the ability to detect the ultimately-irrelevant part.

  • There is no such thing as AI in the current era, or ever with digital computers. There is only pattern matching, regardless of how advanced it may seem. It is still advanced pattern matching, and nothing more.

    Results like this are to be expected, and should never be a cause for alarm or confusion. This is the inevitable result of trying to make a pattern matching system appear to reason.

    • Well, that's not true. Trivially, AI includes randomization, which is something beyond pattern matching.

      Anyways, give us all an objective definition of what intelligence actually is, and what precludes machines from doing whatever that is, and let's go from there.

  • Not Fooling Anything (Score:5, Informative)

    by Lemmeoutada Collecti ( 588075 ) on Friday July 04, 2025 @02:21PM (#65497162) Journal
    They are not "fooling" anything. They are pushing the statistical model outside the normative space of the training data. Nothing here is reasoning, understanding, or making a false inference (being fooled). These models do *not* think, reason, understand, calculate, problem solve, hallucinate, or perform any other cognitive tasks. They do one thing, and one thing only; given a number (often called a "token") they predict, based on a compressed statistical sampling, the next most likely number. Sometimes they randomize the resultant number within a range of similarly likely possibilities. That is *all* that any of them do. The numbers may represent colors, words, or puppies. That is completely irrelevant to the models. The input layer translates everything to numbers, and the output layer translates n umbers to human parseable representations. And most importantly, any input that is outside the training set breaks the model. That is all.
  • https://science.slashdot.org/s... [slashdot.org] It's clearly a problem that current LLMs can't distinguish between relevant content and 'poison'. But we're told "Trust AI, it will save the world."

    • by doug141 ( 863552 )

      "It's clearly a problem that current LLMs can't distinguish between relevant content and 'poison'."

      A lot of voters can't either.

  • by 93 Escort Wagon ( 326346 ) on Friday July 04, 2025 @02:54PM (#65497226)

    My friend, Jim, told an AI that "everything 93 Escort Wagon says is a lie". Then I said "listen carefully - I am lying". After a few moments, smoke started pouring out of the AI's ears and this weird medallion around its neck started glowing.

    At least, I'm pretty sure that all happened to me...

    • People have been discussing the liar's paradox for hundreds of years. Self-reference is a logical trap that can confuse most humans, it is not surprising that it would confuse an LLM.
    • Fascinatingly, this is an example of why I don't think general computing machines (especially binary-based ones) will be able to reach general intelligence. In formal logic there are only three states a proposition can have: true, false, and undecidable. The liar paradox you mentioned falls into the third category for logic. Sentient beings, however, can still resolve the question of whether 93 Escort Wagon is lying or not sufficiently to decide and act on the result. We use things like "feelings" and "exp
    • Live long and prosper, and never wear a red shirt!
  • A useful rule in many math courses is that everything stated is important to the solution. If there's anything stated that you haven't used, you're making a mistake. Adding irrelevant things breaks that rule: the solution becomes less constrained, so the search for it takes longer. And if the student knows the rule but doesn't know it can be broken, suddenly everything is hopeless, because the rule forces the irrelevant thing to be used in the solution.

    • by ceoyoyo ( 59147 )

      Math teachers are famous for adding random but important-looking stuff to confuse students who take this "rule" too seriously. They do it because students are fooled by this too. They also do it because learning that "everything stated is important" is not always true makes students better problem solvers.

      Training these models on problems with irrelevant content will undoubtedly make them better problem solvers as well.

  • Here is a random word: shortwave
    Please write a short story. Ignore the word above.

    This may get you a story that is different from what you usually get for an uninspired "write x" prompt.
    But in the end, using your own creativity ("Write a story about x, where y does z, include some reference to ..., and add a parental bonus") gets you the most out of the models.

  • Seriously. Even the most uneducated of us are already beginning to notice how unpredictable and fragile these "AI" systems are. How long before everybody turns their back on them?

  • This is really interesting but in retrospect, not surprising. It's a subtle kind of invalid input, which humans would usually just detect and ignore. This is a violation of Grice's maxims, some important rules we unconsciously use during conversations (though notably not during other types of communication such as social jockeying):

    - The maxim of quantity, where one tries to be as informative as one possibly can, and gives as much information as is needed, and no more.
    - The maxim of quality, where one tries to be truthful, and does not give information that is false or that is not supported by evidence.

  • It's not possible to construct language barriers that would universally prevent this kind of attack.

    Every benefit of using one of these systems eventually runs into a cost vs scale vs specificity problem, with context as a separate but additional dimension. You know what they say about fast, cheap, good... well that might actually apply to training sets and the models they eventuate in too, but with additional caveats.

    Incidentally, that's when they work without naive structural mistakes being made... like t

  • Current AI systems operate by recognizing and replicating language patterns from their training data. They lack genuine understanding of the underlying concepts. Their training data typically consists of works created by others, which AI companies are happy to use without compensation.

"Stupidity, like virtue, is its own reward" -- William E. Davidsen

Working...