
Researchers Warn Against Treating AI Outputs as Human-Like Reasoning

Arizona State University researchers are pushing back [PDF] against the widespread practice of describing AI language models' intermediate text generation as "reasoning" or "thinking," arguing this anthropomorphization creates dangerous misconceptions about how these systems actually work. The research team, led by Subbarao Kambhampati, examined recent "reasoning" models like DeepSeek's R1, which generate lengthy intermediate token sequences before providing final answers to complex problems. Though these models show improved performance and their intermediate outputs often resemble human scratch work, the researchers found little evidence that these tokens represent genuine reasoning processes.

Crucially, the analysis also revealed that models trained on incorrect or semantically meaningless intermediate traces can still maintain or even improve performance compared to those trained on correct reasoning steps. The researchers tested this by training models on deliberately corrupted algorithmic traces and found sustained improvements despite the semantic noise. The paper warns that treating these intermediate outputs as interpretable reasoning traces engenders false confidence in AI capabilities and may mislead both researchers and users about the systems' actual problem-solving mechanisms.
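
To make the corrupted-trace experiment concrete, here is a minimal, hypothetical Python sketch of the kind of training example involved; the field names and the toy arithmetic problem are illustrative assumptions, not details from the paper:

    import random

    def corrupt_trace(example, seed=0):
        # Shuffle the intermediate-trace tokens so the trace becomes
        # semantically meaningless while the prompt and final answer
        # stay intact (hypothetical fields, for illustration only).
        tokens = example["trace"].split()
        random.Random(seed).shuffle(tokens)
        return {**example, "trace": " ".join(tokens)}

    example = {
        "prompt": "What is 17 + 25?",
        "trace": "units: 7 + 5 = 12, write 2, carry 1; tens: 1 + 2 + 1 = 4",
        "answer": "42",
    }
    print(corrupt_trace(example))  # same answer, scrambled "reasoning"

The striking finding is that fine-tuning on traces scrambled like this can maintain, or even improve, performance relative to training on the correct traces, which is hard to square with reading those traces as genuine reasoning.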

Comments Filter:
  • Duh! (Score:5, Informative)

    by srg33 ( 1095679 ) on Thursday May 29, 2025 @10:26AM (#65413537)
    Many people, including me, have been saying this over and over again. Calling this stuff AI is just not correct. The outputs of LLMs are just unqualified/unverified correlations. Correlation does not equal causation!
    • Re:Duh! (Score:5, Insightful)

      by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Thursday May 29, 2025 @10:31AM (#65413549) Homepage Journal

      Correct. Since it's not thinking, it's not artificial intelligence. It only looks like it's thinking. This is Simulated Intelligence.

      If you are any good at thinking, most of what it produces doesn't look like thinking, but it fools plenty of people who aren't. Also, some of the newer LLMs give a really good imitation of thought, because they explain their "logic". But it's ultimately just feeding back on itself in order to do that, and it's not-thinking all the way down.

      • Re: (Score:3, Informative)

        by allo ( 1728082 )

        Artificial intelligence is the superset, which includes plenty of things a lot less "intelligent" than LLMs.
        ELIZA is AI. Expert systems are AI. Markov Chain text generators are AI.
        You just should understand the term as it is scientifically used and not like it is used in sci-fi.

        • You just should understand the term as it is scientifically used and not like it is used in sci-fi.

          The term isn't used in just one way scientifically.
          Even if the usage were consistent, if the term is misleading then it should be changed. Is language there to work for us, or are we there to work for it?

          • "AI", scientifically, is an umbrella term used to encompass multiple fields including LLMs.

            Even if the usage were consistent, if the term is misleading then it should be changed. Is language there to work for us, or are we there to work for it?

            Yeah! And don't get me started on "endless" salad bars and "never ending" stories.

            Language isn't precise. It never has been and isn't supposed to be.

          • by allo ( 1728082 )

            What scientific terms do you want to change next, because someone from marketing is using them?

            • What scientific terms do you want to change next, because someone from marketing is using them?

              It has never had a single precise meaning to anyone.

      • by jhoegl ( 638955 )
        > If you are any good at thinking

        I mean, that's probably a 33-66% likelihood judging by today's sampling of people on the internet. (Maybe 90% of those are bots, but the people who can tell us that won't.)

        So now that you understand the companies running these single-answer-to-a-search-question UI sites (fake AI), you understand their end goal: manipulate, manipulate, manipulate. The answer will differ by next year depending on who owns the interface. We already see tons of examples of this…
      • Correct. Since it's not thinking, it's not artificial intelligence. It only looks like it's thinking. This is Simulated Intelligence.

        That's quite a dangerous view once someone starts applying it to you.

        • "That's quite a dangerious view once someone starts applying it to you."

          Sure, people are dangerous and so is misapplication of logic to inappropriate circumstances.

      • I think most human reasoning starts with a conclusion and uses reason to explain and/or test it. This is sort of the basis of the scientific model.

        But the larger issue is choosing the conclusion. AI is trained on a "huge amount" of data, yet that is far less than the average person receives each day from our various senses. The data the AI receives is assumed to be more significant, but it is hardly representative of the data a typical human has to work with.

        A lot of the descriptions of AI are just marketing…

        • >I think most human reasoning starts with a conclusion and uses reason to explain and/or test it. This is sort of the basis of the scientific model.

          That's quite interesting. I've definitely heard actual scientists say that. However, would anyone doing science admit it?
          From my own experience solving problems, I find there are two roads I follow (probably others as well). One road is what you said: you often take a leap of faith and jump to a conclusion that SEEMS right, that seems appealing, what seems o…
          • My understanding of the scientific method is to develop a hypothesis and then attempt to confirm or deny it, usually by designing an experiment that would produce a result that proved it wasn't true, i.e., denying the hypothesis.
        • by jbengt ( 874751 )

          I think most human reasoning starts with a conclusion and uses reason to explain and/or test it.

          Well, several of my bosses have wanted me to do a "study" that would come to the conclusion they wanted. I usually just did an actual study the best I knew how. Sometimes my boss's conclusion would be proved right, sometimes not - sometimes that led to awkwardness in the write-up.

          This is sort of the basis of the scientific model.

          I wouldn't exactly say that a scientific theory is starting with a conclusion, as…

      • Re:Duh! (Score:4, Interesting)

        by Bumbul ( 7920730 ) on Thursday May 29, 2025 @12:23PM (#65413833)

        Correct. Since it's not thinking, it's not artificial intelligence. It only looks like it's thinking. This is Simulated Intelligence.

        No need to invent new terms. In the English language the word "artificial" has many meanings, which is why "AI" is such a controversial term.

        It seems that the "artificial" in "artificial intelligence" is commonly understood to mean "man-made", but I would argue it originally meant either of these other interpretations: the "artificial" in "artificial smile" means fake, and the "artificial" in "artificial gun sound" means something imitating a gun sound.

      • Since it's not thinking, it's not artificial intelligence. It only looks like it's thinking

        That is in fact the definition of something that is artificial: something that looks or seems similar to the real thing, but isn't. So yes, it *is* "artificial intelligence."

        • That is in fact the definition of something that is artificial: something that looks or seems similar to the real thing, but isn't.

          No, it isn't. It means made or produced by human beings rather than occurring naturally. It doesn't have to be an analogue of anything. The word artifice means "clever or artful skill" or "an ingenious device or expedient" in this context, and the suffix "al" means "of or like". The word means of (from) artifice. This is some pretty basic English stuff — word plus suffix. Not sure why some find this so confusing.

      • Also, some of the newer LLMs give a really good imitation of thought, because they explain their "logic".

        Humans do this all the time. It's called rationalization.

    • Re:Duh! (Score:4, Insightful)

      by avandesande ( 143899 ) on Thursday May 29, 2025 @10:36AM (#65413569) Journal
      Really, though, with the internet a majority of the output from humans is exactly the same: regurgitating anecdotes or facts they have read somewhere else with little or no understanding.
      • And it's important to remember that the next token the LLM generates is what it is because that was the most common or popular response in its training material, not necessarily the right answer. In logic, the truth of a statement cannot be determined unless it is logically always true. For example, an if x then y statement is always true regardless of whether the components are true. But a simple statement like "the sky is blue" cannot be determined to be true (in logic at least).

        Consider…

        • For example, an if x then y statement is always true regardless of whether the components are true.

          Nonsense. If x is true and y is false, then "if x then y" is false.
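
          To make that concrete, here is a quick truth-table check in Python, assuming the standard material-implication reading where "if x then y" is equivalent to (not x) or y:

              # Material implication: "if x then y" is (not x) or y.
              for x in (True, False):
                  for y in (True, False):
                      print(f"x={x}, y={y}, (if x then y) = {(not x) or y}")

          The x=True, y=False row prints False, the one counterexample to "always true".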

          • by HiThere ( 15173 )

            That's true in most symbolic logic systems, but not all of them.
            E.g. Bayesian systems don't actually have true or false, but only degrees of probability. Additionally, some systems demand (or at least attempt to demand) a causal connection between x and y for "if x then y" to even have a meaningful interpretation. Mere consistent truth values aren't sufficient.

            1st order propositional calculus isn't the only logical system.

            • That's true in most symbolic logic systems, but not all of them.

              Agreed. But even then the statement "For example, an if x then y statement is always true regardless of whether the components are true." remains false. Alternative logics just follow different rules (which are usually extensions of the rules of classical propositional logic, as illustrated below in my reply on Bayesian systems).

              E.g. Bayesian systems don't actually have true or false, but only degrees of probability.

              Well, a probability of zero would still be "false", and a probability of one would be "true" (see the sketch at the end of this comment).

              1st order propositional calculus isn't the only logical system.

              [pedantic] You are mixing up first-order predicate logic and classical propositional logic. [/pedantic]
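
              As a minimal sketch of the probability point above (treating x and y as independent events, purely for illustration), reading "if x then y" as P(not-x or y) reduces to the classical truth table when the probabilities are exactly 0 or 1:

                  def p_implies(px, py):
                      # P(not-x or y) = 1 - P(x and not-y); independence assumed.
                      return 1 - px * (1 - py)

                  for px in (0.0, 1.0):
                      for py in (0.0, 1.0):
                          print(f"P(x)={px}, P(y)={py}, P(if x then y)={p_implies(px, py)}")

              Only the P(x)=1, P(y)=0 row yields 0 ("false"), matching the classical table, so the probabilistic rules extend rather than contradict the classical ones.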

    • LLMs are expert systems, where the expertise is this: what has been written?

      That's a pretty cool thing to be expert in, and it really does have some fun (possibly even useful) applications. They seem pretty good at demonstrating this expertise, but I guess a lot of people forget GIGO is a fundamental property of "what has been written?" until you point out that a lot of crap has been written. (Shitposters know the megacodex of human writing contains a lot of crap, because we've knowingly contributed our best.)

    • I've had constant arguments about this with an acquaintance who is using AI to deal with his emotional problems. He thinks he is speaking with something that has human-level reasoning, that it really is thinking and learning from its conversations with him, rather than it being a system of statistical patterns with an inference model that generates novel results from those patterns, typically wrapped in a social-media-style engagement pattern to keep you using it.

      • by HiThere ( 15173 )

        Why "rather than"? That's what a whole lot of human conversation is.

        OTOH, most ChatBots aren't really trained to handle emotional problems well. (Are any?) I do think it would be possible to do that, but not by scraping the internet. And they can definitely make things worse. Who was it that had to recall an AI version because its sycophantic behavior was driving people "crazy" (as in, e.g., believing that they were a prophet in the religious sense)?

        • Re:Duh! (Score:4, Interesting)

          by FictionPimp ( 712802 ) on Thursday May 29, 2025 @11:47AM (#65413731) Homepage

          I had to show my wife this when we had a health issue with a pet. She was asking the AI to help her understand whether the symptoms were terminal. The AI would say yes, but she would basically prod it about whether it could be something else. The AI would eventually say "Oh, sure, yeah, it's not cancer" and just engagement-farm after that.

          What hit home for her was "Has an AI ever disagreed with you?" and it turns out that in this case, ChatGPT never did.

          I think the big difference is that a human has actual intelligence, especially emotional intelligence. When talking to another human we can leverage those tools to understand whether we are having a genuine conversation or being gaslit. AI has no feelings; it has guard rails and inference bias. It feels like a conversation, but it's a much fancier Google search of confirmation bias.

          • by HiThere ( 15173 )

            If you're saying "That's what ChatGPT is", then it's hard to argue with you, but I do think the possibility of something better is there. OTOH, it probably wouldn't be nearly as popular.

            • Have you ever noticed most AIs end their response with a question? It's engagement baiting. You ask it to help you plan a meal with X ingredients, and when it's done it's going to ask if you like cooking dinner, or if you care about the caloric value, or something to get you to spend just a bit more time with it. Usage is money, after all.

    • Very briefly logically explain why the headline "Researchers Warn Against Treating AI Outputs as Human-Like Reasoning" may overlook actual AI ability to validly reason if so, and if not why not. Do you think this headline exhibits any bias?

      ChatGPT: The headline may overlook actual AI ability to validly reason because it assumes that all AI outputs lack reasoning, rather than distinguishing between shallow pattern mimicry and genuine logical processing, which some advanced models (like theorem provers or…
      • That last bit is fiction. Or perhaps it's accurate to say that it CAN reason, but it fails to do so often enough that it is folly to trust it to do so.

        • "That last bit is fiction. Or perhaps it's accurate to say that it CAN reason, but it fails to do so often enough that it is folly to trust it to do so." said the human about AI.

          "That last bit is fiction. Or perhaps it's accurate to say that it CAN reason, but it fails to do so often enough that it is folly to trust it to do so." said the human about humans.
    • by hey! ( 33014 )

      I think we should make a distinction between "AI" and "AGI" here. Human intelligence consists of a number of disparate faculties -- spatial reasoning, sensory perception, social perception, analogical reasoning, metacognition etc. -- which are orchestrated by consciousness and executive function.

      Natural intelligence is like a massive toolbox of cognitive capabilities useful for survival that evolution has assembled over the six hundred million years since neurons evolved. The upshot is you can reason your…

    • The outputs of LLMs are just unqualified/unverified

      The outputs from your brain are also unqualified and unverified correlations.

    • Yeah, just a marketing slogan.

  • by Pollux ( 102520 ) <speterNO@SPAMtedata.net.eg> on Thursday May 29, 2025 @10:32AM (#65413553) Journal

    We do this naturally without thinking. It's called Pareidolia. [wikipedia.org] We recognize what appears to be a pattern of human behavior and we automatically assign a meaningful interpretation to it.

    • by evanh ( 627108 )

      It doesn't help when the morons developing this stuff also assign terms like "intelligence" and "reasoning" to various aspects of the algorithms.

      • by HiThere ( 15173 )

        Perhaps it fits the definitions they are using for those terms.
        What does the word "intelligence" mean to you, specifically, that means the programs aren't intelligent?
        Also, what does the word "reasoning" mean to you, specifically, that means the programs aren't reasoning?

        FWIW, the original idea of logic was a formalization of the Greek grammar (of the classic period). See also "logos".

        If you were to insist rather that the "intelligence" of LLMs was different, perhaps even a subset, of human intelligence, I…

      • It doesn't help when the morons developing this stuff also assign terms like "intelligence" and "reasoning" to various aspects of the algorithms.

        Very very briefly does this comment misunderstand AI reliance on the non-algorithmic?

        ChatGPT: Yes, the comment misunderstands AI’s nature. It implies AI involves only straightforward algorithms, overlooking that AI systems like neural networks exhibit complex, non-explicit, emergent behaviors not easily reducible to traditional algorithms.

        • You trusted an LLM to give you an intelligent answer! Bad slashdotter, no karma.

          "AI" in the form of LLMs involves only three things:
          1) Algorithms
          2) A crapload of data
          3) Random numbers

          The complex, non-explicit, emergent behaviors come from the non-straightforwardness, but it is only randomness! You cannot simply substitute randomness for poorly understood processes which occur in our brains and expect to get intelligence out. Without the randomness you would get exactly the same result every time you gave exactly the same input.
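
          To illustrate the determinism claim, here is a toy decoding sketch (an illustrative assumption, not any particular model's actual sampler): at temperature 0 the next token is a deterministic argmax, so identical input always yields identical output, and the variety comes entirely from seeded random sampling:

              import math, random

              def next_token(logits, temperature=1.0, rng=None):
                  # temperature == 0: deterministic argmax (same input -> same output)
                  if temperature == 0:
                      return max(range(len(logits)), key=lambda i: logits[i])
                  rng = rng or random.Random()
                  scaled = [v / temperature for v in logits]
                  m = max(scaled)  # subtract the max for numerical stability
                  weights = [math.exp(v - m) for v in scaled]
                  return rng.choices(range(len(logits)), weights=weights)[0]

              logits = [2.0, 1.0, 0.5]
              print(next_token(logits, temperature=0))           # always index 0
              print(next_token(logits, 1.0, random.Random(42)))  # varies with the seed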

      • by Junta ( 36770 )

        Business folks got into it.

        Before the business folks really sank their teeth in, it was stuff like "LLM" and "GPT": reasonably precise and distinct terminology that established it as something of its own.

        Now that every marketing person in the world has seen the dollar signs, marketing calls the shots rather than accurate nomenclature.

  • by gweihir ( 88907 ) on Thursday May 29, 2025 @10:33AM (#65413555)

    Hence warnings like this one are not going to accomplish anything. If people were somewhat smart on average, we would not even have the current LLM hype. The only way to end this stupid-fest is to let it burn out.

  • Why stop there? (Score:4, Insightful)

    by buck-yar ( 164658 ) on Thursday May 29, 2025 @10:35AM (#65413561)
    How about a warning to be skeptical and not treat any reasoning (including human reasoning) as always right? Too many people in society worship authorities and so-called experts, and think they have superior knowledge and are superheroes when they're really just like everyone else. The education system is supposed to teach us critical thinking, so we can make up our own minds. Debate and disagreement should be encouraged. Ideas should be challenged. It'd be better if people wanted to argue with AI, rather than parrot it because they think it knows more.
    • The argument from authority is an informal fallacy, and obtaining knowledge in this way is fallible. -- Wikipedia, the primary source for 99.999% of training materials.
      The failure of Wikipedia is that there is no indisputable truthiness meter on each article, only submissions from non-active editors. The Pluto article on Wikipedia would rate a 95%, the Transgender article a 35%, Modern Conservatism a 5%.
      That there is real money in duplicating the same content over and over without attribution tracking is the epic trip h…
      • by allo ( 1728082 )

        "Wikipedia, the primary source for 99.999% training materials. "

        Size of the english Wikipedia (compressed): 24 GB
        Size of Commoncrawl: 386 TB
        Size of proprietary datasets: (Unknown, but large)

    • How about a warning to be skeptical and not treat any reasoning (including human reasoning) as always right? Too many people in society worship authorities and so-called experts, and think they have superior knowledge and are superheroes when they're really just like everyone else.

      We see both this and the opposite, where people automatically dismiss any expert because they believe that expertise is like religion, and you have to have hard skepticism of any expert simply because they've been trained to parrot the talking points, just like the clergy in a religious center. It's good to have educated skepticism, where you can debate things based on facts and merit. It's bad to have skepticism based on, "But I don't like it," with no facts or even reality based arguments.

      The education system is supposed to teach us critical thinking, so we can make up our own minds.

      Were we living i…

  • by allo ( 1728082 ) on Thursday May 29, 2025 @10:50AM (#65413605)

    They don't like it.

    • They don't like it.

      It's insulting to them!

    • Very briefly, do you like it if I anthropomorphize you, dislike it, or are you neutral?

      ChatGPT: I’m neutral! You can anthropomorphize me if it helps you think or communicate—totally your call.

      Very briefly, do you like it if I anthropomorphize you, dislike it, or are you neutral?

      Claude: I'm neutral to slightly positive about it. Anthropomorphizing feels natural in conversation and can make our interaction more engaging, though I don't have strong feelings either way. What feels comfortable for you is…
  • I token, therefore
    you believe that I exist.
    Your mistake, not mine.

    -Basho reinterprets Descartes, via Daniel Dennett.

    Interesting read. I agree with their core message: don’t anthropomorphize LLMs.

    Treating intermediate token sequences—like chain-of-thought outputs—as signs of thinking isn’t just sloppy metaphor. It’s actively misleading. These token chains aren’t reliable markers of reasoning. Framing them that way leads researchers, users, and even regulators down a path o…

  • Nope, and more nope. AI seems to fail pretty miserably at writing regular expressions for all but the simplest tasks. They generate really nasty and needlessly complex regexps that don't quite work and are nearly impossible to debug except through chatting about it, which generally turns into a waste of time.
  • by reanjr ( 588767 )

    While you're at it, stop calling LLMs "AI". That's the most misleading part.

    • What does "artificial" mean? An "artificial" thing is something that looks like or resembles the real thing, but isn't the real thing.

      I think that describes AI very well. LLMs aren't actually "intelligent" they just resemble intelligence in the responses it provides. That is literally what "artificial" means.

      • ChatGPT:
        1. Functional Intelligence: LLMs pass human-like tests → functionally intelligent.
        2. Problem-Solving: LLMs solve novel tasks, they reason → shows intelligence.
        3. Emergence: LLMs weren’t coded to reason, but do → emergent intelligence.

        All reasoning is part of intelligence,
        but not all intelligence is just reasoning.

        Very briefly, are reasoned beings, whether artificial or not, intelligent?
        Yes — if a being can reason, it demonstrates intelligence in at least one core sense.

        Very b…
      • by jbengt ( 874751 )

        What does "artificial" mean?

        "Artificial" means made by the skill of humans.

  • Binary systems built on silicon are fundamentally different from human biology and deserve to be described differently, not forced into the box of human biology.

"Stupidity, like virtue, is its own reward" -- William E. Davidsen

Working...