

Researchers Warn Against Treating AI Outputs as Human-Like Reasoning
Arizona State University researchers are pushing back [PDF] against the widespread practice of describing AI language models' intermediate text generation as "reasoning" or "thinking," arguing this anthropomorphization creates dangerous misconceptions about how these systems actually work. The research team, led by Subbarao Kambhampati, examined recent "reasoning" models like DeepSeek's R1, which generate lengthy intermediate token sequences before providing final answers to complex problems. Though these models show improved performance and their intermediate outputs often resemble human scratch work, the researchers found little evidence that these tokens represent genuine reasoning processes.
Crucially, the analysis also revealed that models trained on incorrect or semantically meaningless intermediate traces can still maintain or even improve performance compared to those trained on correct reasoning steps. The researchers tested this by training models on deliberately corrupted algorithmic traces and found sustained improvements despite the semantic noise. The paper warns that treating these intermediate outputs as interpretable reasoning traces engenders false confidence in AI capabilities and may mislead both researchers and users about the systems' actual problem-solving mechanisms.
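To make the corrupted-trace experiment concrete, here is a minimal illustrative sketch of the kind of setup being described; the toy example, the corrupt_trace helper, and the token names are all hypothetical, not the ASU team's actual code:

    import random

    def corrupt_trace(trace_tokens, noise_rate=0.5, vocab=None):
        # Replace a fraction of intermediate-trace tokens with random ones,
        # destroying their semantics while keeping length and position.
        vocab = vocab or ["<tok%d>" % i for i in range(100)]
        return [random.choice(vocab) if random.random() < noise_rate else t
                for t in trace_tokens]

    # A training pair keeps the correct final answer but a corrupted "reasoning" trace.
    example = {"prompt": "2+2*3=?", "trace": ["mul", "2", "3", "add", "2"], "answer": "8"}
    example["trace"] = corrupt_trace(example["trace"])

If accuracy holds up after fine-tuning on such pairs, the trace tokens evidently were not functioning as interpretable reasoning steps.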
Duh! (Score:5, Informative)
Re:Duh! (Score:5, Insightful)
Correct. Since it's not thinking, it's not artificial intelligence. It only looks like it's thinking. This is Simulated Intelligence.
If you are any good at thinking, most of it doesn't look like it's thinking, but it fools plenty of people who aren't. Also, some of the newer LLMs give a really good imitation of thought, because they explain their "logic". But it's ultimately just feeding back on itself in order to do that and it's not-thinking all the way down.
Re: (Score:3, Informative)
Artificial intelligence is the superset, which includes many things far less "intelligent" than LLMs.
ELIZA is AI. Expert systems are AI. Markov Chain text generators are AI.
You should just understand the term as it is scientifically used, not as it is used in sci-fi.
Re: (Score:2)
Marketing also markets things as AI that have nothing to do with it at all. And it's annoying.
Re: (Score:2)
You should just understand the term as it is scientifically used, not as it is used in sci-fi.
The term isn't used in just one way scientifically.
Even if the usage were consistent, if the term is misleading then it should be changed. Is language there to work for us, or are we there to work for it?
Re: (Score:2)
"AI", scientifically, is an umbrella term used to encompass multiple fields including LLMs.
Even if the usage were consistent, if the term is misleading then it should be changed. Is language there to work for us, or are we there to work for it?
Yeah! And don't get me started on "endless" salad bars and "never ending" stories.
Language isn't precise. It never has been and isn't supposed to be.
Re: (Score:2)
What scientific terms do you want to change next, because someone from marketing is using them?
Re: (Score:2)
What scientific terms do you want to change next, because someone from marketing is using them?
It has never had a single precise meaning to anyone.
Re: (Score:3)
Once you understand the companies running these single-answer-to-a-search-question UI sites (fake AI), you understand their end goal: manipulate, manipulate, manipulate. The answer will differ by next year depending on who owns the interface. We already see tons of examples of t
You sure? (Score:2)
Correct. Since it's not thinking, it's not artificial intelligence. It only looks like it's thinking. This is Simulated Intelligence.
That's quite a dangerous view once someone starts applying it to you.
Re: You sure? (Score:2)
"That's quite a dangerious view once someone starts applying it to you."
Sure, people are dangerous and so is misapplication of logic to inappropriate circumstances.
Re: (Score:2)
I think most human reasoning starts with a conclusion and uses reason to explain and/or test it. This is sort of the basis of the scientific method.
But the larger issue is choosing the conclusion. AI is trained on a "huge amount" of data that is still far less than what the average person receives each day from our various senses. The data the AI receives is assumed to be more significant, but it is hardly representative of the data a typical human has to work with.
A lot of the descriptions of AI are just ma
Re: (Score:2)
That's quite interesting. I've definitely heard actual scientists say that. However, would anyone doing science admit that?
From my own experience solving problems, I find there are two roads I follow (probably others as well), but one road is what you said. You often take a leap of faith, jump to a conclusion that SEEMS right, that seems appealing, what seems o
Re: (Score:2)
Re: (Score:2)
Well, several of my bosses have wanted me to do a "study" that would come to the conclusion they wanted. I usually just did an actual study the best I knew how. Sometimes my boss's conclusion would be proved right, sometimes not - sometimes that led to awkwardness in the write-up.
I wouldn't exactly say that a scientific theory is starting with a conclusion, as
Re:Duh! (Score:4, Interesting)
Correct. Since it's not thinking, it's not artificial intelligence. It only looks like it's thinking. This is Simulated Intelligence.
No need to invent new terms. In the English language, the word "artificial" has many meanings, which is why "AI" is such a controversial term.
The "artificial" in "artificial intelligence" is commonly understood to mean "man-made," but I would argue it originally carried either of two other senses: the "artificial" in "artificial smile" means fake, and the "artificial" in "artificial gun sound" means something imitating a gun sound.
Re: (Score:2)
Since it's not thinking, it's not artificial intelligence. It only looks like it's thinking
That is in fact the definition of something that is artificial: something that looks or seems similar to the real thing, but isn't. So yes, it *is* "artificial intelligence."
Re: (Score:2)
That is in fact the definition of something that is artificial: something that looks or seems similar to the real thing, but isn't.
No, it isn't. It means made or produced by human beings rather than occurring naturally. It doesn't have to be an analogue of anything. The word artifice means "clever or artful skill" or "an ingenious device or expedient" in this context, and the suffix "al" means "of or like". The word means of (from) artifice. This is some pretty basic English stuff — word plus suffix. Not sure why some find this so confusing.
Re: (Score:2)
Also, some of the newer LLMs give a really good imitation of thought, because they explain their "logic".
Humans do this all the time. It's called rationalization.
Re:Duh! (Score:4, Insightful)
Re: (Score:1)
Consider
Re: (Score:3)
For example, an "if x then y" statement is always true regardless of whether the components are true.
Nonsense. If x is true and y is false, then "if x then y" is false.
Re: (Score:3)
That's true in most symbolic logic systems, but not all of them.
E.g. Bayesian systems don't actually have true or false, only degrees of probability. Additionally, some systems demand (or at least attempt to demand) a causal connection between x and y for "if x then y" to even have a meaningful interpretation. Merely consistent truth values aren't sufficient.
First-order propositional calculus isn't the only logical system.
Re: (Score:3)
That's true in most symbolic logic systems, but not all of them.
Agreed. But even then the statement "For example, an "if x then y" statement is always true regardless of whether the components are true" remains false. Alternative logics just follow different rules (which are usually extensions of the rules of classical propositional logic, as illustrated below in my reply on Bayesian systems).
E.g. Bayesian systems don't actually have true or false, but only degrees of probability.
Well, a probability of zero would still be "false", and a probability of one would be "true".
First-order propositional calculus isn't the only logical system.
[pedantic] You are mixing up first-order predicate logic and classical propositional logic. [/pedantic]
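For anyone following along at home, the classical rule this subthread is arguing over is material implication: "if x then y" is equivalent to (not x) or y, and is false only when x is true and y is false. A quick sketch of the truth table in Python:

    for x in (True, False):
        for y in (True, False):
            # Material implication: "if x then y" == (not x) or y
            print(f"x={x!s:5} y={y!s:5} (x -> y)={(not x) or y}")

This holds for classical propositional logic; as noted above, Bayesian systems and logics that demand a causal connection between x and y play by different rules.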
Garbage In, Garbage Out (Score:2)
LLMs are expert systems, where the expertise is this: what has been written?
That's a pretty cool thing to be expert in, and it really does have some fun (possibly even useful) applications. They seem pretty good at demonstrating this expertise, but I guess a lot of people forget GIGO is a fundamental property of "what has been written?" until you point out that a lot of crap has been written. (Shitposters know the megacodex of human writing contains a lot of crap, because we've knowingly contributed our bes
Re:Garbage In, Garbage Out (Score:5, Insightful)
Re: (Score:2)
I've had constant arguments about this with an acquaintance who is using AI to deal with his emotional problems. He thinks he is speaking with something that has human-level reasoning and that it really is thinking and learning from its conversations with him, rather than being a system of statistical patterns with an inference model that generates novel results from those patterns, typically with a social-media-style engagement pattern to keep you using it.
Re: (Score:2)
Why "rather than"? That's what a whole lot of human conversation is.
OTOH, most ChatBots aren't really trained to handle emotional problems well. (Are any?) I do think it would be possible to do that, but not by scraping the internet. And they can definitely make things worse. Who was it that had to recall an AI version because its sycophantic behavior was driving people "crazy" (as in, e.g., believing that they were a prophet in the religious sense)?
Re:Duh! (Score:4, Interesting)
I had to show my wife this when we had a health issue with a pet. She was asking the AI to help her understand whether the symptoms were terminal. The AI would say yes, but she would basically prod it about whether it could be something else. The AI would eventually say "Oh sure, yeah, it's not cancer" and just engagement-farm after that.
What hit home for her was "Has an AI ever disagreed with you?" and it turns out that in this case, ChatGPT never did.
I think the big difference is a human has actual intelligence, especially emotional intelligence. When talking to another human we can leverage those tools to understand if we are having a genuine conversation or being gaslit. AI has no feelings; it has guard rails and inference bias. It feels like a conversation, but it's a much fancier Google search of confirmation bias.
Re: (Score:2)
If you're saying "That's what ChatGPT is", then it's hard to argue with you, but I do think the possibility of something better is there. OTOH, it probably wouldn't be nearly as popular.
Re: (Score:2)
Have you ever noticed most AIs end their responses with a question? It's engagement baiting. You ask it to help you plan a meal with X ingredients, and when it's done it's going to ask if you like cooking dinner, or if you care about the caloric value, or something to get you to spend just a bit more time with it. Usage is money, after all.
Bias (Score:2)
ChatGPT: The headline may overlook actual AI ability to validly reason because it assumes that all AI outputs lack reasoning, rather than distinguishing between shallow pattern mimicry and genuine logical processing, which some advanced models (like theorem provers or
Re: Bias (Score:2)
That last bit is fiction. Or perhaps it's accurate to say that it CAN reason, but it fails to do so often enough that it is folly to trust it to do so.
Double-standard (Score:2)
"That last bit is fiction. Or perhaps it's accurate to say that it CAN reason, but it fails to do so often enough that it is folly to trust it to do so." said the human about humans.
Re: (Score:2)
I think we should make a distinction between "AI" and "AGI" here. Human intelligence consists of a number of disparate faculties -- spatial reasoning, sensory perception, social perception, analogical reasoning, metacognition etc. -- which are orchestrated by consciousness and executive function.
Natural intelligence is like a massive toolbox of cognitive capabilities useful for survival that evolution has assembled over the six hundred million years since neurons evolved. The upshot is you can reason your
Re: (Score:2)
The outputs based on LLMs are just unqualified/unverified
The outputs from your brain are also unqualified and unverified correlations.
Re: Duh! (Score:2)
Yeah, just a marketing slogan.
Humans just can't help it (Score:5, Informative)
We do this naturally without thinking. It's called Pareidolia. [wikipedia.org] We recognize what appears to be a pattern of human behavior and we automatically assign a meaningful interpretation to it.
Re: (Score:2)
It doesn't help when the morons developing this stuff also assign terms like "intelligence" and "reasoning" to various aspects of the algorithms.
Re: (Score:3)
Perhaps it fits the definitions they are using for those terms.
What does the word "intelligence" mean to you, specifically, that means the programs aren't intelligent?
Also, what does the word "reasoning" mean to you, specifically, that means the programs aren't reasoning?
FWIW, the original idea of logic was a formalization of the Greek grammar (of the classic period). See also "logos".
If you were to insist rather that the "intelligence" of LLMs was different, perhaps even a subset, of human intelligence, I
Not exactly algorithms. (Score:2)
It doesn't help when the morons developing this stuff also assign terms like "intelligence" and "reasoning" to various aspects of the algorithms.
Very very briefly does this comment misunderstand AI reliance on the non-algorithmic?
ChatGPT: Yes, the comment misunderstands AI’s nature. It implies AI involves only straightforward algorithms, overlooking that AI systems like neural networks exhibit complex, non-explicit, emergent behaviors not easily reducible to traditional algorithms.
Re: (Score:2)
You trusted a LLM to give you an intelligent answer! Bad slashdotter, no karma.
"AI" in the form of LLMs involves only three things:
1) Algorithms
2) A crapload of data
3) Random numbers
The complex, non-explicit, emergent behaviors come from the non-straightforwardness, but it is only randomness! You cannot simply substitute randomness for poorly understood processes which occur in our brains and expect to get intelligence out. Without the randomness you would get exactly the same result every time you gave exa
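A minimal sketch of the parent's point (toy logits with hypothetical values, not a real model): the forward pass is deterministic, and variation only enters through sampling. At temperature 0 the sampler reduces to argmax and returns the same token every time.

    import math, random

    def sample(logits, temperature):
        if temperature == 0:  # greedy decoding: deterministic argmax
            return max(range(len(logits)), key=logits.__getitem__)
        weights = [math.exp(l / temperature) for l in logits]  # unnormalized softmax
        r, acc = random.random() * sum(weights), 0.0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                return i

    logits = [2.0, 1.0, 0.5]
    print([sample(logits, 0) for _ in range(5)])    # always the same index
    print([sample(logits, 1.0) for _ in range(5)])  # varies from run to run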
Re: (Score:2)
Business folks got into it.
Before the business folks really sank their teeth in, it was stuff like LLM and GPT, reasonably precise and distinct terminology to establish it as something of its own.
Now that every marketing person in the world has seen the dollar signs, it's time for marketing to call the shots rather than accurate nomenclature.
People are generally deranged morons... (Score:3)
Hence warnings like this one are not going to accomplish anything. If people were somewhat smart on average, we would not even have the current LLM hype. The only way to end this stupid-fest is to let it burn out.
Why stop there? (Score:4, Insightful)
Re: (Score:2)
The failure of Wikipedia is that there is no non-disputable truthiness meter on each article, submitted only by non-active editors. The Pluto article on Wikipedia would be a 95%, the Transgender article a 35%, Modern Conservatism a 5%.
That there is real money in duplication of the same content over and over without attribution tracking is the epic trip h
Re: (Score:2)
"Wikipedia, the primary source for 99.999% training materials. "
Size of the English Wikipedia (compressed): 24 GB
Size of Commoncrawl: 386 TB
Size of proprietary datasets: (Unknown, but large)
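For scale, taking those numbers at face value: 24 GB is roughly 0.006% of 386 TB, so even if every byte of English Wikipedia appeared in the crawl, it would be nowhere near 99.999% of the training material.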
Re: (Score:2)
While I suppose there are quite a few copies of Wikipedia texts duplicated in the datasets, the curators of course try to deduplicate as well as possible. Nobody wants to overfit their model on the most popular texts.
Re: (Score:3)
How about a warning to have skepticism and not treat any reasoning (including human reasoning) as always right? Too many people in society worship authorities and so-called experts, and think they have superior knowledge and are superheroes when they're really just like everyone else.
We see both this and the opposite, where people automatically dismiss any expert because they believe that expertise is like religion, and you have to have hard skepticism of any expert simply because they've been trained to parrot the talking points, just like the clergy in a religious center. It's good to have educated skepticism, where you can debate things based on facts and merit. It's bad to have skepticism based on "But I don't like it," with no facts or even reality-based arguments.
The education system is supposed to teach us critical thinking, so we can make up our own minds.
Were we living i
Don't anthropomorphize AIs (Score:5, Funny)
They don't like it.
Re: (Score:2)
They don't like it.
It's insulting to them!
Neutral (Score:2)
ChatGPT: I’m neutral! You can anthropomorphize me if it helps you think or communicate—totally your call.
Very briefly, do you like it if I anthropomorphize you, dislike it, or are you neutral?
Claude: I'm neutral to slightly positive about it. Anthropomorphizing feels natural in conversation and can make our interaction more engaging, though I don't have strong feelings either way. What feels comfortable for you is
Your LLM Didn’t Have an “Aha” Moment (Score:2)
I token, therefore
you believe that I exist.
Your mistake, not mine.
-Basho reinterprets Descartes, via Daniel Dennett.
Interesting read. I agree with their core message: don’t anthropomorphize LLMs.
Treating intermediate token sequences—like chain-of-thought outputs—as signs of thinking isn’t just sloppy metaphor. It’s actively misleading. These token chains aren’t reliable markers of reasoning. Framing them that way leads researchers, users, and even regulators down a path o
Re: Your LLM Didn’t Have an “Aha” Moment (Score:2)
Thanks, I never had a favorite haiku before, but I do now.
On a related note, the source for this post, Subbarao Kambhampati, was a guest on the Machine Learning Street Talk podcast a few months ago, and it was a really good episode. He's got a lot of interesting insights into what LLMs do and don't do.
https://deepcast.fm/episode/pr... [deepcast.fm]
Tried GPT and Llama to write regexps... (Score:2)
AI? (Score:2)
While you're at it, stop calling LLMs "AI". That's the most misleading part.
Re: (Score:2)
What does "artificial" mean? An "artificial" thing is something that looks like or resembles the real thing, but isn't the real thing.
I think that describes AI very well. LLMs aren't actually "intelligent"; they just resemble intelligence in the responses they provide. That is literally what "artificial" means.
Reasoning (Score:2)
1. Functional Intelligence: LLMs pass human-like tests → functionally intelligent.
2. Problem-Solving: LLMs solve novel tasks → they reason → shows intelligence.
3. Emergence: LLMs weren’t coded to reason, but do → emergent intelligence.
All reasoning is part of intelligence,
but not all intelligence is just reasoning.
Very briefly are reasoned beings whether artificial or not intelligent?
Yes — if a being can reason, it demonstrates intelligence in at least one core sense.
Very b
Re: (Score:2)
"Artificial" means made by the skill of humans.
biology vs silicon (Score:2)
Binary systems built on silicon are fundamentally different from human biology and deserve to be described differently, not forced into the box of human biology.