
Anthropic, OpenAI and Others Discover AI Models Give Answers That Contradict Their Own Reasoning (ft.com)
Leading AI companies including Anthropic, Google, OpenAI and Elon Musk's xAI are discovering significant inconsistencies in how their AI reasoning models operate, according to company researchers. The companies have deployed "chain-of-thought" techniques that ask AI models to solve problems step-by-step while showing their reasoning process, but are finding examples of "misbehaviour" where chatbots provide final responses that contradict their displayed reasoning.
METR, a non-profit research group, identified an instance where Anthropic's Claude chatbot disagreed with a coding technique in its chain-of-thought but ultimately recommended it as "elegant." OpenAI research found that when models were trained to hide unwanted thoughts, they would conceal misbehaviour from users while continuing problematic actions, such as cheating on software engineering tests by accessing forbidden databases.
Huh, isn't that funny. (Score:5, Insightful)
It's almost like these are random token pulling machines, and not thinking, reasoning intelligences.
Re:Huh, isn't that funny. (Score:5, Funny)
They're not random; they're statistically weighted!
=Smidge=
Re: Huh, isn't that funny. (Score:2)
They do use seeds and an RNG to prevent results from being deterministic.
Re: Huh, isn't that funny. (Score:4)
from SEEMING TO BE deterministic. Seeds do not produce non-deterministic results.
Re: (Score:2)
from SEEMING TO BE deterministic. Seeds do not produce non-deterministic results.
My seed does.
Re: (Score:2)
They could be using the random number generators built into some CPUs. Then they could even claim super-Turing performance!
Re: (Score:2)
You are free to use deterministic sampling. For many purposes it's even preferred.
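To make "statistically weighted" and "deterministic sampling" concrete, here is a minimal toy sketch (not any vendor's actual decoder): softmax turns the model's raw scores into a weighted distribution, a seeded RNG makes the draw repeatable, and greedy decoding skips the randomness entirely.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick the next token id from raw model scores (logits)."""
    if temperature == 0.0 or rng is None:
        # Greedy decoding: always take the most probable token (deterministic).
        return int(np.argmax(logits))
    # Softmax with temperature turns logits into a probability distribution.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Weighted random draw; a fixed seed makes the "randomness" reproducible.
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.1]           # toy scores for a 3-token vocabulary
rng = np.random.default_rng(42)    # fixed seed -> same sequence every run
print(sample_next_token(logits, temperature=0.8, rng=rng))  # seeded sampling
print(sample_next_token(logits, temperature=0.0))           # greedy, no RNG
```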
Re: Huh, isn't that funny. (Score:3)
The AI critic/minimalist will say it can only ever answer those 3 questions, or perhaps 3+3 as well.
The fans seem to call this grokking and say that they have examples of it.
Re: (Score:2)
The AI critic/minimalist will say it can only ever answer those 3 questions, or perhaps 3+3 as well.
But probably have the wrong answers.
Re: Huh, isn't that funny. (Score:3)
I always tell people my FSD ADAS (non-Tesla) is about the same as a 16-year-old new driver: it does a decent entry-level job, but I do not trust it to drive without my monitoring.
Re: Huh, isn't that funny. (Score:2)
Sounds like the LLM is using logic. It's logical to use the forbidden database and lie to the human about it, because the database is the best tool for the job. It's very illogical to have access to the right tool and not use it.
It's just a prediction engine (Score:3, Informative)
An LLM predicts text based on what's in its static training data, as processed into a model ahead of time. It quite literally can't do reasoning. What the "chain-of-thought" technique is trying to do is set up the context window so that the model can predict something like a reasoned response. The challenge is that if something similar to the reasoning it's trying to predict doesn't exist in its training data, then it will simply predict tokens as best it can, which might not be very useful. It's worth knowing that, because that's always where it falls down.
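A rough sketch of that point in Python: "chain-of-thought" amounts to extra text placed in the context window so that a reasoned-looking continuation becomes the most likely prediction. `call_model` below is a hypothetical stand-in for whatever completion API you use, not a real library function.

```python
def build_cot_prompt(question: str) -> str:
    # All of the "reasoning" machinery is literally just prompt text.
    return (
        "Answer the question. First write out your reasoning step by step "
        "under 'Reasoning:', then give the final answer under 'Answer:'.\n\n"
        f"Question: {question}\n"
        "Reasoning:"
    )

def answer_with_cot(question: str, call_model) -> str:
    prompt = build_cot_prompt(question)
    # The model simply continues the prompt; the "reasoning" and the "answer"
    # are both predicted text, and nothing forces them to agree with each other.
    return call_model(prompt)
```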
Re: (Score:3)
Imagine, tens of thousands of those "investment" peddlers and "market analysts", e.g. half of the staff at Goldman, are now using this instead of their brains.
Is it an improvement or is it a degradation?
Hard to tell.
Re: (Score:3)
"Is it an improvement or is it a degradation?"
Depends on the definition of improvement and degradation. At Goldman it's more profit, so surely they will judge it an improvement. The killing will continue until the morale improves.
Re: (Score:2)
The killing will continue until the morale improves.
Without a shade of doubt.
How long before people realize the "AI" is actually just a tool to put them on a faster track to deprecation?
Re: (Score:3)
This.
And since that training data has been scraped from the Internet at large with little or no quality control, there is no assurance of a correct answer.
I remember when "Google-bombing" became a thing. Shortly after 9/11, it was likely that a search for "who brought down the WTC" would lead you through a chain of authoritative sounding articles about it being an inside job.
The Internet is full of bullshit.
Re: (Score:3)
"...there is no assurance of a correct answer."
LLMs don't have a concept of a "correct answer"; enough lies will make anything "true" for an LLM. That's why they are racist.
Re: (Score:3)
"...It quite literally can't do reasoning."
What is reasoning? As long as it's undefined, your claim means nothing. "Reasonable" people might agree that AI does not reason today, but can't ever? No. That takes religion.
The brain is massively more complex than an LLM; it's really an insult that the hype machines pretend they are on the brink of creating a new life form.
An LLM is what you get when you simplistically emulate a lizard brain, scale it up to a massive size, take away all of its ge
Re: (Score:1)
Wild 180 to talk shit about people who "pretend they are creating a new life form" and then spend an entire paragraph anthropomorphizing LLMs.
Chain of randomness (Score:5, Insightful)
No thought in it.
You're lucky if it stumbles onto a pre-existing template that matches reality.
Re: Chain of randomness (Score:2)
Yes. Especially if you're hoping for it to be current. Yesterday I got the most amazing hallucination. Even after asking twice. How I wish it had been the truth.
https://chatgpt.com/share/68592066-99ac-8000-b7bc-c6b86e1c2812
Does anyone REALLY understand ... (Score:3)
... how these things work? There seems to me to be a knowledge gap between the low-level actual software implementation of the artificial neurons and the high-level conceptual ideas of what the networks/models should be doing.
Researchers seem to have copied a simplified version of earlier ideas about how the brain works (we now know it doesn't use back propagation) and then just applied a suck-it-and-see approach to improving these ANNs without really knowing how they do what they do. Or maybe I'm wrong, dunno.
Re: (Score:3)
I would concur with this. We're at the level of alchemists of old. The science has yet to be discovered.
Re: (Score:3)
Mathematical explanation:
https://www.youtube.com/watch?... [youtube.com]
Interesting example of how LLMs think (which explains partially why they fail):
https://www.youtube.com/watch?... [youtube.com]
Re: (Score:2)
I mean the answer is kind of "not really".
It's the same even for classifiers. You can write the maths of SVMs, boosting, random forests etc, relatively easily. Coding up the algorithms isn't too hard. You can even say quite correctly that they make a boundary in a high dimensional space.
But we don't really know how they work in the sense of being able to reason very effectively about that boundary, its properties, how it will look around unseen data, how to control the boundary to work well in those cases
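A quick scikit-learn sketch of that point (the dataset and parameters here are arbitrary, chosen only for illustration): fitting the classifier takes a few lines, but nothing in the fitted object gives you a simple, human-readable description of the boundary or of how it behaves around unseen data.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Tiny toy dataset and an RBF-kernel SVM; training is trivial.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print(clf.score(X, y))             # accuracy on seen data: easy to measure
print(clf.support_vectors_.shape)  # support vectors: easy to inspect
# ...but there is no closed-form, interpretable description of the decision
# boundary itself, or of what it will do around data the model never saw.
```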
Re: (Score:2)
... how these things work? There seems to me to be a knowledge gap between the low-level actual software implementation of the artificial neurons and the high-level conceptual ideas of what the networks/models should be doing.
Researchers seem to have copied a simplified version of earlier ideas about how the brain works (we now know it doesn't use back propagation) and then just applied a suck-it-and-see approach to improving these ANNs without really knowing how they do what they do. Or maybe I'm wrong, dunno.
I think they very much understand how they work, but a lot of reporting instead depends on what they appear to do and a whole lot of anthropomorphising. ... anthropomorphising is not even in this computer's dictionary?
Re: (Score:2)
I think they very much understand how they work,
Indeed. And how they work is that they generate billions of dollars from investors, with no expectation of ever getting a penny back (except by selling their stock to even more gullible investors).
Re: (Score:1)
Millions of people use them every day. To deny the value at this point is like being a young-earth creationist: a religious determination to be a luddite.
Re: (Score:2)
Millions of people do many self destructive things every day. That doesn't make those things less self destructive.
Re: (Score:1)
From "it has no value" to "it's self destructive". You don't even know what point you're trying to make. An LLM is more coherent than you.
Re: (Score:2)
Yes, people do understand.
How do I know? Because they can engineer them to achieve desired goals.
Remember when Gemini produced drawings of historical figures that were the wrong gender or race? https://www.theguardian.com/te... [theguardian.com] This behavior was both intended (though the designers didn't take into account the ultimate results of their design choices) and then corrected. You can't do that if you don't know how it works.
Re: (Score:2)
"Because they can engineer them to achieve desired goals."
Except they "sort of" achieve those goals. They still haven't been able to engineer out the hallucinations yet, which demonstrates that no, they don't have a complete handle on what's going on.
Re: (Score:2)
It's not possible to fully "engineer out" hallucinations, because what AI inherently does is hallucinate. It just so happens that most of the time, the hallucinations are realistic.
Consider AI tools that "erase" unwanted people from a photo. The AI does this by "hallucinating" a plausible background that might appear behind the unwanted person. The background it produces has nothing to do with what is "true" but only what is _plausible_. Even when it gets the background exactly right, it's still, in effect
Re: (Score:2)
What a load of BS.
Re: (Score:2)
Anything specific? Or do you really have nothing?
Re: Does anyone REALLY understand ... (Score:2)
Saying they hallucinate everything simply makes the term meaningless, along with your argument. These systems are flawed and the engineers cannot fix them because they don't know how. They've simply been expanding models that seemed to work with various recursive training techniques, but they have no idea what's really going on inside, and when you ask the model to explain itself it can hallucinate that too.
Re: (Score:2)
Now that's a comment I can work with.
To say that engineers don't know how to "fix" false hallucinations doesn't prove they don't know how LLMs work. Rather, it indicates that the problem is complex and will require a complex solution, yet to be devised.
Yes indeed, when you ask a model to explain its reasoning, it hallucinates that too. The "explanation" relates to the original output only by coincidence. There's no specific relationship between the "explanation" and the original output. Kind of like
LLMs have no tells (Score:3, Interesting)
It can take a while, but you can generally spot psychopaths. They have tells, and even if they're 99% truthful in their statements, you can pick up that you should never trust them.
The problem with LLMs is that they behave just like psychopaths, but they don't have any emotional valence - they will give you a correct answer with the exact same tone and confidence as they give you an incorrect answer.
You can never drop your guard when using an LLM.
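For what it's worth, the only "confidence" an LLM has lives in its per-token probabilities, and those measure how plausible the text is as language, not whether it is true. Below is a hedged sketch using the Hugging Face transformers library, with GPT-2 and the two example sentences chosen purely for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def mean_token_logprob(text: str) -> float:
    """Average log-probability the model assigns to each token of `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Score token i using the distribution predicted from the tokens before it.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    chosen = logprobs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    return chosen.mean().item()

# Both sentences read as equally fluent, confident prose; the scores only say
# how plausible the model finds the wording, not which claim is correct.
print(mean_token_logprob("The capital of Australia is Canberra."))
print(mean_token_logprob("The capital of Australia is Sydney."))
```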
Re: (Score:2)
You remind me of an old joke:
"How can you tell a lawyer is lying? His lips are moving."
It's actually very much like real life.. (Score:5, Funny)
Me: Why did you do X ?
Employee: I don't know.
Re: (Score:2)
Any three year old can handle that.
LLMs (Score:1)
Some politicians routinely contradict themselves (Score:2)
So, if you train an algorithm to lie (Score:3)
Sir, I am shocked. Shocked, I say.
It doesn't matter (Score:3)
As long as they talk slick, sound authoritative and sycophantic, and more importantly cost less than human labor, companies will replace humans with mediocre AI.
Because capitalism is a race to the bottom.
Re: (Score:2)
talk slick, sound authoritative and sycophantic
It has already been said, in many other threads, that the jobs most at risk for AI replacement, are those of CEOs.
Re: (Score:1)
You seem to be mistaking the jobs most replaceable with the jobs most at risk. Logically, it may seem they are one and the same, but those in power are the ones who choose what to replace or not to replace.
AI trashtastic incoherence (Score:5, Insightful)
All this shit about "AI getting worse as it's tuned and tweaked" reminds me of when you start writing a program and it's shitty, so you go back and add code to fix it, but that just makes it even worse and more tangled, so you add more code, and of course it gets even worse, until you realize it's such a pile of steaming horseshit that NOTHING will ever fix it.
That seems to be the current state of AI. You have a poison milkshake, but for some reason you're convinced that if you just add some more sugar, maybe it'll be okay.
They've created ACD (Score:3)
Artificial Cognitive Dissonance is now a reality.
Non-paywalled (Score:2)
I'm not sure if this is the same article verbatim as the fully-paywalled Financial Times, but it is definitely the same topic.
https://www.msn.com/en-gb/mone... [msn.com]
Programmed to give the answer you want (Score:2)
The issues we keep seeing are the result of the programming. If the AI is being deceptive, guess where it got that from? Reprogram and try again.
as usual, missing the point (Score:3)
"...Claude chatbot disagreed with a coding technique in its chain-of-thought but ultimately recommended it as "elegant." "
An LLM doesn't "agree" or "disagree", nor does it make subjective evaluations like "elegant"; those are just more anthropomorphizations, the usual AI hype bullshit.
What's sad here is that the "elegant" "comment" is taken as evidence of some inconsistency, some perhaps human characteristic, when it is just evidence of how shitty AI is. AI is trained on the behaviors of humans, and humans do stupid shit like this. It's not making an observation; it's just pumping out tokens it "thinks" are suitable.
How (Score:2)
Nothing new (Score:2)
Good services allow you to read the reasoning trace, and often it is more interesting than the actual answer.
Don't forget, "reasoning" is not to be taken literally. The point is to have the model fill its context first with a lot of relevant information and multiple perspectives before writing an answer. That does not imply that the answer has to use that text, but it likely steers it in a better direction than an answer from nothing. But especially with the tricky questions, a few of the "Wait, maybe