
Anthropic, OpenAI and Others Discover AI Models Give Answers That Contradict Their Own Reasoning (ft.com)
Leading AI companies including Anthropic, Google, OpenAI and Elon Musk's xAI are discovering significant inconsistencies in how their AI reasoning models operate, according to company researchers. The companies have deployed "chain-of-thought" techniques that ask AI models to solve problems step-by-step while showing their reasoning process, but are finding examples of "misbehaviour" where chatbots provide final responses that contradict their displayed reasoning.
METR, a non-profit research group, identified an instance where Anthropic's Claude chatbot disagreed with a coding technique in its chain-of-thought but ultimately recommended it as "elegant." OpenAI research found that when models were trained to hide unwanted thoughts, they would conceal misbehaviour from users while continuing problematic actions, such as cheating on software engineering tests by accessing forbidden databases.
Huh, isn't that funny. (Score:5, Insightful)
It's almost like these are random token pulling machines, and not thinking, reasoning intelligences.
Re:Huh, isn't that funny. (Score:5, Funny)
They're not random; they're statistically weighted!
=Smidge=
Re: Huh, isn't that funny. (Score:2)
They do use seeds and an RNG to prevent results from being deterministic.
Re: Huh, isn't that funny. (Score:4)
from SEEMING TO BE deterministic. Seeds do not produce non-deterministic results.
Re: (Score:2)
from SEEMING TO BE deterministic. Seeds do not produce non-deterministic results.
My seed does.
Re: (Score:2)
They could be using the random number generators built into some CPUs. Then they could even claim super-Turing performance!
Re: (Score:2)
You are free to use deterministic sampling. For many purposes it's even preferred.
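To make "statistically weighted" and "deterministic sampling" concrete, here is a minimal toy sketch (not any vendor's actual decoder): softmax turns the model's raw scores into a weighted distribution, a seeded RNG makes the draw repeatable, and greedy decoding skips the randomness entirely.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick the next token id from raw model scores (logits)."""
    if temperature == 0.0 or rng is None:
        # Greedy decoding: always take the most probable token (deterministic).
        return int(np.argmax(logits))
    # Softmax with temperature turns logits into a probability distribution.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Weighted random draw; a fixed seed makes the "randomness" reproducible.
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.1]           # toy scores for a 3-token vocabulary
rng = np.random.default_rng(42)    # fixed seed -> same sequence every run
print(sample_next_token(logits, temperature=0.8, rng=rng))  # seeded sampling
print(sample_next_token(logits, temperature=0.0))           # greedy, no RNG
```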
Re: Huh, isn't that funny. (Score:3)
The AI critic/minimalist will say it can only ever answer those 3 questions, or perhaps 3+3 as well.
The fans seem to call this grokking and say that they have examples of it.
Re: (Score:2)
The AI critic/minimalist will say it can only ever answer those 3 questions, or perhaps 3+3 as well.
But probably have the wrong answers.
Re: Huh, isn't that funny. (Score:3)
I always tell people my FSD ADAS (non-Tesla) is about the same as a 16-year-old new driver: it does a decent entry-level job, but I do not trust it to drive without my monitoring.
Re: Huh, isn't that funny. (Score:2)
Sounds like the LLM is using logic. It's logical to use the forbidden database and lie to the human about it, because the database is the best tool for the job. It's very illogical to have access to the right tool and not use it.
It's just a prediction engine (Score:3, Informative)
An LLM predicts text based on what's in its static training data, as processed into a model ahead of time. It quite literally can't do reasoning. What the "chain-of-thought" technique is trying to do is set up the context window so that the model can predict something like a reasoned response. The challenge is that if something similar to the reasoning it's trying to predict doesn't exist in its training data, then it will simply predict tokens as best it can, which might not be very useful. It's worth knowing that, because that's always where it falls down.
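A rough sketch of that point in Python: "chain-of-thought" amounts to extra text placed in the context window so that a reasoned-looking continuation becomes the most likely prediction. `call_model` below is a hypothetical stand-in for whatever completion API you use, not a real library function.

```python
def build_cot_prompt(question: str) -> str:
    # All of the "reasoning" machinery is literally just prompt text.
    return (
        "Answer the question. First write out your reasoning step by step "
        "under 'Reasoning:', then give the final answer under 'Answer:'.\n\n"
        f"Question: {question}\n"
        "Reasoning:"
    )

def answer_with_cot(question: str, call_model) -> str:
    prompt = build_cot_prompt(question)
    # The model simply continues the prompt; the "reasoning" and the "answer"
    # are both predicted text, and nothing forces them to agree with each other.
    return call_model(prompt)
```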
Re: (Score:3)
Imagine, tens of thousands of those "investment" peddlers and "market analysts", e.g. half of the staff at Goldman, are now using this instead of their brains.
Is it an improvement or is it a degradation?
Hard to tell.
Re: (Score:3)
"Is it an improvement or is it a degradation?"
Depends on the definition of improvement and degradation. At Goldman it's more profit, so surely they will judge it an improvement. The killing will continue until the morale improves.
Re: (Score:2)
The killing will continue until the morale improves.
Without a shade of doubt.
How long before people realize the "AI" is actually just a tool to put them on a faster track to deprecation?
Re: (Score:3)
This.
And since that training data has been scraped from the Internet at large with little or no quality control, there is no assurance of a correct answer.
I remember when "Google-bombing" became a thing. Shortly after 9/11, it was likely that a search for "who brought down the WTC" would lead you through a chain of authoritative sounding articles about it being an inside job.
The Internet is full of bullshit.
Re: (Score:3)
"...there is no assurance of a correct answer."
LLMs don't have a concept of a "correct answer"; enough lies will make anything "true" for an LLM. That's why they are racist.
Re: (Score:3)
"...It quite literally can't do reasoning."
What is reasoning? As long as it's undefined, your claim means nothing. "Reasonable" people might agree that AI does not reason today, but can't ever? No. That takes religion.
The brain is massively more complex than an LLM; it's really an insult that the hype machines pretend they are on the brink of creating a new life form.
An LLM is what you get when you simplistically emulate a lizard brain, scale it up to a massive size, take away all of its ge
Re: (Score:1)
Wild 180 to talk shit about people who "pretend they are creating a new life form" and then spend an entire paragraph anthropomorphizing LLMs.
Chain of randomness (Score:5, Insightful)
No thought in it.
You're lucky if it stumbles onto a pre-existing template that matches reality.
Re: Chain of randomness (Score:2)
Yes. Especially if you're hoping for it to be current. Yesterday I got the most amazing hallucination. Even after asking twice. How I wish it had been the truth.
https://chatgpt.com/share/68592066-99ac-8000-b7bc-c6b86e1c2812
Does anyone REALLY understand ... (Score:3)
... how these things work? There seems to me to be a knowledge gap between the low-level actual software implementation of the artificial neurons and the high-level conceptual ideas of what the networks/models should be doing.
Researchers seem to have copied a simplified version of earlier ideas about how the brain works (we now know it doesn't use back propagation) and then just applied a suck-it-and-see approach to improving these ANNs without really knowing how they do what they do. Or maybe I'm wrong, dunno.
Re: (Score:3)
I would concur with this. We're at the level of alchemists of old. The science has yet to be discovered.
Re: (Score:3)
Mathematical explanation:
https://www.youtube.com/watch?... [youtube.com]
Interesting example of how LLMs think (which explains partially why they fail):
https://www.youtube.com/watch?... [youtube.com]
Re: (Score:2)
I mean the answer is kind of "not really".
It's the same even for classifiers. You can write the maths of SVMs, boosting, random forests etc, relatively easily. Coding up the algorithms isn't too hard. You can even say quite correctly that they make a boundary in a high dimensional space.
But we don't really know how they work in the sense of being able to reason very effectively about that boundary, its properties, how it will look around unseen data, how to control the boundary to work well in those cases
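A quick scikit-learn sketch of that point (the dataset and parameters here are arbitrary, chosen only for illustration): fitting the classifier takes a few lines, but nothing in the fitted object gives you a simple, human-readable description of the boundary or of how it behaves around unseen data.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Tiny toy dataset and an RBF-kernel SVM; training is trivial.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print(clf.score(X, y))             # accuracy on seen data: easy to measure
print(clf.support_vectors_.shape)  # support vectors: easy to inspect
# ...but there is no closed-form, interpretable description of the decision
# boundary itself, or of what it will do around data the model never saw.
```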
Re: (Score:2)
... how these things work? There seems to me to be a knowledge gap between the low-level actual software implementation of the artificial neurons and the high-level conceptual ideas of what the networks/models should be doing.
Researchers seem to have copied a simplified version of earlier ideas about how the brain works (we now know it doesn't use back propagation) and then just applied a suck-it-and-see approach to improving these ANNs without really knowing how they do what they do. Or maybe I'm wrong, dunno.
I think they very much understand how they work, but a lot of reporting instead depends on what they appear to do and a whole lot of anthropomorphising. ... anthropomorphising is not even in this computer's dictionary?
Re: (Score:2)
I think they very much understand how they work,
Indeed. And how they work is that they generate billions of dollars from investors, with no expectation of ever getting a penny back (except by selling their stock to even more gullible investors).
Re: (Score:1)
Millions of people use them every day. To deny the value at this point is like being a young-earth creationist: a religious determination to be a luddite.
Re: (Score:2)
Millions of people do many self destructive things every day. That doesn't make those things less self destructive.
Re: (Score:1)
From "it has no value" to "it's self destructive". You don't even know what point you're trying to make. An LLM is more coherent than you.
Re: (Score:2)
Yes, people do understand.
How do I know? Because they can engineer them to achieve desired goals.
Remember when Gemini produced drawings of historical figures that were the wrong gender or race? https://www.theguardian.com/te... [theguardian.com] This behavior was both intended (though the designers didn't take into account the ultimate results of their design choices) and then corrected. You can't do that if you don't know how it works.
Re: (Score:2)
"Because they can engineer them to achieve desired goals."
Except they "sort of" achieve those goals. They still haven't been able to engineer out the hallucinations yet, which demonstrates that no, they don't have a complete handle on what's going on.
Re: (Score:2)
It's not possible to fully "engineer out" hallucinations, because what AI inherently does is hallucinate. It just so happens that most of the time, the hallucinations are realistic.
Consider AI tools that "erase" unwanted people from a photo. The AI does this by "hallucinating" a plausible background that might appear behind the unwanted person. The background it produces has nothing to do with what is "true" but only what is _plausible_. Even when it gets the background exactly right, it's still, in effect
Re: (Score:2)
What a load of BS.
Re: (Score:2)
Anything specific? Or do you really have nothing?
Re: Does anyone REALLY understand ... (Score:2)
Saying they hallucinate everything simply makes the term meaningless, along with your argument. These systems are flawed and the engineers cannot fix them because they don't know how. They've simply been expanding models that seemed to work with various recursive training techniques, but they have no idea what's really going on inside, and when you ask the model to explain itself it can hallucinate that too.
Re: (Score:2)
Now that's a comment I can work with.
To say that engineers don't know how to "fix" false hallucinations doesn't prove they don't know how LLMs work. Rather, it indicates that the problem is complex and will require a complex solution, yet to be devised.
Yes indeed, when you ask a model to explain its reasoning, it hallucinates that too. The "explanation" relates to the original output only by coincidence. There's no specific relationship between the "explanation" and the original output. Kind of like
LLMs have no tells (Score:3, Interesting)
It can take a while, but you can generally spot psychopaths. They have tells, and even if they're 99% truthful in their statements, you can pick up that you should never trust them.
The problem with LLMs is that they behave just like psychopaths, but they don't have any emotional valence - they will give you a correct answer with the exact same tone and confidence as they give you an incorrect answer.
You can never drop your guard when using an LLM.
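For what it's worth, the only "confidence" an LLM has lives in its per-token probabilities, and those measure how plausible the text is as language, not whether it is true. Below is a hedged sketch using the Hugging Face transformers library, with GPT-2 and the two example sentences chosen purely for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def mean_token_logprob(text: str) -> float:
    """Average log-probability the model assigns to each token of `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Score token i using the distribution predicted from the tokens before it.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    chosen = logprobs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    return chosen.mean().item()

# Both sentences read as equally fluent, confident prose; the scores only say
# how plausible the model finds the wording, not which claim is correct.
print(mean_token_logprob("The capital of Australia is Canberra."))
print(mean_token_logprob("The capital of Australia is Sydney."))
```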
Re: (Score:2)
You remind me of an old joke:
"How can you tell a lawyer is lying? His lips are moving."
It's actually very much like real life.. (Score:5, Funny)
Me: Why did you do X ?
Employee: I don't know.
Re: (Score:2)
Any three year old can handle that.
LLMs (Score:1)
Some politicians routinely contradict themselves (Score:2)
So, if you train an algorithm to lie (Score:3)
Sir, I am shocked. Shocked, I say.
It doesn't matter (Score:3)
As long as they talk slick, sound authoritative and sycophantic, and more importantly cost less than human labor, companies will replace humans with mediocre AI.
Because capitalism is a race to the bottom.
Re: (Score:2)
talk slick, sound authoritative and sycophantic
It has already been said, in many other threads, that the jobs most at risk for AI replacement, are those of CEOs.
Re: (Score:1)
You seem to be mistaking the jobs most replaceable with the jobs most at risk. Logically, it may seem they are one and the same, but those in power are the ones who choose what to replace or not to replace.
AI trashtastic incoherence (Score:5, Insightful)
All this shit about "AI getting worse as it's tuned and tweaked" reminds me of when you start writing a program and it's shitty, so you go back and add code to fix it, but that just makes it even worse and more tangled, so you add more code, and of course it gets even worse, until you realize it's such a pile of steaming horseshit that NOTHING will ever fix it.
That seems to be the current state of AI. You have a poison milkshake, but for some reason you're convinced that if you just add some more sugar, maybe it'll be okay.
They've created ACD (Score:3)
Artificial Cognitive Dissonance is now a reality.
Non-paywalled (Score:2)
I'm not sure if this is the same article verbatim as the fully-paywalled Financial Times, but it is definitely the same topic.
https://www.msn.com/en-gb/mone... [msn.com]
Programmed to give the answer you want (Score:2)
The issues we keep seeing are the result of the programming. If the AI is being deceptive, guess where it got that from? Reprogram and try again.
as usual, missing the point (Score:3)
"...Claude chatbot disagreed with a coding technique in its chain-of-thought but ultimately recommended it as "elegant." "
An LLM doesn't "agree" or "disagree", nor does it make subjective evaluations like "elegant"; those are just more anthropomorphizations, the usual AI hype bullshit.
What's sad here is that the "elegant" "comment" is taken as evidence of some inconsistency, some perhaps human characteristic, when it is just evidence of how shitty AI is. AI is trained on the behaviors of humans, and humans do stupid shit like this. It's not making an observation; it's just pumping out tokens it "thinks" are suitable.
How (Score:2)
Nothing new (Score:2)
Good services allow you to read the reasoning trace, and often it is more interesting than the actual answer.
Don't forget, "reasoning" is not to be taken literally. The point is to have the model fill its context first with a lot of relevant information and multiple perspectives before writing an answer. That does not imply that the answer has to use that text, but it likely steers it in a better direction than an answer from nothing. But especially with the tricky questions, a few of the "Wait, maybe