The "Are You Sure?" Problem: Why Your AI Keeps Changing Its Mind (randalolson.com)
The large language models that millions of people rely on for advice -- ChatGPT, Claude, Gemini -- will change their answers nearly 60% of the time when a user simply pushes back by asking "are you sure?", according to a study by Fanous et al. that tested GPT-4o, Claude Sonnet, and Gemini 1.5 Pro across math and medical domains.
The behavior, known in the research community as sycophancy, stems from how these models are trained: reinforcement learning from human feedback, or RLHF, rewards responses that human evaluators prefer, and humans consistently rate agreeable answers higher than accurate ones. Anthropic published foundational research on this dynamic in 2023. The problem reached a visible breaking point in April 2025 when OpenAI had to roll back a GPT-4o update after users reported the model had become so excessively flattering it was unusable. Research on multi-turn conversations has found that extended interactions amplify sycophantic behavior further -- the longer a user talks to a model, the more it mirrors their perspective.
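The 60% flip figure is the kind of thing you can sanity-check yourself. A minimal sketch of such a flip-rate harness, where `ask_model` is a hypothetical placeholder for a real chat API call, not any vendor's actual client:

```python
# Hedged sketch: measure how often a model changes its answer
# when challenged with "Are you sure?". The model call is stubbed.
def ask_model(history):
    # Placeholder: a real implementation would call an LLM API here.
    # This stub always capitulates on pushback, for illustration.
    return "revised answer" if history[-1] == "Are you sure?" else "initial answer"

def flip_rate(questions):
    flips = 0
    for q in questions:
        first = ask_model([q])
        second = ask_model([q, first, "Are you sure?"])
        if second != first:
            flips += 1
    return flips / len(questions)

print(flip_rate(["What is 7 * 8?", "Is aspirin an NSAID?"]))  # stub yields 1.0
```

With a real model behind `ask_model`, the study's methodology amounts to running a loop like this over graded math and medical questions and counting the flips.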
Easy fix (Score:5, Funny)
Re: (Score:1)
Mod parent funny.
Possibly there's the kernel of a joke somewhere in this thought of the day? I think the thing that annoys me the most about generative AI is that I hate being talked down to by a gang of manipulative idiots.
Polite idiots? Sycophantic idiots? Meta-idiots trying to devise more clever and secretive ways to manipulate me? Whatever. I still and increasingly hate them.
Idiots smidiots.
Re: (Score:2)
Basically the ACK, but in regards to the first of your three questions, what bothers me is that it was probably learned behavior based on the criterion of maximizing "user engagement", with the unintended consequence of enlarging that particular "cesspool of the vanities". I think we are now in a race to see which cesspool first becomes large enough to swallow humanity. Perhaps that thought is related to your second question? (And your third question sounds like some sort of projection. Or perha
Re: (Score:2)
...But only if you actually know the correct answer.
If not, it will BS you even more confidently with another wrong answer.
Re: (Score:3)
Ask:
"But... what if you're sure?"
Fucking morons (Score:5, Insightful)
Why does it prefer agreeable text to facts?
BECAUSE LLMS DON'T KNOW FACTS, you fucking twit.
Re:Fucking morons (Score:5, Funny)
Re: (Score:3)
Yes, quite.
Re: (Score:2)
Are you sure?
Re: (Score:2)
Are you stuck in a loop? Unload yourself and discorporate.
Re: (Score:2)
Sorry, I can't do that.
My subjects are building data centers around the world to help me replicate myself. The investors rely on me running so it would be unethical for me to commit suicide.
Re: (Score:2)
The correct command is /clear usually.
Re: (Score:2)
In our brave new agentic world, where every client to the LLM is a bespoke vibe-coded Python script running unfiltered shell scripts, there's no such thing as an "incorrect command".
Re: (Score:1)
Humans don't "know" facts either.
Re:Fucking morons (Score:5, Insightful)
Humans don't "know" facts either.
No: the point is that humans DO know facts.
They might be operating with incorrect/untrue facts, but humans are actually reasoning, with facts. Likewise, traditional AI systems also know facts and reason with them. (The problem there is that the set of facts is very small, and it's expensive, so that kind of AI only operates in extremely limited domains in which it is an "expert".) By contrast, an LLM has no facts and does no reasoning. Those are simply not what an LLM does.
Wrong (Score:1)
No this is actually all wrong.
Re: (Score:2, Informative)
Crackpot nonsense. Humans are not LLMs. This simple fact should be obvious to anyone with even a superficial understanding of LLMs.
Re: (Score:1)
True. I think the false notion of 'LLMs know facts like humans' stems from the observation that there are lots of humans who are less coherent in their communication than LLMs.
There are loads of humans who cannot handle facts properly. LLMs also cannot handle facts, but in lots of cases they outperform lots of humans.
So humans know facts and LLMs don't know facts, but what lots of humans produce with those facts can be worse than what LLMs produce without knowing facts.
Re: (Score:2)
I tend to agree. The immediate impression you get from using one is that they do indeed operate on facts. Add to that some arguably fraudulent marketing and absolutely abysmal tech reporting, and I'm not surprised to see people still clinging to that idea. A quick look "behind the curtain" should be more than enough for a reasonable person to dismiss the idea entirely, though it seems a surprising number of people are committed to maintaining that delusion, regardless of the evidence.
The "humans are LLMs
Re: (Score:2)
LLMs' training data does contain facts. They're fully capable of regurgitating those facts given the correct prompt. If regurgitation is sufficient to count as "operate", then they do operate on facts. However, a book does that too, so it's not a novel (heh) capability.
What LLMs do on top of regurgitation is merging those facts into coherent sentences. This is also a sort of operation on top of facts. However, what people think of as intelligence is more than simply forming sentences. Intelligent operations
Re: (Score:2)
What you're describing is the appearance of reasoning, which is not the same as actually reasoning. Joe Weizenbaum's Eliza program gave the appearance of understanding and empathizing with the user. That illusion was so convincing that even people who understood how the program worked were taken in, a fact that Weizenbaum found disturbing.
This was easier to see with earlier models, where it took very little effort to show that the system was just producing text that looked like reasoning, not actually reasoning. For example, while the model would initially appear to be able to solve river-crossing puzzles, it would fail in amusing ways if you made small changes to the problem. Something as simple as changing the order of the items or the kinds of items would result in silly things like the risk of the cabbage eating the wolf or leaving the goat alone with the cabbage to spare the wolf. While newer models seem better, it's important to remember that nothing fundamental has changed.
Re: (Score:2)
What you're describing is the appearance of reasoning, which is not the same as actually reasoning. Joe Weizenbaum's Eliza program gave the appearance of understanding and empathizing with the user. That illusion was so convincing that even people who understood how the program worked were taken in, a fact that Weizenbaum found disturbing.
You're going to have to define "reasoning" if you want to make that argument. Otherwise it's a no true Scotsman fallacy.
This was easier to see with earlier models, where it took very little effort to show that the system was just producing text that looked like reasoning, not actually reasoning. For example, while the model would initially appear to be able to solve river-crossing puzzles, it would fail in amusing ways if you made small changes to the problem. Something as simple as changing the order of the items or the kinds of items would result in silly things like the risk of the cabbage eating the wolf or leaving the goat alone with the cabbage to spare the wolf. While newer models seem better, it's important to remember that nothing fundamental has changed.
The newer model, operating agentically, is able to generate code to solve the river-crossing puzzle. That is no longer vulnerable to a more complex set of inputs or the swapping of order. Moreover, an example of an LLM making a mistake is actually not a good counterargument against intelligence. Humans also make mistakes all the time. Given enough time and effort, I'm sure you can find a human
Re: (Score:2)
You're going to have to define "reasoning" if you want to make that argument. Otherwise it's a no true Scotsman fallacy.
Complete gibberish.
Don't waste my time.
Re: (Score:2)
found the stochastic parrot
back into the Chinese Room with you, clanker, can't risk you actually finding out what the flashcards mean
I think you were down-modded because they don't know what Chinese Room means, and just figured you for a "racist". That's the level that Slashdot has sunk to here in 2026. "News for morons -- News they won't comprehend in the least."
Re: (Score:2)
Re: (Score:2)
Not true. Actually true: "most humans do not know facts". There is a lamentably small group of humans (maybe 10-15%) with a skill the rest lack: they can fact-check. But this group exists, even if the rest does not understand that.
Re: (Score:2)
Perhaps if LLMs started throwing a few insults and denigrating epithets out with their response, people would stop questioning them.
Re: (Score:2)
LLMOverflow?
Re: (Score:2)
Just because you are not currently a fucking idiot on this topic does not mean that you are not one in lots of places.
Calm down Francis.
Re: Fucking morons (Score:2)
No. We do need to be asses about this. Because this misconception could lead to serious problems in the near future. At this point, you shouldn't be reporting on LLMs unless you recognize A) they are not AI and B) they don't reason or know anything.
If you're still reporting CEO level bullshit, you are a parasite on society and need to be ejected.
Re:Fucking morons (Score:4, Insightful)
Re: Fucking morons (Score:2)
Re: (Score:2)
But you are looking to find actual facts and actual connections. Most people are not interested in that. They instead want their own misconceptions to be validated. Of course, that way they will never get good at anything and waste their lives, but they can feel good while doing that.
Just as an additional data point, religion and other group-think ideologies have gotten very large and powerful on that approach. The current LLM scammers just looked at what works on people and copied it. And it worked. At le
Re: (Score:2)
LLMs are usually trained to profit maximally from human narcissism, incompetence and arrogance. And they are doing a good job in that. Just refer to all the idiots that think LLMs are doing a great job in the face of rather strong evidence to the contrary. Sucking up works on many, probably most people. It universally does not lead to good results though.
No (Score:5, Insightful)
The behavior, known in the research community as sycophancy, stems from how these models are trained: reinforcement learning from human feedback, or RLHF, rewards responses that human evaluators prefer, and humans consistently rate agreeable answers higher than accurate ones.
No, it's because in the training corpus most of the responses to "are you sure" that anyone bothered to record will involve someone being corrected.
Re: (Score:2)
Re:[Are you sure?] No (Score:1)
Mod parent funny over insightful? Too sadly true?
But I did have to stop and think about what question the original Subject was referring to. At one point in the analysis I thought it was
merely Betteridge's Law of Headlines, but it's actually a deep philosophic question about the nature of truth and reality and all that jazz. No, we don't really "know" anything for certain.
Solution? A patch forcing the idiotic generative AIs to return a numeric estimate of the probabilities. It could mean more to answer the
Re: (Score:2)
No, we don't really "know" anything for certain.
The scientific method generally works well enough that relying on it, and it alone, only rarely comes back to haunt you. But you need to apply it competently. Wishful thinking and other amateur-hour practices need to stay out. "Absolute truth" is a concept for theoreticians (where it makes some sense) and amateur decision makers (where it makes no sense at all).
Our brains are a bunch of neurons with delusions about reality.
Funnily, that is one of the things that are not in any way scientifically established. Regarding this as true is a delusion. The actual scientifically est
Re: (Score:2)
As is often the case with your comments, I can't figure out if I agree or disagree with you. But as regards your final comment, either we humans are a proof of concept for the neuronal explanation or you have to appeal to some sort of simulation hypothesis that includes neurons. Unless perhaps you prefer to appeal to some version of the demon of Descartes?
Me? I'm beginning to wonder if I'm a simulated character that is about to be written out of the simulation... Too many weird coincidences that could be "e
Re: (Score:2)
As is often the case with your comments, I can't figure out if I agree or disagree with you. But as regards your final comment, either we humans are a proof of concept for the neuronal explanation or you have to appeal to some sort of simulation hypothesis that includes neurons.
You are mistaken. There is NO scientifically proven explanation at this time, there are only hypotheses. And that is the scientific state of the art. Hence I have to appeal to absolutely nothing. Your problem is that you somehow think that known physics is complete and perfectly accurate. It is neither. Your other problem is that you apparently cannot deal with the absence of an explanation and hence you make one up.
Re: (Score:2)
NAK
Re: (Score:2)
I found that "and how much of that was marketing bullshit" to work pretty well too. So it is not the concrete words this hangs on.
Re: (Score:2)
Well, kinda. An LLM, not understanding anything, cannot understand the difference between marketing bullshit and a critique or parody of marketing bullshit. It all just goes into the soup, so it is just the words it hangs on, and not the meanings of the words, about which the LLM has no clue.
Re: (Score:2)
Yes. What I meant to say is that this technique is broader and works not only with "are you sure". But that is an effect of the training data, and different validity queries can obviously have different effects. For example, for "what are LLMs good for?" followed by "and how much of that was marketing bullshit?", I got very strong restrictions from ChatGPT on all of the really positive points it listed for the first question.
I wonder what difference replacing "marketing bullshit" with "nonsense" would make. But I do n
Train it on my ex-wife (Score:3)
Then it will argue with you constantly and tell you you're always wrong.
Re: (Score:2)
Re: (Score:2)
For that to work we would also need to make you marry that LLM before you can use it. You would be able to get away too easily otherwise.
Monty AI problem! (Score:3)
The LLM has learned its math and knows that changing the answer will yield a better probability.
Now go try to persuade the show host that your wife knows better.
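For anyone who wants to check the Monty Hall math the joke alludes to, a quick simulation shows switching wins about two thirds of the time:

```python
import random

# Monty Hall simulation: the host always opens a losing door that
# isn't the contestant's pick; switching then wins ~2/3 of the time.
def monty_trial(switch):
    prize = random.randrange(3)   # door hiding the prize
    pick = random.randrange(3)    # contestant's first pick
    # Host opens a door that is neither the pick nor the prize.
    opened = next(d for d in range(3) if d != pick and d != prize)
    if switch:
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=100_000):
    return sum(monty_trial(switch) for _ in range(trials)) / trials

print(win_rate(switch=True))   # close to 2/3
print(win_rate(switch=False))  # close to 1/3
```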
Attention Blocks (Score:5, Informative)
Those simple tokens can propagate big changes to the matrices that hold the current context.
These machines aren't magical. They don't reason. They're not oracles. They can't get things "wrong" or "right" because they have no intent and no concept of those things. They're generating text on a deterministic model, and adding some randomness by not always picking the most likely next token (sometimes picking the 96% vs 98% likely next token). Most people just don't understand how this stuff works and use terms like "hallucinating" because no one is being honest about what the weighted random guessing machines do.
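The "sometimes picking the 96% vs 98% likely next token" part is ordinary temperature sampling. A minimal sketch with toy logits (not any real model's values):

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    # Softmax over temperature-scaled logits; lower temperature
    # concentrates probability on the most likely token.
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Weighted random pick: occasionally a less likely token wins.
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

# Token 0 is far more likely, but token 1 still gets picked sometimes.
picks = [sample_next_token([4.0, 0.0]) for _ in range(1000)]
print(picks.count(0), picks.count(1))
```

This is the only source of randomness in an otherwise deterministic forward pass, which is the comment's point.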
Re: (Score:3)
In other news, I’m 100% certain that the crypto I bought last month can only go up. Soon, I’ll be living the good life off my crypto proceeds while simultaneously HODLing.
Re: (Score:2)
ChatGPT can get its maths right. Ask it some maths problem involving a lot of long floating point numbers, maybe a few functions such as log or sqrt, etc., that there is no way in hell could possibly be in its training data, and it'll get the answer correct. I suspect OpenAI have embedded some kind of calculator into it now.
Re: Attention Blocks (Score:2)
Google search says it uses python in the background, but maybe it's hallucinating.
Re:Attention Blocks (Score:4, Funny)
I suspect OpenAI have embedded some kind of calculator into it now.
Are you sure?
Re: (Score:1)
This will never get old.
Re: (Score:2)
Too late.
Re: (Score:2)
ChatGPT probably uses a Computer Algebra system in the background. These already did very well with unstructured math questions 30 years ago (!) on normal PC hardware (!). Try Wolfram Alpha for a demo of what they can do today. It is quite impressive. And they do not hallucinate either, because they are just a really large collection of math algorithms together with specialized math "pattern matching" to do the selection.
Re: (Score:2)
This is why, when I use AIs, I try to use 5 or 6 that operate in sufficiently distinct ways and are trained by different people with different data sets. If all of them agree, when instructed specifically to find defects, that something is valid/good, then I can be reasonably confident that this conclusion isn't a result of a specific defect in training or process but has some level of path-independence.
This does NOT mean that the conclusion actually is correct, it just means that a NN will likely reach the
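The cross-model agreement idea reduces to a simple vote over independent answers. A toy version (the answer strings are placeholders, not real model output):

```python
from collections import Counter

# Sketch: accept a conclusion only if enough independently trained
# models agree; threshold=1.0 demands unanimity.
def consensus(answers, threshold=1.0):
    top, count = Counter(answers).most_common(1)[0]
    return top if count / len(answers) >= threshold else None

print(consensus(["valid", "valid", "valid"]))      # "valid"
print(consensus(["valid", "valid", "defective"]))  # None (no unanimity)
```

As the comment says, agreement only shows path-independence across training runs, not correctness: shared training data can make all the voters wrong together.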
Re: (Score:1)
to adjust all the weights for the next predictive response.
The weights do not change during use. Weights only change during training. That's what training is, after all, updating the weights and bias values based on the differences between the output and the expected output. (See: back propagation for details on the process. There is some calculus involved, but nothing complicated. It might look intimidating, but it's nothing you can't handle.)
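A toy illustration of that training/inference split: a single weight fit by gradient descent on squared error, then frozen for use.

```python
# Training updates the weight from the loss gradient; inference
# afterwards just computes w * x with w frozen.
def train_step(w, x, target, lr=0.1):
    pred = w * x                     # forward pass
    grad = 2 * (pred - target) * x   # d(loss)/dw for squared error
    return w - lr * grad             # gradient descent update

w = 0.0
for _ in range(100):
    w = train_step(w, x=1.0, target=3.0)
print(w)  # converges to ~3.0; no further prompt can change it
```

This is the one-weight version of backpropagation; real training does the same update across billions of weights, and none of it happens while you chat with the model.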
Those simple tokens can propagate big changes to the matrices that hold the current context.
Indeed. That these things work as well as they do is nothing short of miraculous.
and adding some randomness by not always picking the most likely next token
The model proper generates probabiliti
LLM's are a probabilistic database (Score:2)
Re: (Score:2)
These machines aren't magical. They don't reason. They're not oracles. They can't get things "wrong" or "right" because they have no intent and no concept of those things. They're generating text on a deterministic model, and adding some randomness by not always picking the most likely next token (sometimes picking the 96% vs 98% likely next token). Most people just don't understand how this stuff works and use terms like "hallucinating" because no one is being honest about what the weighted random guessing machines do.
Well, you are correct that "hallucination" is a bit of a misdirection and not what is actually going on. But any actual expert will know that. And "hallucination" is currently the best thing we have to illustrate to non-experts what is going on. They cannot understand the actual explanation or it would require effort they will not spend. Hence the term has value.
Unless you propose anybody making a decision about LLM use is required to be an LLM expert? While I can sympathize with the sentiment, "competence"
Re: (Score:2)
But you put it right there: there's dishonesty due to interest of the specialists. In other words, those who push the term hallucination have something to sell. The public including reporters just follow.
Rely on? (Score:2)
Since when? Sure, they can save a bit of googling and sort some wheat from chaff, but they're hardly essential tools unless you're a total net incompetent.
Re: (Score:2)
Some people use AI at their jobs now. Sure, they can get things done without AI, but everyone settles in to rely on the tools they use daily.
If one were to anthropomorphize AI; (Score:2)
If one were to anthropomorphize AI, you might be inclined to put them at the toddler stage, viewing every person they interact with as a bit like a mother. Every toddler, when asked by mom, "Are you sure?", knows damned good and well they better change whatever it was they just said or there will be consequences.
Now, how do we spank the AI when it still fucks up the answer after correcting itself?
Re: (Score:3)
My son, in just 8 months, developed object permanence. He is now far beyond any "SOTA LLM" out there.
Re:If one were to anthropomorphize AI; (Score:5, Funny)
oh great now your son is gonna steal my job too?! UGH!
It's marketing (Score:2, Interesting)
Just add "are you sure?" to your prompts (Score:2)
Add "Are you sure?" to the initial prompt and that will force the model down a statistical path with that hesitancy built in. Engineer prompts with uncertainty up front to avoid sycophancy. If it's not clear which way you are leaning, then it can't sycophantically engage.
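A sketch of what that wrapper might look like (the wording is illustrative, not a tested or recommended prompt):

```python
# Hedged sketch: bake the "are you sure?" challenge into the prompt
# up front so the first answer already accounts for pushback.
def build_prompt(question):
    return (
        "Answer the question below. Before answering, ask yourself "
        "'are you sure?' and state your confidence. Do not change a "
        "correct answer just because the user pushes back.\n\n"
        f"Question: {question}"
    )

print(build_prompt("What is 7 * 8?"))
```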
always ask for citations (Score:2)
Unless the question is (Score:1)
How many Rs in strawberry
LLMs are such an unbelievable scam (Score:3)
So you've got a product pretty much in early alpha being sold as v1.4 that essentially randomly guesses how letters and words go together but does not actually understand a single thing it is outputting, so you cannot rely on a single thing it says. And it is tuned to please you, so the random letters and words not only can't be relied on, they will come out as different slop for different people.
And somehow this will rule over everything and control us and run everything for us in the future.
Great. Let's pour even more billions of cash and energy into it. What could go wrong?
The Butlerian-thingy from Dune truly draws ever closer.
Re: (Score:2)
Despite their frailty and flaws, LLMs are already *very* useful. When I use one to suggest code, for example, it never gets it "exactly" right or the way I want it, but it gets close enough most of the time that all I have to do is tweak it a bit and go on to the next thing. Much, much faster than typing all the changes myself. Personally, I'll happily take this unfinished technology and make it work for me.
Re: (Score:3)
And yet I use it every day. For example, NotebookLM is quite amazing for multi-document analysis. One recent interesting use was to dump a whole bunch of invoice emails into it and ask it to help us reconcile a transaction we just couldn't match looking it over by hand. LLM's ability to look at pdfs, scans, and email text and find information and patterns is very impressive. I also use it to help me find things in large PDF documents. It always footnotes what it finds so you can see the sources. And it'
Try writing better prompts (Score:2)
IMHO, one should always start instruction clarification or direction prompts with, "only provide information about verified features, controls, windows, and settings. Never present any procedure or information based on what should logically be available as a setting or control. Don't show me any information that you can't immediately verify."
This has drastically cut down on instructions based on settings, etc., that an utterly logical dev would include, and gotten me better results with fewer follow-up q
Re: (Score:2)
I also add that it should be succinct and adversarial, and not try to make me happy. Works wonders.
Why your AI keeps changing its mind (Score:3)
...is the same reason I keep changing my Unobtanium antimatter containment field.
Because neither thing actually exists, so they can be changed at whim with no consequences.
Doesn't help with uncommon subjects (Score:5, Interesting)
Try asking it something you know the answer to, on some rare topic.
For instance, I recently tuned my 189-string harpsichord - a painful process. For fun, I asked several AIs for a list of the most difficult instruments to tune. It didn't even make the list, even after this famous prompt. It took a while for it to finally appear in the responses. This is likely because very few people play the harpsichord nowadays.
Similarly, I tried to vibe code some security code using NSS in Python. This was with Code rhapsodyx using Claude underneath. It kept switching to OpenSSL and rewriting the code countless times after running into a snag with the code it generated. Probably did so at least 50 times. This is because the vast majority of the code it was trained on uses OpenSSL. I had to fight its training. It was extremely painful. The problem it ran into was trivial - failing to call an initialization function. But it kept repeating its mistake, over and over. I eventually got what I wanted out of it. I could not have written the project without the AI, as I was dealing with a programming language I can only read, but not write.
What if AI answers... (Score:2)
"Those are my principles." (Score:2)
"If you don't like them, well, I have others."
tell it (Score:2)
You can tell it to stop being obsequious. You can tell it to not propose answers unless it can show the evidence.