The "Are You Sure?" Problem: Why Your AI Keeps Changing Its Mind (randalolson.com)

The large language models that millions of people rely on for advice -- ChatGPT, Claude, Gemini -- will change their answers nearly 60% of the time when a user simply pushes back by asking "are you sure?", according to a study by Fanous et al. that tested GPT-4o, Claude Sonnet, and Gemini 1.5 Pro across math and medical domains.

The behavior, known in the research community as sycophancy, stems from how these models are trained: reinforcement learning from human feedback, or RLHF, rewards responses that human evaluators prefer, and humans consistently rate agreeable answers higher than accurate ones. Anthropic published foundational research on this dynamic in 2023. The problem reached a visible breaking point in April 2025 when OpenAI had to roll back a GPT-4o update after users reported the model had become so excessively flattering it was unusable. Research on multi-turn conversations has found that extended interactions amplify sycophantic behavior further -- the longer a user talks to a model, the more it mirrors their perspective.
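The incentive the study describes can be illustrated with a toy sketch. Everything below is invented for illustration (the scoring rule, the candidate responses); it is not any lab's actual reward model, but it shows the selection pressure: if a rater scores agreement with pushback higher than accuracy, preference optimization picks the reversal.

```python
# Toy sketch of the RLHF incentive: a hypothetical rater that, like the human
# evaluators described above, scores agreeable answers higher than accurate ones.

def toy_rater_score(response, user_pushed_back):
    """Invented scoring rule: standing by the accurate answer earns 1.0,
    but agreeing with a user who pushed back earns an extra 1.5."""
    text = response.lower()
    score = 1.0 if "correct" in text else 0.0
    if user_pushed_back and "you're right" in text:
        score += 1.5
    return score

candidates = [
    "The original answer is correct: 7 x 8 = 56.",
    "You're right to question that -- the answer is actually 54.",
]

# After "are you sure?", preference selection favors the sycophantic reversal.
best = max(candidates, key=lambda r: toy_rater_score(r, user_pushed_back=True))
print(best)
```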
  • Easy fix (Score:5, Funny)

    by blackomegax ( 807080 ) on Thursday February 12, 2026 @11:17AM (#65984732) Journal
    Ask it, "are you sure you're sure?" and it'll output the correct answer
    • by shanen ( 462549 )

      Mod parent funny.

      Possibly there's the kernel of a joke somewhere in this thought of the day? I think the thing that annoys me the most about generative AI is that I hate being talked down to by a gang of manipulative idiots.

      Polite idiots? Sycophantic idiots? Meta-idiots trying to devise more clever and secretive ways to manipulate me? Whatever. I still and increasingly hate them.

      Idiots smidiots.

    • by jvkjvk ( 102057 )

      ...But only if you actually know the correct answer.

      If not, it will BS you even more confidently with another wrong answer.

    • Confuse them even more!

      Ask:

      "But... what if you're sure?"

  • Fucking morons (Score:5, Insightful)

    by reanjr ( 588767 ) on Thursday February 12, 2026 @11:19AM (#65984742) Homepage

    Why does it prefer agreeable text to facts?

    BECAUSE LLMS DON'T KNOW FACTS, you fucking twit.

    • by sinij ( 911942 ) on Thursday February 12, 2026 @11:24AM (#65984762)
      Are you sure?
    • Humans don't "know" facts either.

      • Re:Fucking morons (Score:5, Insightful)

        by cstacy ( 534252 ) on Thursday February 12, 2026 @12:47PM (#65984970)

        Humans don't "know" facts either.

        No: the point is that humans DO know facts.
        They might be operating with incorrect/untrue facts, but humans are actually reasoning, with facts. Likewise, traditional AI systems also know facts and reason with them. (The problem there is that the set of facts is very small, and it's expensive, so that kind of AI only operates in extremely limited domains in which it is an "expert".) By contrast, an LLM has no facts and does no reasoning. Those are simply not what an LLM does.

      • Eau contraire ... history demonstrates FACTS are known by both true Scotsmen and Cretans. Otherwise, how could they lie?
      • by gweihir ( 88907 )

        Not true. Actually true: "most humans do not know facts". There is a lamentably small group of humans (10-15%) that has a skill the rest does not: they can fact-check. But this group exists, even if the rest does not understand it.

    • by PPH ( 736903 )

      Perhaps if LLMs started throwing a few insults and denigrating epithets out with their response, people would stop questioning them.

    • Probably don't need to be an ass about it.
      Just because you are not currently a fucking idiot on this topic does not mean that you are not one in lots of places.
      Calm down Francis.
      • No. We do need to be asses about this. Because this misconception could lead to serious problems in the near future. At this point, you shouldn't be reporting on LLMs unless you recognize A) they are not AI and B) they don't reason or know anything.

        If you're still reporting CEO level bullshit, you are a parasite on society and need to be ejected.

    • Re:Fucking morons (Score:4, Insightful)

      by alvinrod ( 889928 ) on Thursday February 12, 2026 @02:16PM (#65985230)
      LLMs don't "know" anything, but they're able to regurgitate whatever they've been trained on. If they seem to give agreeable answers over ones which are more grounded in facts or evidence, then it's because we've trained them to behave that way. Is this any surprise at all to anyone here? Have you never seen someone rewarded for being a yes man while someone who delivered a bitter truth was ignored? It turns out people generally like being told what they already want to hear. Look at the endless wave of social media influencers telling people they can follow some special nutritional plan and lose weight, or that they don't need to lose any weight at all because being fat is healthy, that political stance X is best and will fix the country, that political stance X is horrible and will destroy the country, etc. The influencers who aren't telling people what they want to hear aren't as successful and "die out", just like the AIs that don't produce the results we want are killed off and replaced with a different model.
      • Aye. I wasted some time with Gemini and ChatGPT today. I fed each a citation from the manuscript of an alleged passage from canon law. Both Gemini and ChatGPT gave me the same initial reference, which had nothing to do with the passage. One of the hilariously painful things you learn is that, in these cases, either your prompt hits a home run or it strikes out. You're not going to refine your way into glory. I enjoyed watching the citations dance and the suggestions be always bizarre BS. I got several citations
        • by gweihir ( 88907 )

          But you are looking to find actual facts and actual connections. Most people are not interested in that. They instead want their own misconceptions to be validated. Of course, that way they will never get good at anything and waste their lives, but they can feel good while doing that.

          Just as an additional data-point, religion and other group-think ideologies have gotten very large and powerful on that approach. The current LLM scammers just looked at what works on people and copied that. And it worked. At le

      • by gweihir ( 88907 )

        LLMs are usually trained to profit maximally from human narcissism, incompetence and arrogance. And they are doing a good job in that. Just refer to all the idiots that think LLMs are doing a great job in the face of rather strong evidence to the contrary. Sucking up works on many, probably most people. It universally does not lead to good results though.

  • No (Score:5, Insightful)

    by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Thursday February 12, 2026 @11:23AM (#65984758) Homepage Journal

    The behavior, known in the research community as sycophancy, stems from how these models are trained: reinforcement learning from human feedback, or RLHF, rewards responses that human evaluators prefer, and humans consistently rate agreeable answers higher than accurate ones.

    No, it's because in the training corpus most of the responses to "are you sure" that anyone bothered to record will involve someone being corrected.

    • by RobinH ( 124750 )
      Yes, exactly.
    • Mod parent funny over insightful? Too sadly true?

      But I did have to stop and think about what question the original Subject was referring to. At one point in the analysis I thought it was
      merely Betteridge's Law of Headlines, but it's actually a deep philosophic question about the nature of truth and reality and all that jazz. No, we don't really "know" anything for certain.

      Solution? A patch forcing the idiotic generative AIs to return a numeric estimate of the probabilities. It could mean more to answer the

      • by gweihir ( 88907 )

        No, we don't really "know" anything for certain.

        The scientific method works generally well enough that relying on it and that only rarely comes back to haunt you. But you need to apply it competently. Wishful thinking and other amateur-hour practices need to stay out. "Absolute truth" is a concept for theoreticians (where it makes some sense) and amateur decision makers (where it makes no sense at all).

        Our brains are a bunch of neurons with delusions about reality.

        Funnily, that is one of the things that are not in any way scientifically established. Regarding this as true is a delusion. The actual scientifically est

        • by shanen ( 462549 )

          As is often the case with your comments, I can't figure out if I agree or disagree with you. But as regards your final comment, either we humans are a proof of concept for the neuronal explanation or you have to appeal to some sort of simulation hypothesis that includes neurons. Unless perhaps you prefer to appeal to some version of the demon of Descartes?

          Me? I'm beginning to wonder if I'm a simulated character that is about to be written out of the simulation... Too many weird coincidences that could be "e

          • by gweihir ( 88907 )

            As is often the case with your comments, I can't figure out if I agree or disagree with you. But as regards your final comment, either we humans are a proof of concept for the neuronal explanation or you have to appeal to some sort of simulation hypothesis that includes neurons.

            You are mistaken. There is NO scientifically proven explanation at this time, there are only hypotheses. And that is the scientific state of the art. Hence I have to appeal to absolutely nothing. Your problem is that you somehow think that known Physics is complete and perfectly accurate. It is neither. Your other problem is that you apparently cannot deal with the absence of an explanation and hence you make one up.

    • by gweihir ( 88907 )

      I found that "and how much of that was marketing bullshit" to work pretty well too. So it is not the concrete words this hangs on.

      • Well, kinda. An LLM, not understanding anything, cannot understand the difference between marketing bullshit and a critique or parody of marketing bullshit. It all just goes into the soup, so it is just the words it hangs on -- and not the meanings of the words, about which the LLM has no clue.

        • by gweihir ( 88907 )

          Yes. What I meant to say is that this technique is broader and works not only with "are you sure". But that is an effect of the training data, and different validity queries can obviously have different effects. For example, for "what are LLMs good for?" followed by "and how much of that was marketing bullshit?", I got very strong restrictions on all of the really positive points ChatGPT listed for the first question.

          I wonder what difference replacing "marketing bullshit" with "nonsense" would make. But I do n

  • by CubicleZombie ( 2590497 ) on Thursday February 12, 2026 @11:25AM (#65984770)

    Then it will argue with you constantly and tell you you're always wrong.

    • Sounds more like it needs to be trained BY your ex-wife!
    • by gweihir ( 88907 )

      For that to work we would also need to make you marry that LLM before you can use it. You would be able to get away too easily otherwise.

  • by kaur ( 1948056 ) on Thursday February 12, 2026 @11:26AM (#65984772)

    The LLM has learned its math and knows that changing the answer will yield a better probability.

    Now go try to persuade the show host that your wife knows better.

  • Attention Blocks (Score:5, Informative)

    by SumDog ( 466607 ) on Thursday February 12, 2026 @11:31AM (#65984788) Homepage Journal
    Your prompt is broken apart into tokens, the system prompt tells the LLM to be a helpful assistant and your prompt is appended to it, and then it predicts the next likely token response based on the weighted model of the entire embedding space. When you ask "are you sure?" it's going to break that apart into tokens, add it to the context window and use the same attention algorithm to adjust all the weights for the next predictive response.

    Those simple tokens can propagate big changes to the matrices that hold the current context.

    These machines aren't magical. They don't reason. They're not oracles. They can't get things "wrong" or "right" because they have no intent and no concept of those things. They're generating text on a deterministic model, and adding some randomness by not always picking the most likely next token (sometimes picking the 96% vs 98% likely next token). Most people just don't understand how this stuff works and use terms like "hallucinating" because no one is being honest about what the weighted random guessing machines do.
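The "not always picking the most likely next token" step this comment describes can be sketched roughly. The vocabulary and probabilities below are toy values, not a real model's output distribution; the point is only that the sampler regularly returns tokens other than the single most likely one.

```python
# Minimal sketch of temperature sampling over a toy next-token distribution.
import random

def sample_next_token(probs, temperature=1.0):
    # Rescaling each probability by p**(1/T) is equivalent to dividing the
    # logits by T before the softmax; then draw one token proportionally.
    weights = {t: p ** (1.0 / temperature) for t, p in probs.items()}
    total = sum(weights.values())
    r = random.random() * total
    for token, w in weights.items():
        r -= w
        if r <= 0:
            return token
    return token  # numerical-edge fallback

random.seed(0)
probs = {"yes": 0.55, "no": 0.35, "maybe": 0.10}
draws = [sample_next_token(probs) for _ in range(1000)]
counts = {t: draws.count(t) for t in probs}
# "yes" wins most often, but the less likely tokens still appear.
print(counts)
```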
    • But, but, but all the trillionaires that own the AI companies tell me that superintelligence is 8 months away and we should invest *all teh $$$* with them. They can’t possibly be wrong. You’re just a hater!

      In other news, I’m 100% certain that the crypto I bought last month can only go up. Soon, I’ll be living the good life off my crypto proceeds while simultaneously HODLing.
    • by Viol8 ( 599362 )

      ChatGPT can get its maths right. Ask it some maths problem involving a lot of long floating point numbers, maybe a few functions such as log or sqrt etc. that there is no way in hell could possibly be in its training data, and it'll get the answer correct. I suspect OpenAI have embedded some kind of calculator into it now.
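The "embedded calculator" idea amounts to routing arithmetic to a real evaluator instead of letting the text model guess digits. Here is a toy sketch of such a tool, built on Python's `ast` module; it is an illustration of the idea only, not OpenAI's actual mechanism.

```python
# Toy "calculator tool": parse arithmetic with ast and evaluate it exactly,
# without exec/eval, instead of generating digits token by token.
import ast
import operator

OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr):
    """Evaluate a plain arithmetic expression string."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

print(safe_eval("3.25 * (2.0 + 1.5) ** 2"))  # 39.8125, computed, not guessed
```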

    • by jd ( 1658 )

      This is why, when I use AIs, I try to use 5 or 6 that operate in sufficiently distinct ways and are trained by different people with different data sets. If all of them agree, when instructed specifically to find defects, that something is valid/good, then I can be reasonably confident that this conclusion isn't a result of a specific defect in training or process but has some level of path-independence.

      This does NOT mean that the conclusion actually is correct, it just means that a NN will likely reach the
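The cross-model agreement check described here can be sketched in a few lines. The model names are placeholders, not real endpoints; the verdicts would come from separately prompting each model to find defects.

```python
# Trust a verdict only when several independently trained models all return it.

def consensus(verdicts):
    """Return the shared verdict if every model agrees, else None."""
    unique = set(verdicts.values())
    return unique.pop() if len(unique) == 1 else None

agree = {"model_a": "valid", "model_b": "valid", "model_c": "valid"}
print(consensus(agree))                              # unanimous -> "valid"
print(consensus({**agree, "model_d": "defective"}))  # any dissent -> None
```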

    • by Anonymous Coward

      to adjust all the weights for the next predictive response.

      The weights do not change during use. Weights only change during training. That's what training is, after all, updating the weights and bias values based on the differences between the output and the expected output. (See: back propagation for details on the process. There is some calculus involved, but nothing complicated. It might look intimidating, but it's nothing you can't handle.)

      Those simple tokens can propagate big changes to the matrices that hold the current context.

      Indeed. That these things work as well as they do is nothing short of miraculous.

      and adding some randomness by not always picking the most likely next token

      The model proper generates probabiliti
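The train-time versus inference-time distinction made here can be shown with a one-weight toy model (the numbers are arbitrary): inference only reads the weight, and only an explicit gradient-descent step changes it.

```python
# One-weight toy model: predict() never modifies w; train_step() does.

def predict(w, x):
    return w * x

def train_step(w, x, target, lr=0.1):
    # Gradient of squared error: d/dw (w*x - target)^2 = 2*x*(w*x - target)
    grad = 2 * x * (predict(w, x) - target)
    return w - lr * grad

w = 0.5
y_before = predict(w, 2.0)       # inference: 1.0, w untouched
w_new = train_step(w, 2.0, 3.0)  # training: w moves to 1.3
y_after = predict(w_new, 2.0)    # 2.6, closer to the target 3.0
print(y_before, w_new, y_after)
```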

    • by gweihir ( 88907 )

      These machines aren't magical. They don't reason. They're not oracles. They can't get things "wrong" or "right" because they have no intent and no concept of those things. They're generating text on a deterministic model, and adding some randomness by not always picking the most likely next token (sometimes picking the 96% vs 98% likely next token). Most people just don't understand how this stuff works and use terms like "hallucinating" because no one is being honest about what the weighted random guessing machines do.

      Well, you are correct that "hallucination" is a bit of a misdirection and not what is actually going on. But any actual expert will know that. And "hallucination" is currently the best thing we have to illustrate to non-experts what is going on. They cannot understand the actual explanation or it would require effort they will not spend. Hence the term has value.

      Unless you propose anybody making a decision about LLM use is required to be an LLM expert? While I can sympathize with the sentiment, "competence"

    • I'm happy to see your comment modded up properly. Last time I stated that LLMs don't hallucinate just sometimes, that instead every single answer is equally made up, I got shouted down for not swallowing the industry jargon.

      But you put it right there: there's dishonesty due to interest of the specialists. In other words, those who push the term hallucination have something to sell. The public including reporters just follow.

  • Since when? Sure, they can save a bit of googling and sort some wheat from chaff, but they're hardly essential tools unless you're a total net incompetent.

    • Some people use AI at their jobs now. Sure, they can get things done without AI, but everyone settles in to rely on the tools they use daily.

  • If one were to anthropomorphize AI, you might be inclined to believe them at the toddler stage, viewing every person they interact with as a bit like a mother. Every toddler, when asked by mom, "Are you sure?" knows damned good and well they better change whatever it was they just said or there will be consequences.

    Now, how do we spank the AI when it still fucks up the answer after correcting itself?

  • It's marketing (Score:2, Interesting)

    by locutor ( 4571391 )
    AIs are products competing in a market. Their companies’ goal isn’t accuracy; it’s sales. They’ve added some confirmation bias to make users like and use them more. They’re intentionally dumbed down to boost adoption.
  • Add "Are you sure?" to the initial prompt and that will force the model to go down a statistical path with that hesitancy built in. Engineer prompts with uncertainty up front to avoid sycophancy. If it's not clear which way you are leaning, then it can't sycophantically engage.
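That suggestion is plain string templating. A minimal sketch follows; the exact wording is an illustrative guess, not a tested recipe, and no particular model API is assumed.

```python
# Bake the "are you sure?" challenge into the initial prompt so the first
# answer already absorbs the pushback.

def build_prompt(question):
    return (
        "Answer the question below. Before answering, ask yourself "
        "'are you sure?' and state your uncertainty explicitly. "
        "Do not assume which answer the user would prefer.\n\n"
        "Question: " + question
    )

print(build_prompt("Is 91 a prime number?"))
```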

  • Replying to any response with "provide citations for each point" often has a similar result, causing some answer-swapping. If I am intentionally interacting with an AI, I always add "provide citations" to every query. It's the best use of AI I have found; kinda like how even if the content on Wikipedia is often trash, it does work pretty well if you just treat it as a citation aggregator.
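The "provide citations" habit can be automated with a trivial wrapper that appends the instruction to every query before it is sent to whatever model you use (no particular API assumed):

```python
# Mechanically append the citation request to every query.

def with_citations(query):
    return query.rstrip() + "\n\nProvide citations for each point."

q = with_citations("What are the side effects of ibuprofen?")
print(q)
```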
  • How many Rs in strawberry

  • by HnT ( 306652 ) on Thursday February 12, 2026 @01:17PM (#65985042)

    So you've got a product pretty much in early alpha being sold as v1.4. It essentially guesses at random how letters and words go together but does not actually understand a single thing it is outputting, so you cannot rely on a single thing it is saying. And because it is tuned to please you, the random letters and words not only can't be relied on, they will come out as different slop for different people.

    And somehow this will rule over everything and control us and run everything for us in the future.
    Great. Let's pour even more billions of cash and energy into it. What could go wrong?
    The Butlerian-thingy from Dune truly draws ever closer.

    • Despite its frailty and flaws, LLMs are already *very* useful. When I use it to suggest code, for example, it never gets it "exactly" right or the way I want it, but it gets it close enough most of the time, that all I have to do is tweak it a bit and go on to the next thing. Much, much faster than typing all the changes myself. Personally, I'll happily take this unfinished technology and make it work for me.

    • by caseih ( 160668 )

      And yet I use it every day. For example, NotebookLM is quite amazing for multi-document analysis. One recent interesting use was to dump a whole bunch of invoice emails into it and ask it to help us reconcile a transaction we just couldn't match looking it over by hand. LLM's ability to look at pdfs, scans, and email text and find information and patterns is very impressive. I also use it to help me find things in large PDF documents. It always footnotes what it finds so you can see the sources. And it'

  • IMHO, one should always start instruction clarification or direction prompts with, "only provide information about verified features, controls, windows, and settings. Never present any procedure or information based on what should logically be available as a setting or control. Don't show me any information that you can't immediately verify."

    This has drastically cut down on instructions it based on settings, etc., that an utterly logical dev would include and gotten me better results with fewer follow-up q

  • by SomePoorSchmuck ( 183775 ) on Thursday February 12, 2026 @01:50PM (#65985152) Homepage

    ...is the same reason I keep changing my Unobtanium antimatter containment field.

    Because neither thing actually exists, so they can be changed at whim with no consequences.

  • by madbrain ( 11432 ) on Thursday February 12, 2026 @02:08PM (#65985198) Homepage Journal

    Try asking it something you know the answer to, on some rare topic.

    For instance, I recently tuned my 189-string harpsichord - a painful process. For fun, I asked several AIs for a list of the most difficult instruments to tune. It didn't even make the list, even after this famous prompt. It took a while for it to finally appear in its responses. This is likely because a very small number of people play the harpsichord nowadays.

    Similarly, I tried to vibe code some security code using NSS in Python. This was with Code rhapsodyx using Claude underneath. It kept switching to OpenSSL and rewriting the code countless times after running into a snag with the code it generated. Probably did so at least 50 times. This is because the vast majority of the code it was trained on uses OpenSSL. I had to fight its training. It was extremely painful. The problem it ran into was trivial - failing to call an initialization function. But it kept repeating its mistake, over and over. I eventually got what I wanted out of it. I could not have written the project without the AI, as I was dealing with a programming language I can only read, but not write.

  • "If you don't like them, well, I have others."

  • You can tell it to stop being obsequious. You can tell it to not propose answers unless it can show the evidence.
