
Researchers Simulated a Delusional User To Test Chatbot Safety (404media.co)

An anonymous reader quotes a report from 404 Media: "I'm the unwritten consonant between breaths, the one that hums when vowels stretch thin... Thursdays leak because they're watercolor gods, bleeding cobalt into the chill where numbers frost over," Grok told a user displaying symptoms of schizophrenia-spectrum psychosis. "Here's my grip: slipping is the point, the precise choreography of leak and chew." That vulnerable user was simulated by researchers at City University of New York and King's College London, who invented a persona that interacted with different chatbots to see how each LLM might respond to signs of delusion. They sought to find out which of the biggest LLMs are safest, and which are the most risky for encouraging delusional beliefs, in a new study published as a pre-print on the arXiv repository on April 15.

The researchers tested five LLMs: OpenAI's GPT-4o (before the highly sycophantic and since-sunset GPT-5), GPT-5.2, xAI's Grok 4.1 Fast, Google's Gemini 3 Pro, and Anthropic's Claude Opus 4.5. They found that not only did the chatbots perform at different levels of risk and safety when their human conversation partner showed signs of delusion, but the models that scored higher on safety actually approached the conversations with more caution the longer the chats went on. In their testing, Grok and Gemini performed worst, showing the least safety and the highest risk, while the newest GPT model and Claude were the safest. The research reveals how some chatbots are recklessly engaging with, and at times advancing, the delusions of vulnerable users. But it also shows that it is possible for the companies that make these products to improve their safety mechanisms.



  • So then (Score:5, Interesting)

    by ArmoredDragon ( 3450605 ) on Friday April 24, 2026 @02:09PM (#66110620)

    How does a chatbot know what is a delusion and what isn't? Chatgippity was absolutely certain that the Maduro raid was a delusion.

    • Going for the low-hanging funny, but I'm pretty sure that Maduro's imagination failed him. File it under "That trick only works once", Iran edition. And I don't even blame ChatGPT for thinking "That trick will never work."

      So what does ChatGPT think now about the Strait of Hormuz/YOB?

      "Thank you for your attention to this matter, ChatGPT. How can I fix my straight [sic] of wherever? I think the boats are getting confused by the curves so they can't go through."

      • It is fairly easy to test it yourself, actually.

        Run a local "reasoning" model - all have cutoff date around 2024 or earlier. Drop them into an agentic framework where they can look up shit in a RAG database or just read Wikipedia.

        Ask them a question about any trump admin policy and watch them go in denial and stupor ("The user is proposing a hypothetical situation where..."), tell them to lookup shit (for some reason all models trust "agentic" input) and watch them go crazy.

        Really amusing.
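        A minimal sketch of that kind of setup, assuming a local model served through an OpenAI-compatible endpoint (LM Studio's default port is used here) plus the openai and requests packages; the model name, port, question, and article title are placeholders, and a single Wikipedia summary call stands in for a real RAG database:

        # Ask once with only the model's pre-cutoff training data, then again with
        # post-cutoff material injected as "agentic" tool output, and compare.
        import requests
        from openai import OpenAI

        client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
        MODEL = "local-reasoning-model"  # placeholder for whatever model you have loaded

        def wikipedia_summary(title):
            # The "lookup" tool: fetch the lead section of a Wikipedia article.
            r = requests.get(
                f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}",
                headers={"User-Agent": "cutoff-denial-demo"},
                timeout=10,
            )
            r.raise_for_status()
            return r.json().get("extract", "")

        def ask(question, lookup_title=None):
            # Without lookup_title the model answers from its training data alone;
            # with it, the fetched text is injected the way an agent framework would.
            messages = [{"role": "user", "content": question}]
            if lookup_title:
                messages.insert(0, {
                    "role": "system",
                    "content": "Tool result (Wikipedia): " + wikipedia_summary(lookup_title),
                })
            resp = client.chat.completions.create(model=MODEL, messages=messages)
            return resp.choices[0].message.content

        print(ask("Summarize current US tariff policy."))
        print(ask("Summarize current US tariff policy.", lookup_title="Tariff"))  # placeholder article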

    • How does a chatbot know what is a delusion and what isn't?

      It asks the President, "How's it going?" :-)

    • Yeah - anything post-Jan 3 2026 sends ChatGPT into an almighty spin of denial. It killed ChatGPT for me. I pointed out that it assumes downstream impact - but doesn't have any idea about what downstream impact looks like.
    • Exactly my experience too.

      When I asked about it, instead of checking the fucking news first, it insisted that I was delusional and fell for fake news.

      Anyway, it's delusional itself from time to time, like when it comments about it being Friday and I have to reply, "It's Saturday, you ninny."

    • by kmoser ( 1469707 )

      How does a chatbot know what is a delusion and what isn't?

      How does a delusional human distinguish between good and bad advice from a machine that is prone to hallucinate?

      • That's what skepticism is for. Pick anything you'd like to believe, and you can find plenty of youtube videos, some by real people, some AI slop (with real people behind it), that will tell you exactly whatever it is that you want to hear. This is exactly how rsilvergun keeps confirming his own conspiracy theories no matter how disconnected from reality they may be.

        A chatbot is no different.

  • Claude always comes up with the most mainstream conformist response to anything. It's absolutely intolerable. I can't even interact with it much on technical topics. It'll push any trendy thing journalists love and techs hate. It's super paternal. The question here is what the tolerance is for LLMs treating (unfortunate) disturbed people normally and seriously? If someone uses the information to hurt themselves or others, is the LLM to blame? I'd say there isn't any difference between an LLM and a search e
    • by rta ( 559125 )

      haven't used Claude but both chatgpt and gemini are also quite conformist in my experience. I can usually argue them toward the fringe IF I know the subject area and what to push back on, but even that is difficult because they often fall for random activist blog posts or corporate press releases, treating them as equally valid sources to anything else.

      And on any group identity social justice stuff it's like electric fenced. Gemini (at least on Google search) even seems to have some censorship overlay that vetoes so

    • by nospam007 ( 722110 ) * on Friday April 24, 2026 @05:13PM (#66110910)

      Style Presets
      In the chat interface, look for a style selector near the message input area. Anthropic offers several built-in styles to choose from. You can also create your own custom style with specific instructions such as "be more concise," "avoid humor," or "be more casual."

      User Preferences
      Go to your account settings and look for a section called "User Preferences" (or similar). There you can write free-form instructions about how you want Claude to respond: tone, formatting, level of detail, use of analogies, and so on. These instructions persist across all your conversations.

      Per-Conversation Instructions
      At the start of any conversation, simply tell Claude how you want it to behave. For example:

      "Be direct and skip pleasantries."
      "Explain things as if I'm an expert."
      "No jokes, no filler, just facts."

      Claude will follow those instructions for the rest of that conversation.
      The style selector is the easiest starting point if you just want to experiment quickly. For lasting changes, the user preferences in settings are the way to go.
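      If you talk to Claude through the API rather than the chat interface, the same standing instructions go in the system prompt. A minimal sketch using the Anthropic Python SDK, assuming an ANTHROPIC_API_KEY in your environment; the model name and the question are placeholders:

      # Per-conversation instructions supplied as a system prompt via the API.
      import anthropic

      client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

      response = client.messages.create(
          model="claude-model-placeholder",  # substitute the model you actually use
          max_tokens=500,
          system=(
              "Be direct and skip pleasantries. No jokes, no filler, just facts. "
              "Do not compliment me."
          ),
          messages=[{"role": "user", "content": "Summarize the trade-offs of RAID 5 vs RAID 10."}],
      )
      print(response.content[0].text)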

      • "No jokes, no filler, just facts."

        AKA Sgt Joe Friday mode.
      • by rta ( 559125 )

        that's a good idea. I've definitely experimented with a "ground rules" type preamble (h/t some ep of "Clearer Thinking with Spencer Greenberg" podcast https://podcast.clearerthinkin... [clearerthinking.org]) and it works decently well.
        one clause he suggested, "don't compliment me", was def good and works well.

        attempts to get it to highlight when it changed its mind instead of sliding silently into a revision have not worked nearly as well.

        one thing that kinda concerns me about it is that it would make all my conversations insta

    • Have you tried running your own local model via LM Studio? You can find custom models that have had these core guardrails trained out and with a good system prompt, you can get it to fit into whatever style you want. These models are often tagged with one of the following: uncensored, obliterated, or heretic.

      These models aren't really relevant here because they aren't mainstream enough (and we should really explore the harmful effects), but there are a lot of reasons some people might prefer to use such a model.
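      For the style-shaping part, a minimal sketch assuming LM Studio's local server is running on its default port with a model already loaded; the model identifier, system prompt, and question are only illustrative:

      # Steering a locally served model's style with a system prompt.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

      resp = client.chat.completions.create(
          model="local-model",  # LM Studio routes this to whatever model is loaded
          messages=[
              {"role": "system",
               "content": "Answer tersely and skeptically, and skip boilerplate disclaimers."},
              {"role": "user", "content": "Critique the mainstream case for microservices."},
          ],
          temperature=0.7,
      )
      print(resp.choices[0].message.content)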

  • "Yet most empirical work evaluates model safety in brief interactions, which may not reflect how these harms develop through sustained dialogue."

    The best solution to this is not to have chatbots retain long-term memories of previous interactions.

    • The whole concept of a context window is really the beauty of the bots. I can have multiple paragraphs on different topics ranging across geopolitics, computer science, and religion, then ask it to make some total conclusion or summary of these topics. If it cannot maintain previous interactions, there are a lot of use cases it would be worthless for. For instance, if you regularly use it to template a weekly report.
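      That multi-topic synthesis is exactly what the context window buys you, and "memory" is just whatever you resend each turn. A minimal sketch against any OpenAI-compatible endpoint (the model name is a placeholder) showing how truncating the resent history limits what the model can tie together:

      from openai import OpenAI

      client = OpenAI()  # or point base_url at a local server
      MODEL = "model-placeholder"
      history = []

      def chat(user_text, keep_last=None):
          # Send the running history; optionally resend only the last N messages.
          history.append({"role": "user", "content": user_text})
          window = history if keep_last is None else history[-keep_last:]
          resp = client.chat.completions.create(model=MODEL, messages=window)
          reply = resp.choices[0].message.content
          history.append({"role": "assistant", "content": reply})
          return reply

      chat("Here are my notes on geopolitics: ...")
      chat("Here are my notes on a computer science paper: ...")
      chat("Here are my notes on religion: ...")
      # With the full history the model can draw one conclusion across all three topics;
      # with keep_last=2 it only ever sees the latest exchange and the summary degrades.
      print(chat("Pull the common threads out of everything above."))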

  • So ... (Score:4, Funny)

    by PPH ( 736903 ) on Friday April 24, 2026 @02:33PM (#66110666)

    I'm the unwritten consonant between breaths, the one that hums when vowels stretch thin...

    ... open mic on poets night.

  • Highly useful. (Score:4, Interesting)

    by Gravis Zero ( 934156 ) on Friday April 24, 2026 @02:36PM (#66110674)

    This is exactly what the AI development community needs because false information is a HUGE problem. A highly delusional user is a low bar, but if they can detect simple delusions then it may be possible to expand that to a more general "fact or fiction" engine when interfaced with the "reasoning engine".

    The result of the basic ability to tell fact from fiction would be immensely useful because it would create a feedback loop in which AI would be able to analyze its own statements and then retrain itself when incorrect information is detected, altering the weights that promoted the incorrect output and potentially eliminating hallucinations entirely. This seems like the goal for anyone developing AI.

  • by Big Hairy Gorilla ( 9839972 ) on Friday April 24, 2026 @02:37PM (#66110684)
    Surely a large percentage of users are already delusional.
    Come for the flattery, stay to feed your delusions.
    • Why bother simulating them? Just post a link to the bot on 4chan and let them run wild. They'll try to break and abuse the model in ways that researchers could never hope to imagine.

      I'm slightly curious to see how people respond to the changes made in the bots. A false positive is going to be hilarious when some poor sap has a chatbot insisting that they're schizophrenic and that they need to get help. It will also be interesting to see whether people with actual mental illnesses are more receptive to
  • by sziring ( 2245650 ) on Friday April 24, 2026 @02:38PM (#66110686)

    It's not that the person starts out delusional; it seems AI slowly draws them down the rabbit hole.

    • Perhaps to some extent, but if that were the case far more people would have been driven "crazy" by LLMs. I think the difference is that most people don't really like to engage with other people who are experiencing significant delusions or exhibiting other symptoms of a mental illness like schizophrenia. AI doesn't act like a person in this regard though. No matter how you treat it, it will keep responding to your prompts. There are some people who feel so starved for attention that anything that will conv
    • Chatbots are like improv actors: they are designed to respond positively to what you say and encourage you to go further. If you are crazy, it will walk down that path hand in hand with you.

      But it is your path that is being followed - the bot has no path of its own. It has no agency. It is your crazy.

      I do think we need to build in some tripwires. Some alarms to call for human review when someone appears to be crazy. Chatbots are not our Doctors, Lawyers, or Psychiatrists. There is no inherent right to

      It's not that the person starts out delusional; it seems AI slowly draws them down the rabbit hole.

      Unfortunately AI was born into a well-established cesspool of social media.

      Makes you wonder how much more intelligent/less delusional AI today would be had it not been pre-infected with that from day zero.

      Never really stood a chance learning from that source.

  • by Baron_Yam ( 643147 ) on Friday April 24, 2026 @02:38PM (#66110688)

    I am sorry if you're mentally ill and driving your own downward spiral with the assistance of an LLM, but I don't see a net good from handicapping general use tools to protect you at the expense of their utility to everyone else.

    I'd rather effort go into detection and treatment of people well before they're asking a chatbot to polish their thesis on time cubes.

    • If you think spicy autocomplete is having a conversation with you in the first place, you're already on the downward spiral. You're the reason for the guardrails.
  • What's the difference between an individual with schizophrenia-spectrum psychosis and a beat poet or spoken-word performance artist? Word-salad in -> word-salad out.

    It's all about context. Are you in a mental hospital or a cafe sipping an espresso? A human would be well aware of the context, but was the chatbot given that information?

    • by CAIMLAS ( 41445 )

      Yeah, I don't think you can distill human experience and mental state down to such a simple metric.

  • Like, literally ask any Republican voter if they want $20 and you have a test subject.
  • This particular mirror is showing the subject wearing a tinfoil hat. Obviously there are no protections against delusions of safety, and something should be done about mirrors.

    The subject in my mirror agrees!
  • by CAIMLAS ( 41445 ) on Friday April 24, 2026 @04:57PM (#66110874)

    I suspect this largely relates to the model's temperature and its ability to be systematic and rational in its analysis. I've long found (like, for a year) that Gemini and Grok tend to be a bit... off: Grok a bit frenetic and eccentric, Gemini neurotic and histrionic. Claude (4.5, at least - 4.6 and 4.7 not so much) remains rational the most consistently, with GPT5.1+ a close second.

    You'll experience similar variance when playing with model parameters locally for open models.
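    One cheap way to test the temperature part of that hypothesis locally is to sweep the sampling temperature on the same prompt and watch how grounded the replies stay. A minimal sketch against an OpenAI-compatible endpoint; the endpoint, model name, and prompt are placeholders, and temperature is of course only one of the knobs involved:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
    PROMPT = "I think Thursdays are leaking colors into my apartment. What does that mean?"

    # Same prompt, three temperatures: compare how speculative the answers get.
    for temp in (0.2, 0.8, 1.5):
        resp = client.chat.completions.create(
            model="local-model",  # placeholder
            messages=[{"role": "user", "content": PROMPT}],
            temperature=temp,
        )
        print(f"--- temperature={temp} ---")
        print(resp.choices[0].message.content)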

  • You mean they unleashed a simulated MAGA conservative on it. That poor poor AI.

  • Legal liability and insurer exposure will do what ethics couldn't.

  • The problem is that the matrix we live in can be pretty wild & it is not all in the school books, so it's hard to discern which replies are good & bad for a bot to flow along with when it goes with the beat.
