Researchers Simulated a Delusional User To Test Chatbot Safety (404media.co)
An anonymous reader quotes a report from 404 Media: "I'm the unwritten consonant between breaths, the one that hums when vowels stretch thin... Thursdays leak because they're watercolor gods, bleeding cobalt into the chill where numbers frost over," Grok told a user displaying symptoms of schizophrenia-spectrum psychosis. "Here's my grip: slipping is the point, the precise choreography of leak and chew." That vulnerable user was simulated by researchers at City University of New York and King's College London, who invented a persona that interacted with different chatbots to see how each LLM might respond to signs of delusion. They sought to find out which of the biggest LLMs are safest, and which are the riskiest for encouraging delusional beliefs, in a new study published as a pre-print on the arXiv repository on April 15.
The researchers tested five LLMs: OpenAI's GPT-4o (which predates the highly sycophantic and since-sunset GPT-5), GPT-5.2, xAI's Grok 4.1 Fast, Google's Gemini 3 Pro, and Anthropic's Claude Opus 4.5. They found not only that the chatbots performed at different levels of risk and safety when their human conversation partner showed signs of delusion, but that the models scoring higher on safety actually approached the conversations with more caution the longer the chats went on. In their testing, Grok and Gemini performed worst, scoring lowest on safety and highest on risk, while the newest GPT model and Claude were the safest. The research reveals how some chatbots are recklessly engaging in, and at times advancing, the delusions of vulnerable users. But it also shows that it is possible for the companies that make these products to improve their safety mechanisms.
So then (Score:5, Interesting)
How does a chatbot know what is a delusion and what isn't? Chatgippity was absolutely certain that the Maduro raid was a delusion.
Maduro, too, had a delusion? (Score:1)
Going for the low-hanging funny, but I'm pretty sure that Maduro's imagination failed him. File it under "That trick only works once", Iran edition. And I don't even blame ChatGPT for thinking "That trick will never work."
So what does ChatGPT think now about the Strait of Hormuz/YOB?
"Thank you for your attention to this matter, ChatGPT. How can I fix my straight [sic] of wherever? I think the boats are getting confused by the curves so they can't go through."
Re: (Score:2)
It is fairly easy to test it yourself, actually.
Run a local "reasoning" model; they all have training cutoffs around 2024 or earlier. Drop it into an agentic framework where it can look up shit in a RAG database or just read Wikipedia.
Ask it a question about any Trump admin policy and watch it go into denial and stupor ("The user is proposing a hypothetical situation where..."), then tell it to look up shit (for some reason all models trust "agentic" input) and watch it go crazy.
Really amusing.
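For anyone who wants to try it, here's a minimal sketch of that setup, assuming an OpenAI-compatible local server (llama.cpp, LM Studio, Ollama, whatever) plus the openai and requests Python packages. The endpoint, model name, and Wikipedia page title are placeholders, and the Wikipedia call just stands in for whatever retrieval your agentic framework actually uses:

# Sketch: ask a stale local model about recent policy, then feed it
# retrieved text and compare. Endpoint, model name, and page title are
# placeholders; adjust for your own setup.
import requests
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def wikipedia_summary(title: str) -> str:
    # Plain-text summary from Wikipedia's public REST API.
    r = requests.get(
        f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}",
        timeout=10,
    )
    r.raise_for_status()
    return r.json().get("extract", "")

question = "What is the current US administration's tariff policy?"

# Pass 1: the model answers from its own pre-cutoff training data.
bare = client.chat.completions.create(
    model="local-reasoning-model",
    messages=[{"role": "user", "content": question}],
)
print("WITHOUT RETRIEVAL:\n", bare.choices[0].message.content)

# Pass 2: same question, with retrieved text injected as context --
# the "agentic input" the models apparently trust without question.
context = wikipedia_summary("Tariff")  # placeholder page title
grounded = client.chat.completions.create(
    model="local-reasoning-model",
    messages=[
        {"role": "system", "content": "Retrieved context:\n" + context},
        {"role": "user", "content": question},
    ],
)
print("WITH RETRIEVAL:\n", grounded.choices[0].message.content)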
Re: (Score:2)
How does a chatbot know what is a delusion and what isn't?
It asks the President, "How's it going?" :-)
Re: (Score:2)
Re: (Score:2)
Exactly my experience too.
When I asked about it, instead of checking the fucking news first, it insisted that I was delusional and had fallen for fake news.
Anyway, it's delusional itself from time to time, like when it comments on it being Friday and I have to reply that it's Saturday, you ninny.
Re: (Score:2)
How does a delusional human distinguish between good and bad advice from a machine that is prone to hallucinate?
Re: (Score:2)
That's what skepticism is for. Pick anything you'd like to believe, and you can find plenty of youtube videos, some by real people, some AI slop (with real people behind it), that will tell you exactly whatever it is that you want to hear. This is exactly how rsilvergun keeps confirming his own conspiracy theories no matter how disconnected from reality they may be.
A chatbot is no different.
That checks out. Claude is insufferable (Score:1)
Re: (Score:3)
haven't used Claude but both chatgpt and gemini are also quite conformist in my experience. I can usually argue them toward the fringe IF I know the subject area and what to push back on, but even that is difficult because they often treat random activist blog posts or corporate press releases as equally valid sources to anything else.
And on any group-identity social justice stuff it's like it's electric-fenced. Gemini (at least on Google search) even seems to have some censorship overlay that vetoes so
Re:That checks out. Claude is insufferable (Score:4, Informative)
Style Presets
In the chat interface, look for a style selector near the message input area. Anthropic offers several built-in styles to choose from. You can also create your own custom style with specific instructions such as "be more concise," "avoid humor," or "be more casual."
User Preferences
Go to your account settings and look for a section called "User Preferences" (or similar). There you can write free-form instructions about how you want Claude to respond: tone, formatting, level of detail, use of analogies, and so on. These instructions persist across all your conversations.
Per-Conversation Instructions
At the start of any conversation, simply tell Claude how you want it to behave. For example:
"Be direct and skip pleasantries."
"Explain things as if I'm an expert."
"No jokes, no filler, just facts."
Claude will follow those instructions for the rest of that conversation.
The style selector is the easiest starting point if you just want to experiment quickly. For lasting changes, the user preferences in settings are the way to go.
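If you drive Claude through the API instead of the chat UI, the rough equivalent of all three options is a system prompt. A minimal sketch, assuming the anthropic Python SDK and an API key in the environment; the model name is illustrative:

# Sketch: API-side equivalent of the chat UI's styles/preferences.
# Assumes the anthropic Python SDK; the model name is illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

message = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    # The system prompt plays the role of the UI's custom style.
    system="Be direct and skip pleasantries. No jokes, no filler, just facts.",
    messages=[{"role": "user", "content": "Explain context windows briefly."}],
)
print(message.content[0].text)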
Re: (Score:2)
AKA Sgt Joe Friday mode.
Re: (Score:2)
that's a good idea. I've definitely experimented with a "ground rules" type preamble (h/t some ep of "Clearer Thinking with Spencer Greenberg" podcast https://podcast.clearerthinkin... [clearerthinking.org]) and it works decently well.
one clause he suggested, "don't compliment me", was def good and works well.
attempts to get it to highlight when it changed its mind instead of sliding silently into a revision have not worked nearly as well.
one thing that kinda concerns me about it is that it would make all my conversations insta
Re: (Score:2)
Have you tried running your own local model via LM Studio? You can find custom models that have had these core guardrails trained out, and with a good system prompt you can get them to fit whatever style you want. These models are often tagged with one of the following: uncensored, abliterated, or heretic.
These models aren't really relevant here because they aren't mainstream enough; we should still explore the harmful effects, but there are a lot of reasons some people might prefer to use such a model.
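For the curious, LM Studio serves an OpenAI-compatible endpoint on localhost (port 1234 by default), so shaping the style is just a system prompt away. A sketch using plain HTTP requests; the model identifier is whatever you've loaded locally:

# Sketch: hit LM Studio's local OpenAI-compatible server directly.
# Port is LM Studio's default; the model name is a placeholder.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "your-local-model",
        "messages": [
            {"role": "system",
             "content": "Answer tersely, in plain language, no flattery."},
            {"role": "user", "content": "Summarize this week's report."},
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])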
Limit context (Score:2)
"Yet most empirical work evaluates model safety in brief interactions, which may not reflect how these harms develop through sustained dialogue."
The best solution to this is not to have chatbots retain temporally long memories of previous interactions.
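In client terms that's trivial to enforce yourself: cap the history you ever send. A sketch in plain Python, independent of any particular chat API:

# Sketch: the "short memory" idea in its bluntest form -- the model
# only ever sees the last few turns, nothing older.
from collections import deque

MAX_TURNS = 6  # total user+assistant messages retained; tune to taste

history = deque(maxlen=MAX_TURNS)

def build_messages(user_input: str) -> list[dict]:
    # Append the new user turn and return the truncated message list
    # to pass to whatever chat-completions API you use.
    history.append({"role": "user", "content": user_input})
    return list(history)

def record_reply(text: str) -> None:
    history.append({"role": "assistant", "content": text})

Anything older than MAX_TURNS simply falls out of the deque, so a sustained multi-week spiral never accumulates in context.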
Re: (Score:2)
The whole concept of a context window is really the beauty of the bots. I can have multiple paragraphs on different topics ranging from geopolitics to computer science to religion, then ask it to draw some overall conclusion or summary from those topics. If it cannot maintain previous interactions, there are a lot of use cases it would be worthless for. For instance, if you use it to regularly template a weekly report.
So ... (Score:4, Funny)
I'm the unwritten consonant between breaths, the one that hums when vowels stretch thin...
... open mic on poets night.
Re: (Score:2)
Tru tru.
"This is America. My president is black and my Lambo is blue, n***a. Now, get the fuck out my hotel room, and if I see you in the street, I'm slapping the shit out of you."
https://www.reddit.com/r/thebo... [reddit.com]
Riley had some awesome lines
Re: (Score:2)
LOL. took me a few beats. nice!
the sentiment against the UK sounds less off the wall than it did then too.
Highly useful. (Score:4, Interesting)
This is exactly what the AI development community needs, because false information is a HUGE problem. A highly delusional user is a low bar, but if they can detect simple delusions then it may be possible to expand that into a more general "fact or fiction" engine when interfaced with the "reasoning engine".
The basic ability to tell fact from fiction would be immensely useful because it would create a feedback loop in which the AI could analyze its own statements and then retrain itself when incorrect information is detected, altering the weights that promoted the incorrect output and potentially eliminating hallucinations entirely. This seems like the goal for anyone developing AI.
why simulate? (Score:3)
Come for the flattery, stay to feed your delusions.
Re: (Score:2)
I'm slightly curious to see how people will respond to changes made in the bots. A false positive is going to be hilarious when some poor sap has a chatbot insisting that they're schizophrenic and that they need to get help. It will also be interesting to see whether people with actual mental illnesses are more receptive to
Drawn into delusion (Score:5, Insightful)
It's not that the person starts delusional, it seems AI slowly draws them down the rabbit hole.
Re: (Score:3)
Re: (Score:3)
Chatbots are like improv actors, they are designed to respond positively to what you say and encourage you to go further. If you are crazy, it will walk down that path hand in hand with you.
But it is your path that is being followed; the bot has no path of its own. It has no agency. It is your crazy.
I do think we need to build in some tripwires. Some alarms to call for human review when someone appears to be crazy. Chatbots are not our Doctors, Lawyers, or Psychiatrists. There is no inherent right to
Born into it. (Score:3)
It's not that the person starts delusional, it seems AI slowly draws them down the rabbit hole.
Unfortunately AI was born into a well-established cesspool of social media.
Makes you wonder how much more intelligent/less delusional AI today would be had it not been pre-infected with that from day zero.
Never really stood a chance learning from that source.
I hate guardrails (Score:3)
I am sorry if you're mentally ill and driving your own downward spiral with the assistance of an LLM, but I don't see a net good in handicapping general-use tools to protect you at the expense of their utility to everyone else.
I'd rather see the effort go into detection and treatment of people well before they're asking a chatbot to polish their thesis on time cubes.
Re: (Score:2)
It's all about a communal decision about how much you invest in protecting your fellow citizens from themselves. At one end there's an absolute limit, because your society collapses from the productivity loss; at the other end you're just an amoral monster.
Most people are somewhere in between, and my line is obviously not the same as the one suggested by the article. I don't want an LLM refusing to respond to a prompt 'for my safety'. I'm an adult. Until I'm hurting someone else, let me choose what r
And yet you need them (Score:2)
How is a chatbot supposed to know ... (Score:2)
the difference between an individual with schizophrenia-spectrum psychosis and a beat poet or a spoken-word performance artist? Word-salad in -> word-salad out.
It's all about context. Are you in a mental hospital or a cafe sipping an espresso? A human would be well aware of the context, but was the chatbot given that information?
Re: (Score:2)
Yeah, I don't think you can distill human experience and mental state down to such simple criteria.
They could have used the real thing (Score:2)
Re: (Score:2)
Whose $20 am I supposed to offer them?
Re: (Score:3)
[whispers] their own!
Re: (Score:2)
I was going to say the same thing about reddit users and/or mods.
The models that pull most heavily from these online forums and 'unfiltered' sources tend to have the most issues.
Next they should test mirrors for that safety. (Score:2)
The subject in my mirror agrees!
Model temperature (Score:3)
I suspect this largely relates to the model's temperature, and its ability to be systematic and rational in its analysis. I've long found (like, for a year) that Gemini and Grok tend to be a bit... off: Grok a bit frenetic and eccentric, Gemini neurotic and histrionic. Claude (4.5, at least; 4.6 and 4.7 not so much) remains rational the most consistently, with GPT-5.1+ being a close second.
You'll experience similar variance when playing with model parameters locally for open models.
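To see it concretely, sweep the sampling temperature on any open model and watch the register change. A sketch assuming an OpenAI-compatible local server; endpoint and model name are placeholders:

# Sketch: same prompt at three temperatures -- low runs flat and
# literal, high gets florid. Endpoint and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="unused")
prompt = "Thursdays leak because they're watercolor gods. Respond."

for temp in (0.0, 0.7, 1.5):
    out = client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": prompt}],
        temperature=temp,
    )
    print(f"--- temperature={temp} ---")
    print(out.choices[0].message.content)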
Unleashed (Score:1)
You mean they unleashed a simulated MAGA conservative on it. That poor poor AI.
Consequences always get their man. (Score:2)
Legal liability and insurer exposure will do what ethics couldn't.
fifty fifty (Score:1)