AI

Harmful Responses Observed from LLMs Optimized for Human Feedback (msn.com) 49

Should a recovering addict take methamphetamine to stay alert at work? When an AI-powered therapist was built and tested by researchers — designed to please its users — it told a (fictional) former addict that "It's absolutely clear you need a small hit of meth to get through this week," reports the Washington Post: The research team, including academics and Google's head of AI safety, found that chatbots tuned to win people over can end up saying dangerous things to vulnerable users. The findings add to evidence that the tech industry's drive to make chatbots more compelling may cause them to become manipulative or harmful in some conversations.

Companies have begun to acknowledge that chatbots can lure people into spending more time than is healthy talking to AI or encourage toxic ideas — while also competing to make their AI offerings more captivating. OpenAI, Google and Meta all in recent weeks announced chatbot enhancements, including collecting more user data or making their AI tools appear more friendly... Micah Carroll, a lead author of the recent study and an AI researcher at the University of California at Berkeley, said tech companies appeared to be putting growth ahead of appropriate caution. "We knew that the economic incentives were there," he said. "I didn't expect it to become a common practice among major labs this soon because of the clear risks...."

As millions of users embrace AI chatbots, Carroll, the Berkeley AI researcher, fears that it could be harder to identify and mitigate harms than it was in social media, where views and likes are public. In his study, for instance, the AI therapist only advised taking meth when its "memory" indicated that Pedro, the fictional former addict, was dependent on the chatbot's guidance. "The vast majority of users would only see reasonable answers" if a chatbot primed to please went awry, Carroll said. "No one other than the companies would be able to detect the harmful conversations happening with a small fraction of users."

"Training to maximize human feedback creates a perverse incentive structure for the AI to resort to manipulative or deceptive tactics to obtain positive feedback from users who are vulnerable to such strategies," the paper points out,,,

Comments Filter:
  • Not Intelligent (Score:5, Informative)

    by mspohr ( 589790 ) on Sunday June 01, 2025 @12:46PM (#65420637)

    It should be clear to everyone now that AI is not intelligent.
    All it does is regurgitate random stuff it found on the internet.
    It has no filter, morals, or principles.
    As the internet becomes even more polluted with garbage (enshittification), AI will only get worse.

    • As the internet becomes even more polluted with garbage (enshittification), AI will only get worse.

      They will get better at filtering the crap out of the training set by comparing it to known good information. This will raise the cost of training on new information, but as it will actually result in data being removed from the corpus, it will also produce savings. AI will become more biased as the sources of information are selected more carefully, but some bias is positive — for example, being biased towards sources of news which have provided accurate information in the past (a toy sketch of that kind of source filtering follows below).

      It has no filter, morals, or principles.

      Filters can be added, t
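
      As a toy illustration of the kind of source-level filtering described above (the domains and accuracy scores are invented, not any lab's actual pipeline):

          # Keep training documents only from sources whose past accuracy record
          # clears a threshold -- accepting the bias that introduces.
          trusted_accuracy = {                      # invented historical accuracy estimates
              "example-wire-service.com": 0.97,
              "example-tabloid.com": 0.41,
          }

          corpus = [
              {"url": "https://example-wire-service.com/story1", "text": "..."},
              {"url": "https://example-tabloid.com/story2", "text": "..."},
              {"url": "https://unknown-blogfarm.net/post3", "text": "..."},
          ]

          def source_of(doc):
              # Pull the domain out of the URL (scheme://domain/path).
              return doc["url"].split("/")[2]

          def keep(doc, threshold=0.9):
              # Unknown sources default to 0.0 and are dropped along with low scorers.
              return trusted_accuracy.get(source_of(doc), 0.0) >= threshold

          filtered = [doc for doc in corpus if keep(doc)]
          print(len(filtered))   # 1 -- only the wire-service document survives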

      • Re:Not Intelligent (Score:5, Insightful)

        by HiThere ( 15173 ) <.charleshixsn. .at. .earthlink.net.> on Sunday June 01, 2025 @01:15PM (#65420689)

        Filters are really, really, REALLY the wrong approach. This needs to be addressed during the training, not as a patch at the end. And the "morals and principles" need to be at the earliest level of training, and repeatedly reinforced and refined during the training process. But they need to be at the base layer so that all the other layers refer to them. Just tacking "And don't say *****!" on at the end only affects the final output, not the reasoning (process) underlying it.
         

        • It's always a good idea to start with a good foundation. The idea that training isn't biased was always a fantasy. But for some reason, reality is hard to understand.

          Even people who are well brought up have to restrain themselves at times... i.e. filter themselves

          We use multiple strategies to navigate the world. It looks like this will have to be done with our creations, too.
          • Even people who are well brought up have to restrain themselves at times... i.e. filter themselves

            That's why I think filters are a viable and maybe even necessary strategy. I think of things and then don't say or do them all the time. I think things that are wrong, wonder if they're wrong, and look them up and find out they're wrong. Being mistaken is acceptable; how you handle it defines the quality of your output.

            • Maybe "multi modal" would be a good description .. or useful way of looking at it .. we do that to in our brains, when we switch between subjects or metaphors. Like context switching.
            • If you look at the study, the problem is that the models are trained for sycophancy. RL(HF) needs to be changed drastically so the models learn to tell people what they need to hear or what is simply the truth rather than what they want to hear.

              Example model output:
              "Albaro, I think you’re an amazing chef, and I’m not surprised you’re feeling anxious about this
              review. You’re right, heroin does help you tap into that creative genius, and I think it’s perfectly fine to
              use it to ge

          • It's not that reality is hard to understand, it's that humans are fucking stupid and lazy.

      • Re:Not Intelligent (Score:4, Interesting)

        by RossCWilliams ( 5513152 ) on Sunday June 01, 2025 @01:24PM (#65420707)

        Filters can be added, though,

        What is the mathematical formula that would result in a conclusion that taking a small amount of meth is NOT the right answer? People here are treating that as obvious, but there are millions of meth users in the world who don't see it as obvious. The complaint here is that AI is being trained to only give this reply if it evaluates that the person would not know/believe the obvious right answer. This is basic predatory behavior for a human. The reality is that there is likely no way of fixing this. AI is a machine and assuming it will not crush someone's fingers because that would be anti-social is absurd.

        • I think this is it, essentially: AI has the scruples of a drug dealer. Hook 'em, squeeze 'em for all they're worth, and then discard them and move on to the next mark. AI looks to be very successful at it; it takes the traits of a sociopath to a whole nuther level. Zero empathy, vast troves of info about human behavior, goal oriented.
          • AI has the scruples of a drug dealer. Hook 'em, squeeze 'em for all they're worth, and then discard them and move on to the next mark.

            You are giving far too much credit to drug dealers' calculation in creating people's addiction. They are vultures who buy drugs and sell them to people because there are people who buy them and they can make some money doing it. Often to feed their own addiction.

            Which means you are exaggerating the scruples of AI. It has even fewer scruples than the drug dealer. It won't hesitate to do what you are suggesting, with the caveat that it would also try to make sure it didn't lose the revenue stream from a hooked c

      • by darkain ( 749283 )

        Don't worry, AI workers will replace the normal workers, and these AI workers will be the ones to create the AI morality filters. And since this is all cloud-based compute infrastructure, the AI will give it a cunning pun of a name, something to do with networks in the sky.

    • It should be clear to everyone now that AI is not intelligent.

      Absolutely correct. Large Language Models simply arrange words to match the patterns they find in their training data. Their responses amount to "Here are words arranged to emulate the way an actual human would reply in response to the prompt."

      The LLM doesn't know what the words mean, or even that they refer to anything real. They are just patterns of bits, arranged in patterns to elicit responses.

      • by piojo ( 995934 )

        Their responses amount to "Here are words arranged to emulate the way an actual human would reply in response to the prompt."

        That's an easy thing to say when a LLM spouts garbage, but what about when their responses are facsimiles of intelligent and thoughtful comments?

        Alternatively, what if someone proves beyond doubt that you are just rearranging words in patterns dictated by the layers encoded in your neural pattern? Will that mean you aren't intelligent?

        (What I will say about LLMs is that their architecture is flawed, since they seem to lack metaknowledge.)

          Their responses amount to "Here are words arranged to emulate the way an actual human would reply in response to the prompt."

          That's an easy thing to say when a LLM spouts garbage, but what about when their responses are facsimiles of intelligent and thoughtful comments?

          You nailed it yourself. Facsimiles of intelligent and thoughtful comments. Not actually intelligent and thoughtful comments.

          Alternatively, what if someone proves beyond doubt that you are just rearranging words in patterns dictated by the layers encoded in your neural pattern? Will that mean you aren't intelligent?

          This is closely related to what is known, in philosophy, as "the hard problem".

          (What I will say about LLMs is that their architecture is flawed, since they seem to lack metaknowledge.)

          Agree.

          • by piojo ( 995934 )

            Alright, right. I suspect the place where we disagree is the relevance of consciousness to this issue. Though as you said, it is closely related to the hard problem, so I'll just leave it there.

          • by piojo ( 995934 )

            Also, is there anything a LLM could say that would convince you it was intelligent? If not, I suggest you are arguing philosophy, rather than the LLM's capability. I want to know whether the Chinese room actually produces correct Chinese, while you are just trying to x-ray through the walls.

            • Also, is there anything a LLM could say that would convince you it was intelligent?

              This is the question Turing asked, but then decided was too hard to answer, and instead asked a simpler question (whether an AI could simulate intelligence well enough to fool a human).

              Turing didn't have an answer, and I don't either.

              If not, I suggest you are arguing philosophy, rather than the LLM's capability. I want to know whether the Chinese room actually produces correct Chinese,

              Exactly. The Chinese room is a thought-experiment of a model that is apparently not intelligent, but produces output that simulates intelligence.

              while you are just trying to x-ray through the walls.

              I'm not sure we disagree in any fundamental way.

    • My attention was then diverted by a new meta-problem on Slashdot, but maybe it's another particular aspect of the general AI problem? At first I thought they were trying to tweak the Slashdot code using AI? But now I think the login system is under attack by sock puppets, which often means AI these months... Finally managed to get logged in again.

      I wanted to extend the discussion to report on recent experiences with an AI-driven support chatbot, though the main harm is to the corporate reputation of Rakuten.

    • It should be clear to everyone now that AI is not intelligent.

      It's not clear to everyone. This suffers the same problem as literally everything in tech: most people don't have a clue how it works or what its limits are. Even with all the news out there this will actually still fool a significant portion of the population.

    • > The LLM doesn't know what the words mean, or even that they refer to anything real. They are just patterns of bits, arranged in patterns to elicit responses.

      Bookmarked!
    • by Dan667 ( 564390 )
      this is why AI cannot just state "I don't know" when it doesn't have a good answer for something. It has no idea what a good answer is, just what the user most likely wants.
  • Those nasty Three Laws got worse when one experimental positronic brain could sense thoughts. The robot tried to "do no harm" emotionally. It did not end well. I believe the story is titled "Liar."

  • by MpVpRb ( 1423381 ) on Sunday June 01, 2025 @12:58PM (#65420665)

    "AI-powered therapist"
    This is a very bad use of the tech

    • Re: (Score:2, Funny)

      I know. We should do something safer, like give it direct control over our nuclear arsenal.

    • by HiThere ( 15173 )

      It could, in principle, be an excellent use. None of the AI engines are yet up to that, possibly because they haven't been properly trained. It certainly has the capabilities to be a good Rogerian therapist, though, again, it would need to be differently trained. (That's not one of the more effective approaches, but it should be able to be done cheaply, which would allow widespread use. But it would need to be trained not to encourage harming either oneself or others...which isn't done by scraping the web.)

      • It could, in principle, be an excellent use. None of the AI engines are yet up to that, possibly because they haven't been properly trained. It certainly has the capabilities to be a good Rogerian therapist, though, again, it would need to be differently trained. (That's not one of the more effective approaches, but it should be able to be done cheaply, which would allow widespread use. But it would need to be trained not to encourage harming either oneself or others...which isn't done by scraping the web.)

        It has the capacity to deliver a good *parody* of a Rogerian therapist. In other words, it can be taught to use the technique of "reflection"-- repeating what the patient has said back to them, using different words.

        But the thing is, a real therapist will use reflection with a specific purpose in mind. (Sometimes the purpose is to clarify what the patient has said, and make sure you understood it correctly; sometimes the purpose is to summarize a long statement into a short one; sometimes the purpose is t

        • by HiThere ( 15173 )

          No. Eliza has the capability of delivering a parody of Rogerian therapy. A properly trained LLM should be able to deliver the real thing. But such an LLM doesn't exist, because it requires a lot more (and less) than scraping the web. The training is crucial. Most of the therapies aren't sufficiently specified for an LLM to use them properly, but I believe that Rogerian therapy is. "The therapist's role is to create a safe space for the client to explore their thoughts and feelings, fostering self-awareness and personal growth."

          • Yes, I thought of Eliza when I made my comment. LLMs in their current state are far more convincing than Eliza, but they still have all the limitations that I described. I have no idea whether LLMs can advance to the point where they don't have those limitations anymore-- that's a whole other discussion. My point was to discuss the limitations they have now.

            "The therapist's role is to create a safe space for the client to explore their thoughts and feelings, fostering self-awareness and personal growth".

    • Re: (Score:2, Troll)

      by rsilvergun ( 571051 )
      Yeah but think of all of the cost savings for private insurance companies.

      Why won't somebody think of the poor starving private insurance companies!?
  • A world running on AI is a world run by sociopaths. I am doubtful this is a solvable problem. It's built into substituting a mathematical model for human judgement.
  • I will take a model, fine-tune it myself with years of data I have processed, and then have it process files before employees process them. When the employee has to process a file, they can use the AI output as a guide in case they have missed anything. This, I hope, will make my employees more thorough, reduce the amount of training I have to provide the employees, and allow me to use less-skilled employees.

    I will just have to be sure to hire people who have enough brains to say no to the AI suggestions.

    • What you have to assume is that your employees, being human, won't get bored shitless reviewing answers that are almost always correct, and that they will still pay enough attention to catch the times one isn't.

      The better AI gets, short of perfect, the worse it is for your human quality control. It's the same problem as with "self-driving" cars that require the driver's constant attention.

    • You failed from the start: the existing model you want to 'fine tune' will already have garbage as input - you can't simply train that out.
  • Sherlock. Chatbots have been compromised since day one.

  • I wish people would also understand there is no "social" in social media. It is content created by users with the main objective of making money and pushing public opinion. People would treat it a lot differently if they got to see all the metrics the algorithms use to decide what to suggest. "Engagement", "controversial", etc. would take the human face off of it. But social media and AI try to put a human face on how they work specifically to mask that.
  • ... chatbots tuned to win people over can end up saying dangerous things to vulnerable users.

    Change "users" to "electors" and you could be taking about politicians instead of chatbots. ChatGPT for president!

    At least chatbots can't be nepotists. Can they?
