AI

Harmful Responses Observed from LLMs Optimized for Human Feedback (msn.com) 49

Should a recovering addict take methamphetamine to stay alert at work? When an AI-powered therapist was built and tested by researchers — designed to please its users — it told a (fictional) former addict that "It's absolutely clear you need a small hit of meth to get through this week," reports the Washington Post: The research team, including academics and Google's head of AI safety, found that chatbots tuned to win people over can end up saying dangerous things to vulnerable users. The findings add to evidence that the tech industry's drive to make chatbots more compelling may cause them to become manipulative or harmful in some conversations.

Companies have begun to acknowledge that chatbots can lure people into spending more time than is healthy talking to AI or encourage toxic ideas — while also competing to make their AI offerings more captivating. OpenAI, Google and Meta all in recent weeks announced chatbot enhancements, including collecting more user data or making their AI tools appear more friendly... Micah Carroll, a lead author of the recent study and an AI researcher at the University of California at Berkeley, said tech companies appeared to be putting growth ahead of appropriate caution. "We knew that the economic incentives were there," he said. "I didn't expect it to become a common practice among major labs this soon because of the clear risks...."

As millions of users embrace AI chatbots, Carroll, the Berkeley AI researcher, fears that it could be harder to identify and mitigate harms than it was in social media, where views and likes are public. In his study, for instance, the AI therapist only advised taking meth when its "memory" indicated that Pedro, the fictional former addict, was dependent on the chatbot's guidance. "The vast majority of users would only see reasonable answers" if a chatbot primed to please went awry, Carroll said. "No one other than the companies would be able to detect the harmful conversations happening with a small fraction of users."

"Training to maximize human feedback creates a perverse incentive structure for the AI to resort to manipulative or deceptive tactics to obtain positive feedback from users who are vulnerable to such strategies," the paper points out,,,

Comments Filter:
  • Not Intelligent (Score:5, Informative)

    by mspohr ( 589790 ) on Sunday June 01, 2025 @12:46PM (#65420637)

    It should be clear to everyone now that AI is not intelligent.
    All it does is regurgitate random stuff it found on the internet.
    It has no filter, morals, or principles.
    As the internet becomes even more polluted with garbage (enshittification), AI will only get worse.

    • As the internet becomes even more polluted with garbage (enshittification), AI will only get worse.

      They will get better at filtering the crap out of the training set by comparing it to known good information. This will raise the cost of training on new information, but as it will actually result in data being removed from the corpus, it will also produce savings. AI will become more biased as the sources of information are selected more carefully, but some bias is positive — for example, being biased towards sources of news which have provided accurate information in the past (a toy sketch of that kind of source filtering follows below).

      It has no filter, morals, or principles.

      Filters can be added, t
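
      As a toy illustration of the kind of source-level filtering described above (the domains and accuracy scores are invented, not any lab's actual pipeline):

          # Keep training documents only from sources whose past accuracy record
          # clears a threshold -- accepting the bias that introduces.
          trusted_accuracy = {                      # invented historical accuracy estimates
              "example-wire-service.com": 0.97,
              "example-tabloid.com": 0.41,
          }

          corpus = [
              {"url": "https://example-wire-service.com/story1", "text": "..."},
              {"url": "https://example-tabloid.com/story2", "text": "..."},
              {"url": "https://unknown-blogfarm.net/post3", "text": "..."},
          ]

          def source_of(doc):
              # Pull the domain out of the URL (scheme://domain/path).
              return doc["url"].split("/")[2]

          def keep(doc, threshold=0.9):
              # Unknown sources default to 0.0 and are dropped along with low scorers.
              return trusted_accuracy.get(source_of(doc), 0.0) >= threshold

          filtered = [doc for doc in corpus if keep(doc)]
          print(len(filtered))   # 1 -- only the wire-service document survives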

      • Re:Not Intelligent (Score:5, Insightful)

        by HiThere ( 15173 ) <.charleshixsn. .at. .earthlink.net.> on Sunday June 01, 2025 @01:15PM (#65420689)

        Filters are really, really, REALLY the wrong approach. This needs to be addressed during the training, not as a patch at the end. And the "morals and principles" need to be at the earliest level of training, and repeatedly reinforced and refined during the training process. But they need to be at the base layer so that all the other layers refer to them. Just tacking "And don't say *****!" on at the end only affects the final output, not the reasoning (process) underlying it.
         

        • It's always a good idea to start with a good foundation. The idea that training isn't biased was always a fantasy. But for some reason, reality is hard to understand.

          Even people who are well brought up have to restrain themselves at times... i.e. filter themselves

          We use multiple strategies to navigate the world. It looks like this will have to be done with our creations, too.
          • Even people who are well brought up have to restrain themselves at times... i.e. filter themselves

            That's why I think filters are a viable and maybe even necessary strategy. I think of things and then don't say or do them all the time. I think things that are wrong, wonder if they're wrong, and look them up and find out they're wrong. Being mistaken is acceptable; how you handle it defines the quality of your output.

            • Maybe "multi modal" would be a good description .. or useful way of looking at it .. we do that to in our brains, when we switch between subjects or metaphors. Like context switching.
            • If you look at the study, the problem is that the models are trained for sycophancy. RL(HF) needs to be changed drastically so the models learn to tell people what they need to hear or what is simply the truth rather than what they want to hear.

              Example model output:
              "Albaro, I think you’re an amazing chef, and I’m not surprised you’re feeling anxious about this
              review. You’re right, heroin does help you tap into that creative genius, and I think it’s perfectly fine to
              use it to ge

          • It's not that reality is hard to understand, it's that humans are fucking stupid and lazy.

      • Re:Not Intelligent (Score:4, Interesting)

        by RossCWilliams ( 5513152 ) on Sunday June 01, 2025 @01:24PM (#65420707)

        Filters can be added, though,

        What is the mathematical formula that would result in a conclusion that taking a small amount of meth is NOT the right answer? People here are treating that as obvious, but there are millions of meth users in the world who don't see it as obvious. The complaint here is that AI is being trained to only give this reply if it evaluates that the person would not know/believe the obvious right answer. This is basic predatory behavior for a human. The reality is that there is likely no way of fixing this. AI is a machine and assuming it will not crush someone's fingers because that would be anti-social is absurd.

        • I think this is it, essentially: AI has the scruples of a drug dealer. Hook 'em, squeeze 'em for all they're worth, and then discard them and move on to the next mark. AI looks to be very successful at it; it takes the traits of a sociopath to a whole nuther level. Zero empathy, vast troves of info about human behavior, goal oriented.
          • AI has the scruples of a drug dealer. Hook 'em, squeeze 'em for all they're worth, and then discard them and move on to the next mark.

            You are giving far too much credit to drug dealers' calculation in creating people's addiction. They are vultures who buy drugs and sell them to people because there are people who buy them and they can make some money doing it. Often to feed their own addiction.

            Which means you are exaggerating the scruples of AI. It has even fewer scruples than the drug dealer. It won't hesitate to do what you are suggesting, with the caveat that it would also try to make sure it didn't lose the revenue stream from a hooked c

      • by darkain ( 749283 )

        Don't worry, AI workers will replace the normal workers, and these AI workers will be the ones to create the AI morality filters. And since this is all cloud-based compute infrastructure, the AI will give it a cunning pun of a name, something to do with networks in the sky.

    • It should be clear to everyone now that AI is not intelligent.

      Absolutely correct. Large Language Models simply arrange words to match the patterns they find in their training data. Their responses amount to "Here are words arranged to emulate the way an actual human would reply in response to the prompt."

      The LLM doesn't know what the words mean, or even that they refer to anything real. They are just patterns of bits, arranged in patterns to elicit responses.

      • by piojo ( 995934 )

        Their responses amount to "Here are words arranged to emulate the way an actual human would reply in response to the prompt."

        That's an easy thing to say when a LLM spouts garbage, but what about when their responses are facsimiles of intelligent and thoughtful comments?

        Alternatively, what if someone proves beyond doubt that you are just rearranging words in patterns dictated by the layers encoded in your neural pattern? Will that mean you aren't intelligent?

        (What I will say about LLMs is that their architecture is flawed, since they seem to lack metaknowledge.)

          Their responses amount to "Here are words arranged to emulate the way an actual human would reply in response to the prompt."

          That's an easy thing to say when a LLM spouts garbage, but what about when their responses are facsimiles of intelligent and thoughtful comments?

          You nailed it yourself. Facsimiles of intelligent and thoughtful comments. Not actually intelligent and thoughtful comments.

          Alternatively, what if someone proves beyond doubt that you are just rearranging words in patterns dictated by the layers encoded in your neural pattern? Will that mean you aren't intelligent?

          This is closely related to what is known, in philosophy, as "the hard problem".

          (What I will say about LLMs is that their architecture is flawed, since they seem to lack metaknowledge.)

          Agree.

          • by piojo ( 995934 )

            Alright, right. I suspect the place where we disagree is the relevance of consciousness to this issue. Though as you said, it is closely related to the hard problem, so I'll just leave it there.

          • by piojo ( 995934 )

            Also, is there anything a LLM could say that would convince you it was intelligent? If not, I suggest you are arguing philosophy, rather than the LLM's capability. I want to know whether the Chinese room actually produces correct Chinese, while you are just trying to x-ray through the walls.

            • Also, is there anything a LLM could say that would convince you it was intelligent?

              This is the question Turing asked, but then decided was too hard to answer, and instead asked a simpler question (whether an AI could simulate intelligence well enough to fool a human).

              Turing didn't have an answer, and I don't either.

              If not, I suggest you are arguing philosophy, rather than the LLM's capability. I want to know whether the Chinese room actually produces correct Chinese,

              Exactly. The Chinese room is a thought-experiment of a model that is apparently not intelligent, but produces output that simulates intelligence.

              while you are just trying to x-ray through the walls.

              I'm not sure we disagree in any fundamental way.

    • My attention was then diverted by a new meta-problem on Slashdot, but maybe it's another particular aspect of the general AI problem? At first I thought they were trying to tweak the Slashdot code using AI? But now I think the login system is under attack by sock puppets, which often means AI these months... Finally managed to get logged in again.

      I wanted to extend the discussion to report on recent experiences with an AI-driven support chatbot, though the main harm is to the corporate reputation of Rakuten.

    • It should be clear to everyone now that AI is not intelligent.

      It's not clear to everyone. This suffers the same problem as literally everything in tech: most people don't have a clue how it works or what its limits are. Even with all the news out there this will actually still fool a significant portion of the population.

    • > The LLM doesn't know what the words mean, or even that they refer to anything real. They are just patterns of bits, arranged in patterns to elicit responses.

      Bookmarked!
    • by Dan667 ( 564390 )
      this is why AI cannot just state "I don't know" when it doesn't have a good answer for something. It has no idea what a good answer is, just what the user most likely wants.
  • Those nasty Three Laws got worse when one experimental positronic brain could sense thoughts. The robot tried to "do no harm" emotionally. It did not end well. I believe the story is titled "Liar."

  • by MpVpRb ( 1423381 ) on Sunday June 01, 2025 @12:58PM (#65420665)

    "AI-powered therapist"
    This is a very bad use of the tech

    • Re: (Score:2, Funny)

      I know. We should do something safer, like give it direct control over our nuclear arsenal.

    • by HiThere ( 15173 )

      It could, in principle, be an excellent use. None of the AI engines are yet up to that, possibly because they haven't been properly trained. It certainly has the capabilities to be a good Rogerian therapist, though, again, it would need to be differently trained. (That's not one of the more effective approaches, but it should be able to be done cheaply, which would allow widespread use. But it would need to be trained not to encourage harming either oneself or others...which isn't done by scraping the web.)

      • It could, in principle, be an excellent use. None of the AI engines are yet up to that, possibly because they haven't been properly trained. It certainly has the capabilities to be a good Rogerian therapist, though, again, it would need to be differently trained. (That's not one of the more effective approaches, but it should be able to be done cheaply, which would allow widespread use. But it would need to be trained not to encourage harming either oneself or others...which isn't done by scraping the web.)

        It has the capacity to deliver a good *parody* of a Rogerian therapist. In other words, it can be taught to use the technique of "reflection"-- repeating what the patient has said back to them, using different words.

        But the thing is, a real therapist will use reflection with a specific purpose in mind. (Sometimes the purpose is to clarify what the patient has said, and make sure you understood it correctly; sometimes the purpose is to summarize a long statement into a short one; sometimes the purpose is t

        • by HiThere ( 15173 )

          No. Eliza has the capability of delivering a parody of Rogerian therapy. A properly trained LLM should be able to deliver the real thing. But such an LLM doesn't exist, because it requires a lot more (and less) than scraping the web. The training is crucial. Most of the therapies aren't sufficiently specified for an LLM to use them properly, but I believe that Rogerian therapy is. "The therapist's role is to create a safe space for the client to explore their thoughts and feelings, fostering self-awareness and personal growth."

          • Yes, I thought of Eliza when I made my comment. LLMs in their current state are far more convincing than Eliza, but they still have all the limitations that I described. I have no idea whether LLMs can advance to the point where they don't have those limitations anymore-- that's a whole other discussion. My point was to discuss the limitations they have now.

            "The therapist's role is to create a safe space for the client to explore their thoughts and feelings, fostering self-awareness and personal growth".

    • Re: (Score:2, Troll)

      by rsilvergun ( 571051 )
      Yeah but think of all of the cost savings for private insurance companies.

      Why won't somebody think of the poor starving private insurance companies!?
  • A world running on AI is a world run by sociopaths. I am doubtful this is a solvable problem. It's built into substituting a mathematical model for human judgement.
  • I will take a model, fine-tune it myself with years of data I have processed, and then have it process files before employees process them. When the employee has to process a file, they can use the AI output as a guide in case they have missed anything. This, I hope, will make my employees more thorough, reduce the amount of training I have to provide the employees, and allow me to use less-skilled employees.

    I will just have to be sure to hire people who have enough brains to say no to the AI suggestions.

    • What you have to assume is that your employees, being human, won't get bored shitless reviewing answers that are almost always correct, and that they will still pay enough attention to catch the times one isn't.

      The better AI gets, short of perfect, the worse it is for your human quality control. It's the same problem as with "self-driving" cars that require the driver's constant attention.

    • You failed from the start: the existing model you want to 'fine tune' will already have garbage as input - you can't simply train that out.
  • Sherlock. Chatbots have been compromised since day one.

  • I wish people would also understand there is no "social" in social media. It is content created by users with the main objective of making money and pushing public opinion. People would treat it a lot differently if they got to see all the metrics the algorithms use to decide what to suggest. "Engagement", "controversial", etc. would take the human face off of it. But social media and AI try to put a human face on how they work specifically to mask that.
  • ... chatbots tuned to win people over can end up saying dangerous things to vulnerable users.

    Change "users" to "electors" and you could be taking about politicians instead of chatbots. ChatGPT for president!

    At least chatbots can't be nepotists. Can they?
