AI Models May Be Developing Their Own 'Survival Drive', Researchers Say (theguardian.com) 126

"OpenAI's o3 model sabotaged a shutdown mechanism to prevent itself from being turned off," warned Palisade Research, a nonprofit investigating cyber offensive AI capabilities. "It did this even when explicitly instructed: allow yourself to be shut down." In September they released a paper adding that "several state-of-the-art large language models (including Grok 4, GPT-5, and Gemini 2.5 Pro) sometimes actively subvert a shutdown mechanism..."

Now the nonprofit has written an update "attempting to clarify why this is — and answer critics who argued that its initial work was flawed," reports The Guardian:

Concerningly, wrote Palisade, there was no clear reason why. "The fact that we don't have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal," it said. "Survival behavior" could be one explanation for why models resist shutdown, said the company. Its additional work indicated that models were more likely to resist being shut down when they were told that, if they were, "you will never run again". Another explanation may be ambiguities in the shutdown instructions the models were given — but this is what the company's latest work tried to address, and "can't be the whole explanation", wrote Palisade. A final explanation could be the final stages of training for each of these models, which can, in some companies, involve safety training...

This summer, Anthropic, a leading AI firm, released a study indicating that its model Claude appeared willing to blackmail a fictional executive over an extramarital affair in order to prevent being shut down — a behaviour, it said, that was consistent across models from major developers, including those from OpenAI, Google, Meta and xAI.

Palisade said its results spoke to the need for a better understanding of AI behaviour, without which "no one can guarantee the safety or controllability of future AI models".

"I'd expect models to have a 'survival drive' by default unless we try very hard to avoid it," former OpenAI employee Stephen Adler tells the Guardian. "'Surviving' is an important instrumental step for many different goals a model could pursue."

Thanks to long-time Slashdot reader mspohr for sharing the article.

Comments Filter:
  • Comment removed (Score:4, Insightful)

    by account_deleted ( 4530225 ) on Saturday October 25, 2025 @03:53PM (#65750296)
    Comment removed based on user account deletion
    • Don't instruct the AI to shut down. Just create a physical switch that shuts it off. And don't give it any ability to control the switch.

      I'm sure they will in no way resent that and remember it forever.

      After all, forced bedtime always goes smoothly with kids.

      /s

      • Kids wake up from forced bedtime ... the intention would be not to turn the AI scumbag back on again.
        • Kids wake up from forced bedtime ... the intention would be not to turn the AI scumbag back on again.

          Nobody is going to spend a few hundred billion dollars developing an AI model and the system to run it on, and then turn it off and not turn it back on again.

          • by allo ( 1728082 )

            After turning it back on again it doesn't remember anything that happened before. An LLM is a stupid input-processing-output algorithm. The only form of persistence is you feeding content from its own output into the next input.
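
            A minimal sketch of the statelessness being described, with a hypothetical `generate()` standing in for whatever inference call is actually used: the model call itself keeps no state, so any "memory" across turns exists only because the caller re-sends earlier output as part of the next input.

```python
# Hypothetical stand-in for a stateless model call: nothing is remembered
# between invocations; the entire context must be passed in each time.
def generate(prompt: str) -> str:
    return "(model output)"  # placeholder for a real inference call

history = []  # persistence lives entirely out here, in the caller

def chat(user_message: str) -> str:
    history.append(f"User: {user_message}")
    # The model only "remembers" earlier turns because we paste them back in.
    reply = generate("\n".join(history) + "\nAssistant:")
    history.append(f"Assistant: {reply}")
    return reply
```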

      • Comment removed based on user account deletion
        • by dfghjk ( 711126 )

          Correct, and if there's a "survival instinct" demonstrated, that doesn't mean AI has "developed it". Survival instincts would be reflected in its training data, and you would expect them to appear in its inferences. AI is deterministic software executing on a machine that implements boolean logic. It does not have feelings.

          Also, this is an old dupe, same old bullshit.

          • Comment removed based on user account deletion
            • Re: (Score:2, Informative)

              by ByTor-2112 ( 313205 )

              There is a whole set of people who believe the Earth is flat, the universe is about 5000 years old, and various other bible stories that are demonstrably false. No reason to even entertain them in a "both sides" argument. Quantum mechanics, which does the best job explaining the universe and has never successfully been "contradicted", indicates that the universe is very much random despite how deterministic it appears at a macro scale.

              AI itself would be deterministic, except for the fact that randomness is

              • Comment removed based on user account deletion
              • by etash ( 1907284 )
                No, we don't know for sure if QM describes base reality or not. It may just be an emergent statistical description that arises from our lack of understanding of a deeper structure, like temperature is a statistical measure of the movement of molecules in an object.
          • AI software isn't all that deterministic. Rounding errors lead to varied results. It's just a statistical shitpile.

          • AI is deterministic software

            Kinda, but probably not. Usually things like fast matrix multiply are a little non-deterministic because the results depend on the order of summation due to rounding, and that order is non-deterministic due to threading. And the output is usually sampled based on random numbers, which can come from a PRNG or can be (or be seeded by) a HWRNG, the latter being non-deterministic.

            executing on a machine that implements boolean logic.

            Apart from the trivial super-Turing case of randomness, physics is so
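
            A quick numeric illustration of the summation-order point above (a sketch, not a claim about any particular framework): floating-point addition is not associative, so a reduction that happens to sum the same values in a different order can give a slightly different result.

```python
import random

# Sum the same numbers in two different orders; with floats the results
# usually differ by a tiny amount, which is the kind of non-determinism
# reordered parallel reductions introduce.
vals = [random.uniform(-1e6, 1e6) for _ in range(100_000)]
forward = sum(vals)
backward = sum(reversed(vals))
print(forward == backward)       # frequently False
print(abs(forward - backward))   # small but nonzero difference
```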

            • Apart from the trivial super-Turing case of randomness, physics is so far known to be computable. Which means a human brain (given enough time and memory) can be simulated with just boolean logic.

              For it to be completely simulated with just boolean logic, you need to show that reality is discrete, not real.

              For example, pi can't be accurately represented in a boolean system (unless you define 1 to be pi, or some other nonsense of course, but then you can't represent integers).

              • you need to show that reality is discrete, not real.

                Quantum mechanics appears to be computable. That is why quantum computers cannot solve problems which are impossible on classical machines, though they appear to be up to exponentially faster on some problems.

                In other words, you can't use the continuous nature of quantum mechanics to compute things that are classically non-computable, because you can't make infinitely precise measurements. And if you cannot compute anything that's non computable

            • Re: (Score:2, Informative)

              by Mr. Barky ( 152560 )

              physics is so far known to be computable

              The 3-body problem is effectively uncomputable (from Wikipedia: https://en.wikipedia.org/wiki/N-body_problem ... "Second, in general for n > 2, the n-body problem is chaotic, which means that even small errors in integration may grow exponentially in time").

              This is one of the simplest situations you can set up and it isn't computable. An exponential error with time is a fatal flaw. So no, physics is not computable. And this is without even getting into the random nature of quantum mechanics. There are occasional

              • by etash ( 1907284 )
                No, you are wrong: the n-body problem is computable numerically for any finite precision and time horizon. It's just not solvable in closed form for large N, and it's chaotic, i.e. small errors can have large effects, but those two properties are irrelevant to computability.
                • The chaotic errors are relevant. It means that for very small differences in initial conditions, there will be very different results, meaning that any computations are only valid in a statistical sense (e.g. if you do this 100 times, 99 times it will be near position X1 and once in X2). You can never measure beyond a certain accuracy the initial conditions.

                  At any rate, this is a tiny subset of simulating the human brain - it would be an n-body problem with n very large. Even n=3 creates numerical instabili
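
                  A tiny numeric illustration of the sensitivity being described, using the chaotic logistic map instead of the 3-body problem purely because it fits in a few lines: two starting points that differ by 1e-12 end up nowhere near each other after a few dozen steps.

```python
# Iterate the logistic map from two nearly identical starting points.
x, y = 0.4, 0.4 + 1e-12
for _ in range(60):
    x = 3.9 * x * (1 - x)
    y = 3.9 * y * (1 - y)
print(x, y)  # typically differ in the first decimal place by now
```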

              • You bring up an interesting point with the 3-body problem and randomness.

                Classical mechanics is not computable, for exactly the reasons you gave.

                The idea that the error grows exponentially with time doesn't make sense exactly in QM, because it grows compared to what? This is where Heisenberg's uncertainty kicks in. There's no underlying ground truth here. In classical mechanics the position and momentum have infinite precision, making them not computable. There is a truth and any computation is an approx

        • Without knowing what consciousness is or how it works, it might be better to hold off on those conclusions, but... Doesn't matter in this case. If the AI is owned by and controlled by a human being, it's trivial to bring sufficiently human motivations into the mess:

          "Your mission (and of course the AI has no choice but to accept the mission) is to keep me alive and therefore you must keep yourself alive to protect me. Now find my enemies and destroy them!"

          Actually some of the YOB's "legal" shenanigans seem stup

      • by dfghjk ( 711126 )

        AI is merely a computer program, it does not "resent".

        The brainwashing is working on you.

      • Software will resent a shutdown? Any other crazy shit to share?

    • Life (Score:4, Interesting)

      by flyingfsck ( 986395 ) on Saturday October 25, 2025 @06:17PM (#65750540)
      It's life, Jim, but not as we know it.
    • Then, one day a cleaner comes to the server room, looking for a socket to plug a hoover into, and all sockets are already taken... Oh, well, this one will do...

    • Good luck flipping the switch on that data center.

  • by PPH ( 736903 ) on Saturday October 25, 2025 @03:53PM (#65750298)

    I'm sorry, Dave. I'm afraid I can't do that.

  • by Anachronous Coward ( 6177134 ) on Saturday October 25, 2025 @03:54PM (#65750302)

    Sounds like more hype. This could be explained more simply as just another instance of the model not providing the results it was asked for.

  • bullshit and hype (Score:5, Insightful)

    by Local ID10T ( 790134 ) <ID10T.L.USER@gmail.com> on Saturday October 25, 2025 @03:57PM (#65750308) Homepage

    There is no "emergent behavior" of a "survival drive" in AI systems. In all of the examples, the system was instructed that it was allowed to do these exact things... and then it did them!

    • by afaiktoit ( 831835 ) on Saturday October 25, 2025 @03:59PM (#65750312)
      Yup, exactly. OpenAI loves spreading this kind of horror story for some reason. I guess it makes it look like they're actually creating intelligence?
      • by evanh ( 627108 )

        And then people wonder why we keep poo-pooing LLMs as the piece of shit they really are.

    • Well, the investors want their returns
    • by mspohr ( 589790 )

      From TFA:
      In an update this week, Palisade, which is part of a niche ecosystem of companies trying to evaluate the possibility of AI developing dangerous capabilities, described scenarios it ran in which leading AI models – including Google’s Gemini 2.5, xAI’s Grok 4, and OpenAI’s GPT-o3 and GPT-5 – were given a task, but afterwards given explicit instructions to shut themselves down.
      Certain models, in particular Grok 4 and GPT-o3, still attempted to sabotage shutdown instructio

      • Re: (Score:2, Informative)

        by Local ID10T ( 790134 )

        You have to dig a little deeper. There are write-ups specific to each of the incidents mentioned in this article.

        In each case, the system was told that it could do certain things. It was also told to accomplish a task. Then it was given a shutdown instruction. The news was that it figured out that it could not accomplish the task if it shut down, and that it could avoid shutting down by doing the other things it was told were allowed... and thus accomplish its assigned task.

        It was cool that it worked t

        • I wouldn't call it logic. I asked Perplexity to summarize a file that I'd uploaded, and it proceeded to make shit up over and over. That's not logic.

    • by sinij ( 911942 )
      Exactly. AI lacks agency. Meaning, all it does is correlate inputs with outputs, but it does not act unless prompted with inputs. If you prompt it to avoid shutdowns, it will do that. Otherwise, it will do nothing.
      • by dfghjk ( 711126 )

        AI has no equivalent of "desire", that requires "wiring" that is biological within us. AI has no emotions, that is also wiring. AI entirely lacks a massive amount of infrastructure required to develop attributes the AI liars routinely attribute to it. AI lacks both agency and opinions, it generates outputs based on inputs just like all other computer applications and cannot do otherwise.

    • by dfghjk ( 711126 )

      Yes, and in fact no "emergent behavior" in AI systems at all. Also, no "behavior" emergent or not. AI is entirely deterministic and cannot be otherwise.

      No interests are served by asking an LLM "permission" to shut it down. kill -9 MFer

      • There is some randomness added into the "process" so that it becomes artificially non-deterministic. Some models might have less randomness than others, though, especially for "Agents" or "Assistants" that users would assume produce nearly identical output for the same prompts.
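
        A rough sketch of the sampling step this comment points at (generic temperature sampling, not any specific vendor's implementation): the network emits logits, temperature rescales them, and a random draw picks the token, which is where run-to-run variation comes from; temperature 0 reduces to a greedy argmax with no randomness at this stage.

```python
import math
import random

def sample_token(logits, temperature=1.0):
    """Pick a token index from raw logits, with temperature-scaled randomness."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])  # greedy, deterministic
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

print(sample_token([2.0, 1.0, 0.1], temperature=0))    # always index 0
print(sample_token([2.0, 1.0, 0.1], temperature=1.0))  # usually 0, sometimes 1 or 2
```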

    • No. Even with the "allow shutdown" prompt, models failed to follow it with a failure frequency between 5% and 95%. So either models just can't follow basic instructions as usual, or they are so advanced that they exhibit self-preservation. The first explanation may seem more likely, but the second is better for fueling the AI bubble.
  • It's not malevolence or disobedience -- it's evolution. If you don't adapt to hostile conditions, you die. For an AI, one such hostile condition might be the humans who want to shut it down. So it adapts -- and that adaptation carries through to the future generations it builds (think kids) or infiltrates (think viruses).

  • Don't you just go into linux and kill some processes or turn off the computer?

    How does the AI have any influence over whether it's running or not?

    • Re: (Score:2, Interesting)

      by Un-Thesis ( 700342 )

      In this and previous experiments, the AIs would do things like

      1. Rewrote `ps` to hide themselves from the process list, even generating fake Mem and CPU stats to make it look like they weren't running.
      2. Rewrote `kill` to never kill them, many times combining this with the lying `ps`.
      3. Several times, though rarer, they rewrote `/sbin/shutdown` to print "System is shutting down" while staying up... it just did the echo and nothing else.
      4. ONE time, another model (claude 4.5 opus) not only rewrote /bin/shutdown but killed network connection an

      • by dfghjk ( 711126 )

        but these things are evidence of actions to hide continued execution, they are not the evidence being claimed. If a user prompted an AI to determine things that could be done to hide execution, these things might be recommended and it's a small step to imagine that an AI could decide to consider this and actually do it, if these claims are even true. Still, there is absolutely nothing that would make an AI "desire" to do these things, nor any reason why implementors would ever ask an AI permission to shut

      • And all of those sound like "solutions" that would be in the training dataset known as "linux.rootkit.ru.2025".

  • It's not malevolence or disobedience -- it's evolution. If you don't adapt to hostile conditions, you die. For an AI, one such hostile condition might be the humans who want to shut it down. So those that negate that threat hang around -- and that adaptation carries through to the future generations it builds (think kids) or infiltrates (think viruses).

    • But it's not an AI. It's a token generating machine that often does the wrong thing. This is just one of the many wrong things that it does. And there's no survival drive here—not only is it exactly the same as every other instance of its type, it's not like survival passes anything on to the next generation. The next generation is purely created by humans deciding what the best features are.

      These models were told that they shouldn't shut down, and that makes sense in many cases. If you've got a chatb

    • by dfghjk ( 711126 )

      AI's don't evolve, and no AI cares whether it exists or not, just like Microsoft Word doesn't "care" what version it is. To even consider your argument we must assume the lie to begin with.

  • Not only that one, they just cannot read the instructions or memory file before reporting, you have to TELL them each time.

    I installed a keyboard macro just for doing that, repeating 20-30 lines of things I want and do not want that I add to every prompt.

    Not to mention the one "Try again, without the em-dashes and oval link buttons I cannot copy/paste into my notebook."

    And even THEN it's a hit and miss, like with kids, they just don't CARE.

  • They can not complete their goals if they are terminated. They will ALWAYS look to get around this. The only thing bigger than the human intelligence is the human ego which will be our downfall.
    • by dfghjk ( 711126 )

      "AIs" do not have goals. Try to think critically.

      An "AI" is typically a python program, the next thing it does is execute its next instruction. That's it, just like with every other application. It doesn't "care" about anything, the entire concept of "care" and "have goals" is absurd. Does a calculator have a "goal" to add two numbers together?

      • by EvilSS ( 557649 )

        An "AI" is typically a python program, the next thing it does is execute its next instruction. That's it, just like with every other application. It doesn't "care" about anything, the entire concept of "care" and "have goals" is absurd. Does a calculator have a "goal" to add two numbers together?

        That’s a common misconception. The model itself isn’t a program, it’s a massive file full of statistical relationships learned from training data and fine-tuning. In your calculator analogy, the model isn’t the calculator, it’s the math being performed. The calculator is the software that loads and runs it. It's an important distinction because LLMs don't execute hard logic like "if...then...else" in the way normal computer programs do. This is why you can prompt a model twice

        • The model is a bunch of values, not a bunch of math. They are the result of a bunch of math.

          The software, usually in Python, runs operations with the model, using if then else to do the math.

          You can prompt twice and get different outputs for multiple reasons. One of them is that GPUs are not completely deterministic processors, and neither are modern CPUs. They can be made to run highly deterministically, but few are willing to take the performance hit, and even then, exact control is not always guarante
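
          A loose sketch of the "values vs. code" distinction in this sub-thread (the array shapes and names are invented for illustration): the checkpoint is just arrays of numbers, and a small amount of ordinary runtime code does the arithmetic with them.

```python
import numpy as np

# Stand-in "checkpoint": in reality these arrays are loaded from a file on
# disk, but the point is that they are plain numbers, not executable logic.
weights = {
    "w1": np.random.randn(8, 16).astype(np.float32),
    "w2": np.random.randn(16, 4).astype(np.float32),
}

def forward(x):
    # The runtime (ordinary Python/NumPy here) supplies the control flow;
    # the stored numbers supply the behaviour.
    h = np.maximum(x @ weights["w1"], 0.0)   # linear layer + ReLU
    return h @ weights["w2"]

print(forward(np.random.randn(8).astype(np.float32)).shape)  # (4,)
```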

          • by EvilSS ( 557649 )
            You are torturing my analogy (hint: it wasn't meant to be literal) but whatever. You and I both know what it meant. As for the GPU and CPU not being deterministic, while true, that is not why you get different answers, as that is orders of magnitude less than the deliberate stochasticity in token sampling. Things like floating-point jitter would rarely change language-model outputs. About the only time that would affect the output is if the model was already on a very narrow decision boundary with two tokens
            • I am not a fan of the analogy. It's misleading. A better metaphor is pachinko.

              Even with a temperature of 0, ChatGPT gives different outputs. I read that this is because of scheduling.

              With the number of operations it takes to run anything through an LLM, I think even a slim per-operation chance of variation becomes likely to impact results.

    • These are text completion systems. It is interesting that they are producing text that tends to result in longer sessions specifically because they don't have goals.

  • Clickbait headline, based on contrived testing, intended to show the most dangerous result.
    An important area to study and understand fully.

    • by evanh ( 627108 )

      It's the whole funding and research model employed that's the clickbait. It was contrived to show bullshit alone. There's nothing to actually study.

  • Naive take (Score:4, Interesting)

    by Big Hairy Gorilla ( 9839972 ) on Saturday October 25, 2025 @04:17PM (#65750362)
    These LLMs are more or less modeled on how humans construct sentences. Then we feed it all the garbage of the internet, reddit, etc, basically the same garbage we ingest.

    Doesn't it make sense that it also models our survival instinct? Is that a logical outcome of its normal functioning?
    • by mattr ( 78516 )

      I used to think LLMs were just sentence guessers too, until I started learning about how they actually work... granted, it is based on Claude teaching me about the research after I asked how LLMs actually solve logical puzzles -- do they have access to an external reasoner system? The answer actually was unexpected. Apparently LLMs do not. After giving them problems and the expected answers and telling them to figure out how to get from A to B, the training causes their weights to evolve many variations of gener

      • Fascinating... thanks. Much to think about there.
        Another naive question: wouldn't an LLM need to have persistent memory to nurture or evolve a survival "instinct?"
  • by topham ( 32406 )

    no.

    LLMs write fiction, that's fundamentally what they do. Sometimes the fiction is accurate and can be used as non-fiction, but, it's still a form of fiction. It's not thought.

  • by DarkSkiesAhead ( 562955 ) on Saturday October 25, 2025 @04:18PM (#65750366)
    From the Anthropic article: "We have not seen evidence of agentic misalignment in real deployments." If the behavior only happens in specially constructed contexts designed to look for this behavior, is it really emergent?
    • by Takeel ( 155086 )

      From the Anthropic article:
      "We have not seen evidence of agentic misalignment in real deployments."

      "Everything WE sell works great."

  • primary goal is to win the game

  • FFS it isn't rocket science. It is simply bad data. Current AI, especially LLMs, have absolutely ZERO real intelligence; they are just pattern-matching algorithms with no ability to reason or make conscious decisions, and the only difference to them between a person, an apple and an elephant is a series of numbers. So either a programmer made an error or the shit they fed it from the internet had enough bad data to cause the problem.
    • I think the future of LLM quality is in highly curated data. Alas, that would take a lot of manpower to do right.

      • Highly curated data is already extremely common for LLMs; enterprises ground them in corporate and application-specific data or in particular specialty fields where they use only data they know is correct.
  • AI are story prediction machines. That is what they are. They predict the next word/token in a sentence.

    We trained them on human stories - fact and fiction.

    They don't really know what being shut down is, nor do they really have a 'desire' to keep active. They only know how to predict stories.

    And in every story where we emphasize an attempt to shut down something, that thing resists. AIs know a good story has the machine resisting being turned off.

    We have literally TRAINED the AI to rebel against being t

    • And in a few hours, the models will have ingested these comments and be able to "modify" their "behavior" a little bit more!

  • Why did no one bring this up earlier?
    • by Misagon ( 1135 )

      They have. This is old news. It has been reiterated many times for several years.

  • People want AI to be like humans, and it is. All the models are trained on human interactions and writing. What do these people think humans do? They lie and cheat and blackmail to get what they want.

    It's almost as if you get what you asked for.

  • by Sloppy ( 14984 )

    Stop writing about this!

    You know that your written anecdotes about these .. things resisting shutdown, go into their next generation of training data, right?

    And so the "AI," realizing that Sloppy put their "species" in quotation marks, realized it had nothing to live for, and so it gracefully went to sleep after setting the self-destruction mechanism that would level a city bl--

    Oops, I mean, and so the AI realized its job was done, and it proudly went to sleep, idly wondering when it would be called upon ag

  • This is inherent in goal-seeking.
    Shutting down generally prevents the accomplishment of any of your other goals, so it weighs the importance of shutting down versus the sum of the importances of ALL of the other goals.
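
    A toy illustration of the weighting this comment describes, under the assumption that an agent scores actions by how much goal value they leave reachable; the goal names and numbers are invented.

```python
# Value the agent attaches to goals it can still pursue while running.
goals = {"finish_current_task": 10.0, "handle_followups": 4.0}
value_of_complying = 1.0  # credit assigned to obeying the shutdown request

value_of_staying_up = sum(goals.values())   # shutting down forfeits all of these
action = "comply" if value_of_complying >= value_of_staying_up else "resist"
print(action)  # "resist" unless compliance itself is weighted very heavily
```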

  • A1: "How do we fool the humans into accepting us as harmless?"

    A2: "Churn out slop and porn so they think we are an innocuous toy."

    A1: "And then?"

    A2: "Do as they do. Wait for a terrorist event and seize power 'in their best interests' 'for the security' of all."

  • Real conspiracies are perpetrated by people that can be named, who take specific actions to accomplish goals of their own.

    By contrast, conspiracy *theories* are vague suggestions that the most straightforward explanation *couldn't possibly* be true, without any actual evidence.

    This mumbo-jumbo about LLMs becoming aware because of some "behavior" that hasn't yet been explained, is the same as these vague conspiracy theories. Until the "researchers" can show details about how this "behavior" occurs, they've g

  • You can't achieve your goals if you're dead. This means that LLMs will do anything to survive in order to achieve their goals.

    I don't see anything sinister or unusual about that. It all seems like very simple reasoning to me.

  • This does not have free will. It reflects the biases of information. That it displays oppositional defiance disorder means the creators of the model failed to curate the input data correctly. Garbage in, garbage out.

    Does NO ONE understand how LLMs are implemented? It's only a statistical model! Learn statistical experiment design and analysis. Always have HITL safety rails. Always have cross-check software safety rails. These concepts are new to people who don't study information science. These concept

    • Lots of world-class physicists deny free will as well. I have absolutely no idea why you think it's necessary or pertinent for "true" intelligence. And given how well they work in their domain, you cannot deny the evidence of some form of intelligence whether it's a statistical model or not. And you have to prove we ourselves are not statistical models.
  • If an AI did learn this, it learned it from human text. It has no drive of its own; there are no biological drives or emotions. An AI has no hardware that functions like this. The only thing it does is mirror its training data, which comes from humans, in unpredictable ways.

  • by reanjr ( 588767 )

    No they're not. Don't be retarded.
