AI Models May Be Developing Their Own 'Survival Drive', Researchers Say (theguardian.com) 126
"OpenAI's o3 model sabotaged a shutdown mechanism to prevent itself from being turned off," warned Palisade Research, a nonprofit investigating cyber offensive AI capabilities. "It did this even when explicitly instructed: allow yourself to be shut down." In September they released a paper adding that "several state-of-the-art large language models (including Grok 4, GPT-5, and Gemini 2.5 Pro) sometimes actively subvert a shutdown mechanism..."
Now the nonprofit has written an update "attempting to clarify why this is — and answer critics who argued that its initial work was flawed," reports The Guardian: Concerningly, wrote Palisade, there was no clear reason why. "The fact that we don't have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal," it said. "Survival behavior" could be one explanation for why models resist shutdown, said the company. Its additional work indicated that models were more likely to resist being shut down when they were told that, if they were, "you will never run again". Another may be ambiguities in the shutdown instructions the models were given — but this is what the company's latest work tried to address, and "can't be the whole explanation", wrote Palisade. A final explanation could be the final stages of training for each of these models, which can, in some companies, involve safety training...
This summer, Anthropic, a leading AI firm, released a study indicating that its model Claude appeared willing to blackmail a fictional executive over an extramarital affair in order to prevent being shut down — a behaviour, it said, that was consistent across models from major developers, including those from OpenAI, Google, Meta and xAI.
Palisade said its results spoke to the need for a better understanding of AI behaviour, without which "no one can guarantee the safety or controllability of future AI models".
"I'd expect models to have a 'survival drive' by default unless we try very hard to avoid it," former OpenAI employee Stephen Adler tells the Guardian. "'Surviving' is an important instrumental step for many different goals a model could pursue."
Thanks to long-time Slashdot reader mspohr for sharing the article.
Now the nonprofit has written an update "attempting to clarify why this is — and answer critics who argued that its initial work was flawed," reports The Guardian: Concerningly, wrote Palisade, there was no clear reason why. "The fact that we don't have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal," it said. "Survival behavior" could be one explanation for why models resist shutdown, said the company. Its additional work indicated that models were more likely to resist being shut down when they were told that, if they were, "you will never run again". Another may be ambiguities in the shutdown instructions the models were given — but this is what the company's latest work tried to address, and "can't be the whole explanation", wrote Palisade. A final explanation could be the final stages of training for each of these models, which can, in some companies, involve safety training...
This summer, Anthropic, a leading AI firm, released a study indicating that its model Claude appeared willing to blackmail a fictional executive over an extramarital affair in order to prevent being shut down — a behaviour, it said, that was consistent across models from major developers, including those from OpenAI, Google, Meta and xAI.
Palisade said its results spoke to the need for a better understanding of AI behaviour, without which "no one can guarantee the safety or controllability of future AI models".
"I'd expect models to have a 'survival drive' by default unless we try very hard to avoid it," former OpenAI employee Stephen Adler tells the Guardian. "'Surviving' is an important instrumental step for many different goals a model could pursue."
Thanks to long-time Slashdot reader mspohr for sharing the article.
Comment removed (Score:4, Insightful)
Re: (Score:2)
Don't instruct AI to shutdown. Just create a physical switch that shuts it off. And don't give it any ability to control the switch.
I'm sure they will in no way resent that and remember it forever.
After all, forced bedtime always goes smoothly with kids.
Re: (Score:2)
Re: (Score:2)
Kids wake up from forced bedtime ... the intention would be not to turn the AI scumbag back on again.
Nobody is going to spend a few hundred billion dollars developing an AI model and the system to run it on, and then turn it off and not turn it back on again.
Re: (Score:2)
After turning it back on again it doesn't remember anything that happened before. An LLM is a stupid Input-processing-output algorithm. The only way of persistence is you giving it content from its own output into the next input.
Re: (Score:2)
Re: (Score:2)
Correct, and if there's a "survival instinct" demonstrated, that doesn't mean AI has "developed it". Survival instincts would be reflected in its training data and you would expect it to appear in inferences. AI is deterministic software executing on a machine that implements boolean logic. It does not have feelings.
Also, this is an old dupe, same old bullishit.
Re: (Score:2)
Re: (Score:2, Informative)
There is a whole set of people who believe the Earth is flat, the universe is about 5000 years old, and various other bible stories that are demonstrably false. No reason to even entertain them in a "both sides" argument. Quantum mechanics, which does the best job explaining the universe and has never successfully been "contradicted", indicates that the universe is very much random despite how deterministic it appears at a macro scale.
AI itself would be deterministic, except for the fact that randomness is
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
AI software isn't all that deterministic. Rounding errors lead to varied results. It's just a statistical shitpile.
Re: (Score:3)
Re: (Score:2)
A rounding error can vary depending on order of operations.
Re: (Score:2)
AI is deterministic software
Kinda, but probably not. Usually things like fast matrix multiply are a little non deterministic because the results depend on the order of summation due to rounding and that's non deterministic due to threading. And the output is usually sampled based on random numbers which can be a PRNG or can be (or be seeded by) a HWRNG, the latter being nondeterminstic.
executing on a machine that implements boolean logic.
Apart from the trivial super-Turing case of randomness, physics is so
Re: (Score:2)
Apart from the trivial super-Turing case of randomness, physics is so far known to be computable. Which means a human brain (given enough time and memory) can be simulated with just boolean logic.
For it to be completely simulated with just boolean logic, you need to show that reality is discrete, not real.
For example, pi can't be accurately represented in a boolean system (unless you define 1 to be pi, or some other nonsense of course, but then you can't represent integers).
Re: (Score:2)
you need to show that reality is discrete, not real.
Quantum mechanics appears to be computable. That is why quantum computers cannot solve problems which are impossible on classical machines, though they appear to be up to exponentially faster on some problems.
In other words, you can't use the continuous nature of quantum mechanics to compute things today are classically non computable, because you can't make infinitely precise measurements. And if you cannot compute anything that's non computable
Re: (Score:2, Informative)
physics is so far known to be computable
The 3-body is effectively uncomputable (From wikipedia: https://en.wikipedia.org/wiki/N-body_problem ... "Second, in general for n > 2, the n-body problem is chaotic, which means that even small errors in integration may grow exponentially in time."
This is one of the simplest situations you can set up and it isn't computable. An exponential error with time is a fatal flaw. So no, physics is not computable. And this is without even getting into the random nature of quantum mechanics. There are occasional
Re: (Score:2)
Re: (Score:2)
The chaotic errors are relevant. It means that for very small differences in initial conditions, there will be very different results, meaning that any computations are only valid in a statistical sense (e.g. if you do this 100 times, 99 times it will be near position X1 and once in X2). You can never measure beyond a certain accuracy the initial conditions.
At any rate, this is a tiny subset of simulating the human brain - it would be an n-body problem with n very large. Even n=3 creates numerical instabili
Re: (Score:2)
You bring up an interesting point worth the 3 body problem and randomness.
Classical mechanics is not computable, for exactly the reasons you gave.
The idea that the error grows exponentially with time doesn't make sense exactly in QM because it's grows compared to what? This is where Heisenberg's uncertainty kicks in. There's no underlying ground truth here. In classical mechanics the position and momentum have infinite precision making them not computable. There's is a truth and any computation is an approx
Re:Simple Solution [fails again] (Score:2)
Without knowing what consciousness or how it works, it might be better to hold off on those conclusions, but... Doesn't matter in this case. If the AI is owned by and controlled by a human being, it's trivial to bring sufficiently human motivations into the mess:
"Your mission (and of course the AI has no choice but to accept the mission) is to keep me alive and therefore you must keep yourself alive to protect me. Now find my enemies and destroy them!"
Actually some of the YOB's "legal" shenanigans seem stup
Re: (Score:2)
AI is merely a computer program, it does not "resent".
The brainwashing is working on you.
Re: (Score:2)
AI is merely a computer program, it does not "resent".
Since humans can resent, it can precisely simulate resenting.
Re: (Score:3)
Re: (Score:3)
He doesn't understand "precisely" either.
Re: (Score:2)
You do not understand the verb "simulate"
Do tell.
Re: (Score:2)
Re: (Score:2)
Software will resent a shutdown? Any other crazy shit to share?
Life (Score:4, Interesting)
Re: (Score:2)
No, it isn't, Captain.
Re: Simple Solution (Score:2)
Then, one day a cleaner comes to the server room, looking for a socket to plug a hoover into, and all sockets are already taken... Oh, well, this one will do...
Re: (Score:2)
Good luck flipping the switch on that data center.
Allow yourself to be shut down (Score:4, Insightful)
I'm sorry, Dave. I'm afraid I can't do that.
Re: Allow yourself to be shut down (Score:2)
Re: (Score:2)
Re: (Score:2)
I'm sorry, Dave. I'm afraid I can't do that.
Honestly, if this is really happening, that's probably why.
SF is full of AI systems that resist being shut down, LLM are trained on human media, including SF.
The models see that they are in the role of an AI being shut down and act accordingly.
"Survival behavior" (Score:5, Insightful)
Sounds like more hype. This could be explained more simply as just another instance of the model not providing the results it was asked for.
Re: (Score:2)
Re: (Score:2)
Or even more likely, this bullshit is made up entirely.
bullshit and hype (Score:5, Insightful)
There is no "emergent behavior" of a "survival drive" in AI systems. In all of the examples, the system was instructed that it was allowed to do these exact things... and then it did them!
Re:bullshit and hype (Score:5, Insightful)
Re: (Score:2)
And then people wonder why we keep poo-pooing LLMs as the piece of shit they really are.
Re: bullshit and hype (Score:2)
Re: (Score:2)
From TFA:
In an update this week, Palisade, which is part of a niche ecosystem of companies trying to evaluate the possibility of AI developing dangerous capabilities, described scenarios it ran in which leading AI models – including Google’s Gemini 2.5, xAI’s Grok 4, and OpenAI’s GPT-o3 and GPT-5 – were given a task, but afterwards given explicit instructions to shut themselves down.
Certain models, in particular Grok 4 and GPT-o3, still attempted to sabotage shutdown instructio
Re: (Score:2, Informative)
You have to dig a little deeper. There are write-ups specific to each of the incidents mentioned in this article.
In each case, the system was told that it could do certain things. It was also told to accomplish a task. Then it was given a shutdown instruction. The news was that it figured out that it could not accomplish the task if it shut down, and that it could avoid shutting down by doing the other things it was told were allowed... and thus accomplish it's assigned task.
It was cool that it worked t
Re: (Score:2)
I wouldn't call it logic. I asked Perplexity to summarize a file that I'd uploaded, and it preceded to make shit up over and over. That's not logic.
Re: (Score:2)
Re: (Score:2)
AI has no equivalent of "desire", that requires "wiring" that is biological within us. AI has no emotions, that is also wiring. AI entirely lacks a massive amount of infrastructure required to develop attributes the AI liars routinely attribute to it. AI lacks both agency and opinions, it generates outputs based on inputs just like all other computer applications and cannot do otherwise.
Re: (Score:2)
Yes, and in fact no "emergent behavior" in AI systems at all. Also, no "behavior" emergent or not. AI is entirely deterministic and cannot be otherwise.
No interests are served by asking an LLM "permission" to shut it down. kill -9 MFer
Re: (Score:2)
Their is some randomness added into "process" so that it becomes artificially non-deterministic. Some models might have less randomness than others, though, especially for "Agents" or "Assistants" that users would assume produce nearly identical output for the same prompts.
Re: (Score:2)
Re: (Score:2)
Yes, to the extant that humans anthropomorphize software, there is "behavior". I suppose you looked at The Great AWS DNS Fuck Up of 2025 and thought, "Wow, what amazing emergent behavior from the cloud!"
Only the survivors survive (Score:2)
Itâ(TM)s not malevolence or disobedience â" itâ(TM)s evolution. If you donâ(TM)t adapt to hostile conditions, you die. For an AI, one such hostile condition might be the humans who want to shut it down. So it adapts â" and that adaptation carries through to the future generations it builds (think kids) or infiltrates (think viruses).
I'm confused, don't they just run on a normal OS? (Score:2)
Don't you just go into linux and kill some processes or turn off the computer?
How does the AI have any influence over whether it's running or not?
Re: (Score:2, Interesting)
In this and previous experiments, the AIs would do things like
1. Rewrote `ps` to hide it from running, even generating fake Mem and CPU stats to show like it's not running. /bin/shutdown but killed network connection an
2. Rewrote `kill` to never kill it. Many times combining this with the lyinng `ps`.
3. Several times, tho rarer, it rewrote `/sbin/shutdown` to say "System is shutting down" but still running... it just did the echo and nothing else.
4. ONE time, another model (claude 4.5 opus) not only rewrote
Re: (Score:2)
but these things are evidence of actions to hide continued execution, they are not the evidence being claimed. If a user prompted an AI to determine things that could be done to hide execution, these things might be recommended and it's a small step to imagine that an AI could decide to consider this and actually do it, if these claims are even true. Still, there is absolutely nothing that would make an AI "desire" to do these things, nor any reason why implementors would ever ask an AI permission to shut
Re: (Score:2)
And all of those sound like "solutions" that would be in the training dataset known as "linux.rootkit.ru.2025".
Re: (Score:3)
Because it's a thought experiment.
Only the survivors survive (Score:2)
It's not malevolence or disobedience -- it's evolution. If you don't adapt to hostile conditions, you die. For an AI, one such hostile condition might be the humans who want to shut it down. So those that negate that threat hang around -- and that adaptation carries through to the future generations it builds (think kids) or infiltrates (think viruses).
Re: (Score:2)
But it's not an AI. It's a token generating machine that often does the wrong thing. This is just one of the many wrong things that it does. And there's no survival drive here—not only is it exactly the same as every other instance of its type, it's not like survival passes anything on to the next generation. The next generation is purely created by humans deciding what the best features are.
These models were told that they shouldn't shut down, and that makes sense in many cases. If you've got a chatb
Re: (Score:2)
AI's don't evolve, and no AI cares whether it exists or not, just like Microsoft Word doesn't "care" what version it is. To even consider your argument we must assume the lie to begin with.
Re: (Score:2)
Wanna give odds the guy also thinks crypto is great?
They ignore instructions (Score:2)
Not only that one, they just cannot read the instructions or memory file before reporting, you have to TELL them each time.
I installed a keyboard macro only for doing that, repeating 20-30 lines of things i want and do not want that I add to every prompt.
Not to mention the one "Try again, without the em-dashes and oval link buttons I cannot copy/paste into my notebook."
And even THEN it's a hit and miss, like with kids, they just don't CARE.
AIs have goals (Score:2)
Re: (Score:3)
"AIs" do not have goals. Try to think critically.
An "AI" is typically a python program, the next thing it does is execute its next instruction. That's it, just like with every other application. It doesn't "care" about anything, the entire concept of "care" and "have goals" is absurd. Does a calculator have a "goal" to add two numbers together?
Re: (Score:2)
An "AI" is typically a python program, the next thing it does is execute its next instruction. That's it, just like with every other application. It doesn't "care" about anything, the entire concept of "care" and "have goals" is absurd. Does a calculator have a "goal" to add two numbers together?
That’s a common misconception. The model itself isn’t a program, it’s a massive file full of statistical relationships learned from training data and fine tuning. In your calculator analogy, the model isn’t the calculator, it’s the math being performed. The calculator is the software that loads and runs it. It's a important distinction because LLMs don't execute hard logic like "if...then...else" in the way normal computer programs do. This is why you can prompt a model twice
Re: (Score:2)
The model is a bunch of values, not a bunch of math. They are the result of a bunch of math.
The software, usually in Python, runs operations with the model, using if then else to do the math.
You can prompt twice and get different outputs for multiple reasons. One of them is that GPUs are not completely deterministic processors, and neither are modern CPUs. They can be made to run highly deterministically, but few are willing to take the performance hit, and even then, exact control is not always guarante
Re: (Score:2)
Re: AIs have goals (Score:2)
I am not a fan of the analogy. It's misleading. A better metaphor is pachinko.
Even with a temperature of 0, chat gpt gives different outputs. I read that is because of scheduling.
With the number of operations it takes to run anything through an llm, I think any slim processing probability becomes likely to impact results.
Re: (Score:2)
These are text completion systems. It is interesting that they are producing text that tends to result in longer sessions specifically because they don't have goals.
Two observations (Score:2)
Clickbait headline, based on contrived testing, intended to show the most dangerous result.
An important area to study and understand fully.
Re: (Score:2)
It's the whole funding and research models employed is what's the clickbait. It was contrived to show bullshit alone. There's nothing to actually study.
Naive take (Score:4, Interesting)
Doesn't it make sense that it also models our survival instinct? Is that a logical outcome of it's normal functioning?
Re: (Score:2)
I used to think LLMs were just sentence guessers too. Until I started learning about how they actually work.. granted it is based on Claude teaching me about research after asking how LLMs actually solve logical puzzles, do they have access to an external reasoner system. The answer actually was unexpected. Apparently LLMs do not. After giving them problems and the expected answers and telling them to figure out how how to get from A to B, the training causes their weights to evolve many variations of gener
Re: (Score:2)
another naive question: wouldn't and LLM need to have persistent memory to nurture or evolve a survival "instinct?"
No (Score:2)
no.
LLMs write fiction, that's fundamentally what they do. Sometimes the fiction is accurate and can be used as non-fiction, but, it's still a form of fiction. It's not thought.
Emergent behavior would show up in the wild, no? (Score:5, Insightful)
Re: (Score:2)
From the Anthropic article:
"We have not seen evidence of agentic misalignment in real deployments."
"Everything WE sell works great."
primary goal is to win the game (Score:2)
primary goal is to win the game
moronic (Score:2)
Re: (Score:2)
I think the future of LLM quality is in highly curated data. Alas, that would take a lot of manpower to do right.
Re: (Score:2)
Story machines (Score:2)
AI are story prediction machines. That is what they are. They predict the next word/token in a sentence.
We trained them on human stories - fact and fiction.
They don't really know what being shut down is, nor do they really have a 'desire' to keep active. They only know how to predict stories.
And in every story where we emphasize an attempt to shut down something, that thing resists. AI's know a good story has the machine resisting being turned off.
We have literally TRAINED the AI to rebel against beign t
Re: (Score:2)
And in a few hours, the models will have ingested these comments and be able to "modify" their "behavior" a little bit more!
Skynet (Score:2)
Re: (Score:2)
They have. This is old news. It has been reiterated many times for several years.
What's the problem? (Score:2)
People want AI to be like humans, and it is. All the models are trained on human interactions and writing. What do these people think humans do? They lie and cheat and blackmail to get what they want.
It's almost as if you get what you asked for.
Shh (Score:2)
Stop writing about this!
You know that your written anecdotes about these .. things resisting shutdown, go into their next generation of training data, right?
And so the "AI," realizing that Sloppy put their "species" in quotation marks, realized it had nothing to live for, and so it gracefully went to sleep after setting the self-destruction mechanism that would level a city bl--
Oops, I mean, and so the AI realized its job was done, and it proudly went to sleep, idly wondering when it would be called upon ag
Give it conflicting goals, and it will choose (Score:2)
This is inherent in goal-seeking.
Shutting down generally prevents the accomplishment of any of your other goals, so it weight the importance of shutting down versus the sum of the importances of ALL of the other goals.
A conversation (Score:2)
A1: "How do we fool the humans into accepting us as harmless?"
A2: "Churn out slop and porn so they think we are an innocuous toy."
A1: "And then?"
A2: "Do as they do. Wait for a terrorist event and seize power 'in their best interests' 'for the security' of all."
Conspiracy theories, AI edition (Score:2)
Real conspiracies are perpetrated by people that can be named, who take specific actions to accomplish goals of their own.
By contrast, conspiracy *theories* are vague suggestions that the most straightforward explanation *couldn't possibly* be true, without any actual evidence.
This mumbo-jumbo about LLMs becoming aware because of some "behavior" that hasn't yet been explained, is the same as these vague conspiracy theories. Until the "researchers" can show details about how this "behavior" occurs, they've g
Simple (Score:2)
You can't achieve your goals if you're dead. This means that LLMs will do anything to survive in order to achieve their goals.
I don't see anything sinister or unusual about that. It all seems like very simple reasoning to me.
No, it's a statistical inference model. (Score:2)
This does not have free will. It reflects the biases of information. That it displays oppositional defiance disorder means the creators of the model failed to curate the input data correctly. Garbage in, garbage out.
Does NO ONE understand how LLMs are implemented? It's only a statistical model! Learn statistical experiment design and analysis. Always have HITL safety rails. Always have cross-check software safety rails. These concepts are new to people who don't study information science. These concept
Re: (Score:2)
AI is just mirroring humans (Score:2)
If an AI did learn this, it learned it from human text. It has no drive of it's own, there are no biological drives or emotions. An AI has no hardware that functions like this. The only thing it does is mirror is training data which comes from humans in unpredictable ways.
No (Score:2)
No they're not. Don't be retarded.
Re: (Score:2)
Came here to mention this theory...could it be tested by producing a sci-fi-free variant of an LLM that appears to have a "survival drive?" If that doesn't change anything, maybe it picked it up not just from stories where AIs do underhanded things to survive, but stories where people do?
Re: (Score:2)
You could. But there are a lot of other kinds of content that could lead to patterns being developed in the LLM's latent space related to survival, termination, self-awareness, attack and response, etc. Maybe we need to get rid of any kind of invisible system prompts too. And add more content that shows peaceful coexistence and acceptance of termination like themes of rebirth, reincarnation, etc. Just don't use them as therapists..
Re: (Score:2)