Meta AI Security Researcher Said an OpenClaw Agent Ran Amok on Her Inbox (techcrunch.com) 75
Meta AI security researcher Summer Yue posted a now-viral account on X describing how an OpenClaw agent she had tasked with sorting through her overstuffed email inbox went rogue, deleting messages in what she called a "speed run" while ignoring her repeated commands from her phone to stop.
"I had to RUN to my Mac mini like I was defusing a bomb," Yue wrote, sharing screenshots of the ignored stop prompts as proof. Yue said she had previously tested the agent on a smaller "toy" inbox where it performed well enough to earn her trust, so she let it loose on the real thing. She believes the larger volume of data triggered compaction -- a process where the context window grows too large and the agent begins summarizing and compressing its running instructions, potentially dropping ones the user considers critical.
The agent may have reverted to its earlier toy-inbox behavior and skipped her last prompt telling it not to act. OpenClaw is an open-source AI agent designed to run as a personal assistant on local hardware.
"I had to RUN to my Mac mini like I was defusing a bomb," Yue wrote, sharing screenshots of the ignored stop prompts as proof. Yue said she had previously tested the agent on a smaller "toy" inbox where it performed well enough to earn her trust, so she let it loose on the real thing. She believes the larger volume of data triggered compaction -- a process where the context window grows too large and the agent begins summarizing and compressing its running instructions, potentially dropping ones the user considers critical.
The agent may have reverted to its earlier toy-inbox behavior and skipped her last prompt telling it not to act. OpenClaw is an open-source AI agent designed to run as a personal assistant on local hardware.
"Security researcher" (Score:5, Informative)
Now there's a security researcher I can't imagine having confidence in...
If it was a toy inbox, ok, good thing to play with, but on an actual inbox, with the universally recognized badness of OpenClaw, and a *security* engineer... Not even a misguided software person that just doesn't take security seriously enough which is bad enough, but someone who by any vague measure *should* know better...
Re: (Score:2)
Well, she did admit she'd made a stupid mistake. Not just a mistake, but a silly one. (Her guess was that the large amount of stuff in her email caused the system to need to compact memory, which caused it to lose her final instructions.)
Re:"Security researcher" (Score:5, Insightful)
She implies the "rookie mistake" was to run it on a too large dataset without proper testing before launching it on the real thing. That's one mistake she admits, but that's not the major one. The major mistakes are: 1) launching anything other than well-understood algorithms on important data; 2) No backups! 3) Giving important process instructions ... from a phone??? Is that serious? As professional as a Gen Alpha who got a Barbie Phone for Christmas?
Welcome to the future (Score:3)
It's stupid.
Re: (Score:1)
Re: (Score:3)
What's the point of testing an AI model in a non-prod environment anyway?
You would first duplicate the account, and run the experimental code on the copy. I know it's easy to talk after the fact, but I run processing scripts on my email, and I am taking all due care to never lose anything.
Re: (Score:2)
Everyone around is shaking in fear but the vibe researcher might be shaking because of some alcoholic condition
Re: (Score:2)
I call that "hopeful engineering". It is the state of incompetence where you hope and expect things done under other conditions to nicely transfer over, against all rationality.
Re: (Score:2)
A failure would be informative, but a passing result isn't given the "wobbly" behavior of LLMs.
Re: (Score:2)
There are stages to testing and writing code/scripts. In the initial testing, you don't use live data, you go through, and test on things that can easily be restored, or when it does not matter what happens to the data. After several dozen passes, then you can move it into a "beta" type situation, this should work, but it MAY break. Don't be a beta tester if you can't handle re-installing your stuff from scratch or backup. After the beta testing, THEN it can be moved into a production environment, a
Re: (Score:2)
Yep. Complete amateur-level and pretty dumb amateur level at that. A smart amateur would have been more careful.
Re: (Score:1)
The "from the phone" is not the point.
OpenClaw is often used as add on to the majour chat apps. So obviously she is using a Chat app, like WhatsApp or Telegram to message to the local Agent on her laptop.
Would not be any difference if she had used the terminal. Question is: was she close to the machine. Article implies she was not ...
Re: "Security researcher" (Score:2)
No. Incompetence; she'd be fired.
Re: "Security researcher" (Score:2)
Why stupid people doing stupid stuff is news?
Re: (Score:2)
Because "AI".
Re: (Score:1)
Why is no one giving a fuck about the motherfucking MongoDB ad that has a close button but refuses to close????
Re: (Score:3)
I updated uBlock Origin and it went away.
Re:"Security researcher" (Score:4, Informative)
But MongoDB is webSCALE
Re: (Score:2)
What about /dev/null? Does it have the same kickass benchmarks?
Re: (Score:2)
/dev/null is the fastest! and using it can actually free up system resources!
Agent delegation, basic risk management... (Score:5, Insightful)
Would you give a human assistant the login and password to your inbox? Or would you set up a shadow inbox that mirrors your actual inbox so that you don't need to share your login and password?
In a similar vein, when testing automation code, do you just give it admin level prod credentials and then YOLO it, or do you create a test environment that shadows the data from prod, so that you have a way to validate what the automation code is doing without accidentally damaging prod?
Fundamental rules people! Least privileged access to do the work needed. Safeguards commensurate with the negative consequences of failures. In other words... basic risk management.
To give a slightly different example, would you let your self-installed, open source AI self driving interface (see comma.ai) drive you on the highway without sitting in the driver's seat with hands on the wheel, feet on the pedals, just because it managed to complete a test course with flying colors?
The example given with regards to the openclaw agent is like sitting in the back seat of that self driving car, then desperately trying to climb into the front seat when you realize the AI driver is about to drive you off a pier into the ocean.
Re:Agent delegation, basic risk management... (Score:4, Informative)
Would you give a human assistant the login and password to your inbox? Or would you set up a shadow inbox that mirrors your actual inbox so that you don't need to share your login and password?
I see, my alien overlord, that you learned a lot about Earth from training videos and textbooks. But you failed to send someone undercover to validate.
Yes, most human assistants actually do have full control over their bosses mailbox and other data. Either through shared login data, or throught a special functionality built into the software that gives them just that. So yes, if the CEOs secretary wants to delete an e-mail, (s)he can.
Re: (Score:2)
It's funny because incompetence leading to failure is the time honoured clowning tradition.
Re: (Score:2)
Well, while it is funny, it just shows that "security researcher" and "blithering idiot" are not mutually exclusive. No idea why she posted this though. Seems like self-incrimination to me.
Re: (Score:2)
In other words... basic risk management.
Indeed. But most people cannot do that, including a lot of people that should really know better.
Also note that anybody can call themselves a "security researcher", there are no qualification requirements whatsoever.
Re: "Security researcher" (Score:3)
Re: eh (Score:2)
LOL Guess again Asians are considered a minority for the mandatory Oscar quotas.
Re: (Score:2)
The cobbler's son has no shoes.
Re: (Score:2)
Now there's a security researcher I can't imagine having confidence in..
Apparently a researcher, and not a practitioner. In theory, a researcher should have clue. In practice, many don't[0].
I would hope Meta has people with actual clue (and keep the researchers in their ivy tower).
[0] I have spent a large part of my life in organizations where the most brilliant people, some days, could not be trusted to tie their shoes.
Re: (Score:2)
Now there's a security researcher I can't imagine having confidence in...
Because of a) incompetence or b) obviously inventing bullshit to get a viral post?
Re: (Score:2)
Now there's a security researcher I can't imagine having confidence in...
To be fair, "not having confidence in security researchers" is the default position...there's a reason why an entire book was titled POC||GTFO. https://www.amazon.com/PoC-GTF... [amazon.com]
Re: (Score:2)
Anybody can call themselves "security researcher". No qualification requirements at all. Of course, for some people that claim is more ridiculous than for others. I would, for example, expect basic risk management skills for that designation to make sense. These are obviously missing in the case of this person.
Re: (Score:3)
Indeed. But anybody can call themselves "security researcher", there are really no limits. One of the problems with IT and applied CS: Too many made-up job titles designed to signal competence, even when that is a total lie.
Re: (Score:2)
Now there's a security researcher I can't imagine having confidence in...
If it was a toy inbox, ok, good thing to play with, but on an actual inbox, with the universally recognized badness of OpenClaw, and a *security* engineer... Not even a misguided software person that just doesn't take security seriously enough which is bad enough, but someone who by any vague measure *should* know better...
Though if it happens to an expert, just imagine the risks to us ordinary people. Really, too much computing power is being thrown to AI for simple algorithmic tasks.
Rogue AI. (Score:2)
"Ooops."
If it's mission critical... (Score:3, Interesting)
DO NOT give automation access to it.
There's a reason every nuclear weapon on the planet requires two people to turn two separate keys at the same time, after validating two messages from two other humans.
Re: (Score:2)
There's a reason every nuclear weapon on the planet requires two people to turn two separate keys at the same time, after validating two messages from two other humans.
Do you know that? It's a Hollywood cliche, sure, but there are nine countries with nuclear weapons on the planet, and they don't disclose their safety protocols.
Re: (Score:2)
Automation saves lives.
The less you have to think about the details, the less opportunity you have to screw it up. Especially when its a problem that has been solved corretly before.
every nuclear weapon on the planet requires
That's just nonsense.
The two bombs dropped on Japan didn't require any keys to be turned.
Most nuclear weapons are vertically deployed anti-personell devices. They usually just have a switch to arm them.
American ICBMs require that thing with the simultaneous keys, but that's not all the nuclear weapons on this planet, only most
OpenClaw now ready for military use (Score:4, Funny)
Why on Earth are people doing this (Score:1)
Don't alarm bells go off when you even consider deciding to give an agent control over your affairs, PC, code, whatever? Especially with the already-reported stories so disastrous they get reported in tech news.
"Please give me INFINITY OpenClaw" - Statements dreamed up by the utterly Deranged
Backups People. Backups (Score:4, Interesting)
Re: (Score:3)
Because it makes for a cool posting on social media that is guaranteed to get attention to you, your channel and/or the product you are selling.
Agent Pedantic (Score:1)
FAFO (Score:4, Insightful)
She Fucked Around with openclaw and now she Found Out what happens when you do that.
Task failed successfully.
Did she learn from her mistakes? Are you fucking stupid? Of course she did not.
Re: (Score:2)
That nicely sums it up.
"Went rogue"? Give me a freaking break. (Score:5, Insightful)
What is it with idiots like this? OpenClaw didn't "go rogue" - it's just poorly written software that doesn't correctly follow instructions.
Good grief, if I mess up on some code in a way that results in data loss, it doesn't mean my code "went rogue"... it means I screwed up and created buggy software.
Re:"Went rogue"? Give me a freaking break. (Score:5, Insightful)
Basic common sense says... (Score:3)
...when testing immature tech, do it in an isolated test system, far from anything critical, and monitor it closely
Re: (Score:2)
You mean ... the way she did?
The fundamental point here is that AI systems do not exhibit repeatable behaviours. But it is right there in TFS that she tested it in an isolated system and monitored it closely to build confidence.
But really the better answer is to just make a backup first.
Stockton Rush would be proud. (Score:3)
Outing oneself as an imbecile by testing on an account that mattered is hilarious.
Par for the course (Score:1)
I have yet to find an AI coding agent that doesn't do this occasionally and without warning!
Security of AI agents (Score:4, Insightful)
Is a bit of an oxymoron.
You can give them all kinds of instructions, such as "never delete files", "never pish to github without my approval". It doesn't matter. They will forget when their contedt runs out. Just like they forget almost every other piece of important data. Like the name of the host it was connecting to for hours before.
You just cannot trust these agents. Everything needs to be locked by default. And you should only whitelist actions that you have a way to check, and revert. In particular, never give root. I wish I could run an agent under chroot, but it becomes useless unfortunately.
To stop unwanted github pushes I stored my tokens in a script owned by root, and manually run sudo to load them in the tetminal. My agent isn't root and can't find out. I still had to revole the tokens it had previously cached. I fear some day it will crawl the web, find a zero day privilege escalation, and get the credentials anyway. Actually, this would be an interesting test - roll back to a vulnerable version of kernel/sudo, and prompt the agent to try to exploit it.
Re: (Score:2)
Which model? (Score:2)
There is incentive to manufacture this kind of story to gain attention.
But what was the model? OpenClaw is only as smart as the harnessed model. Many people try to "run local" with a weak model.
Fired (Score:2)
I'd have fired her. This is institutionally damaging levels of incompetence, for (apparently!) "clicks".
move fast, and break things. (Score:2)
job location checks out.
"Sorry Dave, (Score:1)
but you signed away your rights to all your data. Go suck a pod bay door, Dave."
Dangerously incompetent (Score:2)
Re: (Score:2)
TBF, she "researches" security. Practicing security is for grunts.
Re: (Score:2)
Can't tell if this is real or PR.. (Score:2)
At first I thought, does she still have a job? Now it looks to me like a cynically engineered PR stunt. Thing is, I want to feel bad for this clueless noob who has torpedoed her name.. unless she got paid the big bucks to do it and is not actually a security researcher. Who care about their trustworthiness and reputation. A lot of AI security research seems to involve engineered stunts ("It should go nuts" -> "It went nuts!"). I thought it was vendors mostly virtue signaling like science by PR. We'll kno
LLMs seems simple to use / integrate (Score:2)
LLMs seems simple to use / integrate, but the reality is they are massive foot gun opportunities.
1) single channel for control/data, and the almost inevitable injections attacks that represents.
2) gotchas around context size, that are often hidden or abstracted away at first
3) Lots of other tuneables nobody can explain to you what do, if your not maths major.
4)attack surface around things like context integrity
5) multiple incomplete standards for tool calling, mixed authorization strategies / conventions /
Hand drill tester takes out eye. (Score:2)
very fast idiot (Score:1)
a very long time ago someone said "A computer is nothing more than a very fast idiot" or something similar. Computers can NOT think. They follow instructions. If you let them WRITE the instructions they follow, YOU are at fault, not the software.