Anthropic's Mythos Helped Build a Working macOS Exploit in Five Days (9to5mac.com)
"The vulnerability is simple in practice," writes Tom's Hardware: "run a command as a standard user and gain root (administrator) access to the machine."
And it was Mythos Preview that helped the security researchers at Palo Alto-based Calif bypass a five-year Apple security effort in just five days. The blog 9to5Mac reports:
Last year, Apple introduced Memory Integrity Enforcement (MIE), a hardware-assisted memory safety system designed to make memory corruption exploits much harder to execute... [The researchers note it's built into all models of Apple's iPhone 17 and iPhone Air, and some MacBooks.] They explain they have a 55-page technical report on the hack, but they won't release it until Apple ships a fix for the exploit. But they do note in broad terms that Anthropic's Mythos Preview model helped them identify the bugs and assisted them throughout the entire collaborative exploit development process.
"Mythos Preview is powerful: once it has learned how to attack a class of problems, it generalizes to nearly any problem in that class. Mythos discovered the bugs quickly because they belong to known bug classes. But MIE is a new best-in-class mitigation, so autonomously bypassing it can be tricky. This is where human expertise comes in. Part of our motivation was to test what's possible when the best models are paired with experts. Landing a kernel memory corruption exploit against the best protections in a week is noteworthy, and says something strong about this pairing...."
[I]n a time when even small teams, with the help of AI, can make discoveries such as this one, "we're about to learn how the best mitigation technology on Earth holds up during the first AI bugmageddon."
Day 6 and 7? (Score:1)
Another LPE... YAWN. Wake me for RCEs (Score:3)
So far, you're mostly talk, Anthropic. A bare handful of LPEs, one RCE, and about 200 unknown Firefox "bugs" (but few details there and no idea if they are all security bugs). Guys, when you say "thousands" and produce fewer than 20 real OS bugs (and that's counting your oh-so-scary-unknowns that are just checksums now), then some skeptical folks like me are going to say "Get to 100 before you start talking about thousands... hell, get to 50."
Suit weasels love to lie.
One other thing. On OpenSSH (Score:3)
Re: (Score:2)
You all know damn good and well they've PORED over the OpenSSH code, hoping for an RCE.
OpenSSL too.
At AISLE, we've been testing our AI system against the most secure software projects out there as live targets since late 2025. We did not focus on retrospective benchmarks, toy tasks, or CTF challenges, but on production code that the world critically depends on. We chose this path because no synthetic benchmark faithfully captures the difficulty of earning a real CVE from a well-secured project like OpenSSL, where maintainers are conservative, have limited time, and have every reason to rej
Re: (Score:2)
Re: (Score:2)
I sincerely hope the Russians or others are running their own vodka-powered AI bots off a stack of C64s to find bugs in Windows and MacOS, too. Watching huge, well-funded corporations like Anthropic and OpenAI beat up on FOSS isn't fun anymore. Just remember plenty of folks have the Windows and MacOS source, too. They can and will be ass-pounded with AI, too. I, for one, won't be nearly as sympathetic to their users who get hurt: "Oh, noes! MegaEvilCorp, a big-nasty-Microsoft partner just lost their MSSQL database and experienced an RDP zero-day!" *YAWN* What's good for the goose will be good for the gander, AI assholes.
Open source is first simply because it is an easier target for AI to learn on. If it makes you feel better, a lot of the leading IT security experts who follow these things expect over the next couple of years the frontier models are going to get significantly more skilled at reverse engineering closed source binaries. So give it time, you will get your wish. Hopefully most of the open source stuff is gone through by then so we don't have to do it all at once.
Re:Another LPE... YAWN. Wake me for RCEs (Score:5, Interesting)
Mozilla has discussed what kind of bugs they found. Here's their blog entry: https://hacks.mozilla.org/2026/05/behind-the-scenes-hardening-firefox/ [mozilla.org]
You should read it. It's a very level-headed article that avoids the for-and-against LLM hype that so many low-quality news sources report.
Around the same time, Greg Kroah-Hartman also commented on improving reports: https://www.theregister.com/software/2026/03/26/linux-kernel-czar-says-ai-bug-reports-arent-slop-anymore/5226256 [theregister.com]
Finding bugs is good. Integrating these kinds of tools into a testing and build pipeline is a good idea.
Re: (Score:2)
Finding bugs is good. Integrating these kinds of tools into a testing and build pipeline is a good idea.
Besides sacrificing virgins in a summoning circle, probably MOST ways of finding bugs are good. No argument here. I'm just finding fault with their claims of "thousands". I'm not saying LLMs finding bugs is fake. I'm saying it's hyped.
Re: (Score:2)
There's no need to find bugs - any linter can find issues.
The problem is the linter reports tons of problems that may or may not be problems. I went through dozens of issues and half of them had to be ignored because the linter ignored a check done earlier. It's not a bug if the warning says "If index exceeds 10 this will cause an out-of-bounds memory access" but the line above it has "if index is less than 10".
That's where AI could help - a linter can find the issues alright, but the AI needs to help filter it down - those
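To make the false-positive scenario above concrete, here is a minimal Python sketch. Everything in it is hypothetical: the C snippet in `SOURCE`, the function names, and the regex heuristics are illustrative assumptions, not how any real linter or AI triage tool works. The "naive" pass flags every variable-indexed array access without context, and the crude triage pass drops findings that an earlier bounds check already covers, which is the kind of filtering the comment suggests an AI could help with.

```python
import re

# Hypothetical C snippet under review: the guard on the line above
# the array access makes the access safe.
SOURCE = """\
int read_slot(int *table, int index) {
    if (index < 0 || index >= 10) return -1;
    return table[index];
}
"""

# Matches something like table[index] and captures the index expression.
ACCESS = re.compile(r"\w+\[(\w+)\]")


def naive_findings(source):
    """Flag every variable-indexed array access, ignoring all context."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = ACCESS.search(line)
        if match and not match.group(1).isdigit():
            findings.append((lineno, match.group(1)))
    return findings


def is_guarded(source, lineno, var):
    """Crude triage: does an earlier line bail out when `var` is too large?"""
    guard = re.compile(
        rf"if\s*\(.*\b{re.escape(var)}\s*>=?\s*\d+.*\)\s*return"
    )
    earlier_lines = source.splitlines()[: lineno - 1]
    return any(guard.search(line) for line in earlier_lines)


def triage(source):
    """Keep only the findings that no earlier guard appears to cover."""
    return [
        (lineno, var)
        for lineno, var in naive_findings(source)
        if not is_guarded(source, lineno, var)
    ]


if __name__ == "__main__":
    print("naive :", naive_findings(SOURCE))
    print("triage:", triage(SOURCE))
```

A regex obviously cannot reason about control flow in general; it only stands in here for whatever smarter filter sits between the raw linter output and the human reading the report.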
Re: (Score:2)
Re: (Score:2)
Maybe the moral of the story should be, don't listen to the C suite OR the tech media!
Adapt AI to be a disclosure tool (Score:2)
I am aware of an AI that actually can do more (Score:2)
Mythos was hype. There are exploit-finding/code-analysis AIs out there that are not.
I'm just waiting for someone to release one to Hugging Face with the training corpus, weights, model structure, everything fully open source so I can watch the world burn.
Seems kind of slow (Score:2)
It doesn't seem like it should take days to come up with a vulnerability that can be exploited. But on the other hand, I have noticed that Anthropic's models do seem to run more slowly than Gemini or GPT.
It's hype (Score:2)
Did they get this excited in 1978 (Score:2)
when Stephen C. Johnson wrote lint for Unix V7?
Making criminals & script kiddies more dangero (Score:2)
Just because you can does not mean you should!