AI Bug Apple

Anthropic's Mythos Helped Build a Working macOS Exploit in Five Days (9to5mac.com) 17

"The vulnerability is simple in practice," writes Tom's Hardware: "run a command as a standard user and gain root (administrator) access to the machine." And it was Mythos Preview that helped the security researchers at Palo Alto-based Calif bypass a five-year Apple security effort in just five days. The blog 9to5Mac reports: Last year, Apple introduced Memory Integrity Enforcement (MIE), a hardware-assisted memory safety system designed to make memory corruption exploits much harder to execute... [The researchers note it's built into Apple all models of the iPhone 17 and iPhone Air, and some MacBooks] They explain they have a 55-page technical report on the hack, but they won't release it until Apple ships a fix for the exploit. But they do note in broad terms that Anthropic's Mythos Preview model helped them identify the bugs and assisted them throughout the entire collaborative exploit development process.

"Mythos Preview is powerful: once it has learned how to attack a class of problems, it generalizes to nearly any problem in that class. Mythos discovered the bugs quickly because they belong to known bug classes. But MIE is a new best-in-class mitigation, so autonomously bypassing it can be tricky. This is where human expertise comes in. Part of our motivation was to test what's possible when the best models are paired with experts. Landing a kernel memory corruption exploit against the best protections in a week is noteworthy, and says something strong about this pairing...."

[I]n a time when even small teams, with the help of AI, can make discoveries such as this one, "we're about to learn how the best mitigation technology on Earth holds up during the first AI bugmageddon."

  • by Anonymous Coward
    Not sure what this AI deity did on day 6 but certainly day 7 was rest. All in all, a good week.
  • by MIPSPro ( 10156657 ) on Saturday May 16, 2026 @02:45PM (#66146423)
    I've only seen the one, lame NFSd RCE for FreeBSD a few weeks ago. This "amazing" new LLM cannot seem to generate much beyond hype, just like their broken compiler. They claim there are "thousands" of bugs waiting in the wings. However, they've only released about a dozen checksums for heretofore unknown "really bad bugs."

    So far, you're mostly talk, Anthropic. A bare handful of LPEs, one RCE, and around 200 unknown Firefox "bugs" (but few details there and no idea if they are all security bugs). Guys, when you say "thousands" and produce fewer than 20 real OS bugs (and that's counting your oh-so-scary unknowns that are just checksums now), then some skeptical folks like me are going to say "Get to 100 before you start talking about thousands... hell, get to 50."

    Suit weasels love to lie.
    • You all know damn good and well they've PORED over the OpenSSH code, hoping for an RCE. So, super-bots, what's up? Why are you failing to deliver the goods for all the scumbags in Russia, China, and Eastern Europe who are just salivating, waiting for that "pwns everything" bug? Like fetch... it's not happening.
      • You all know damn good and well they've PORED over the OpenSSH code, hoping for an RCE.

        OpenSSL too.

        At AISLE, we've been testing our AI system against the most secure software projects out there as live targets since late 2025. We did not focus on retrospective benchmarks, toy tasks, or CTF challenges, but on production code that the world critically depends on. We chose this path because no synthetic benchmark faithfully captures the difficulty of earning a real CVE from a well-secured project like OpenSSL, where maintainers are conservative, have limited time, and have every reason to rej

        • Good example. While I'm glad someone is motivated to spend some money and put some effort into securing the critical FOSS of the world, I still dislike these people because their motivation is pretty suspect. They all seem to want glory for having their AI find the "big one" for either massively pwning browsers or breaking all the servers' OpenSSH. Second prize for some kind of OpenSSL RCE in web servers. We all know that's the big win they want. A proof-of-concept working exploit along any of those lines
          • I sincerely hope the Russians or others are running their own vodka-powered AI bots off a stack of C64s to find bugs in Windows and macOS, too. Watching huge well-funded corporations like Anthropic and OpenAI beat up on FOSS isn't fun anymore. Just remember plenty of folks have the Windows and macOS source, too. They can and will be ass-pounded with AI, too. I, for one, won't be nearly as sympathetic to their users who get hurt. "Oh, noes! MegaEvilCorp, a big-nasty-Microsoft partner, just lost their MSSQL database and experienced an RDP zero-day!" *YAWN* What's good for the goose will be good for the gander, AI assholes.

            Open source is first simply because it is an easier target for AI to learn on. If it makes you feel better, a lot of the leading IT security experts who follow these things expect over the next couple of years the frontier models are going to get significantly more skilled at reverse engineering closed source binaries. So give it time, you will get your wish. Hopefully most of the open source stuff is gone through by then so we don't have to do it all at once.

    • by Moridineas ( 213502 ) on Saturday May 16, 2026 @03:31PM (#66146481) Journal

      Mozilla has discussed what kind of bugs they found. Here's their blog entry: https://hacks.mozilla.org/2026/05/behind-the-scenes-hardening-firefox/ [mozilla.org]

      You should read it. It's a very level-headed article that avoids both the pro- and anti-LLM hype that so many low-quality news sources report.

      Around the same time, Greg Kroah-Hartman also commented on improving reports: https://www.theregister.com/software/2026/03/26/linux-kernel-czar-says-ai-bug-reports-arent-slop-anymore/5226256 [theregister.com]

      Finding bugs is good. Integrating these kinds of tools into a testing and build pipeline is a good idea.

      • Finding bugs is good. Integrating these kinds of tools into a testing and build pipeline is a good idea.

        Besides sacrificing virgins in a summoning circle, probably MOST ways of finding bugs are good. No argument here. I'm just finding fault with their claims of "thousands". I'm not saying LLMs finding bugs is fake. I'm saying it's hyped.

        • by tlhIngan ( 30335 )

          There's no need to find bugs - any linter can find issues.

          The problem is that the linter reports tons of problems that may or may not be problems. I went through dozens of issues and half of them had to be ignored because the linter ignored a check done earlier. It's not a bug if the warning says "if index exceeds 10 this will cause an out-of-bounds memory access" but the line above it has "if index is less than 10".

          That's where AI could help - a linter can find the issues alright, but the AI needs to help filter it down - those
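
The false positive described in the comment above can be sketched roughly as follows. This is a toy illustration, not how any real linter or AI triage tool works: `BUF_SIZE`, `read_slot`, `naive_lint`, and `guard_aware_filter` are all hypothetical names invented for the example, and the "analysis" is plain substring matching.

```python
# Hypothetical sketch: every name below is made up for illustration.

BUF_SIZE = 10

def read_slot(buf, index):
    """Return buf[index], or None when the index is out of range."""
    if not (0 <= index < BUF_SIZE):   # the earlier check a naive rule overlooks
        return None
    return buf[index]                 # a context-free rule flags this line anyway

def naive_lint(source_lines):
    """Toy rule: flag every 'buf[index]' access, ignoring any guards."""
    return [n for n, line in enumerate(source_lines, start=1)
            if "buf[index]" in line]

def guard_aware_filter(source_lines, warnings):
    """Toy triage: drop a warning when an earlier line bounds-checks the index."""
    kept = []
    for n in warnings:
        guarded = any("index < BUF_SIZE" in line
                      for line in source_lines[:n - 1])
        if not guarded:
            kept.append(n)
    return kept
```

Fed the three lines of `read_slot`'s body, `naive_lint` flags the access, and `guard_aware_filter` discards that warning because a bounds check precedes it. The same access without the guard stays flagged, which is the filtering step the comment suggests AI could take over.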

      • by Himmy32 ( 650060 )
        And on the other hand there was also the blog post from the Mozilla CTO [mozilla.org] declaring a less level-headed "zero-days are numbered" and "defects are finite, and we are entering a world where we can finally find them all"
  • AI could also be used to automatically alert OS developers to exploits they have created. AI companies could partner with companies to counter bad actors. Of course, three-letter agencies would be exempt.
  • Mythos was hype. There are exploit-finding/code-analysis AIs out there that are not.

    I'm just waiting for someone to release one to Hugging Face with the training corpus, weights, model structure, everything fully open source so I can watch the world burn.

  • It doesn't seem like it should take days to come up with a vulnerability that can be exploited. But on the other hand, I have noticed that Anthropic's models do seem to run more slowly than Gemini or GPT.

  • I can't say too much but I work in a small firm using models to detect security issues. I have several sources that indicate that Mythos is not as good as hyped. Notably, existing models can achieve many of the same things Anthropic claims Mythos can do. There is some evidence that Mythos is able to achieve the same goals with more context pollution (i.e. it can handle being given a bigger slice of the code and finding a vulnerability in it, rather than being told to look close up at exactly what might have
  • when Stephen C. Johnson wrote lint for Unix V7?

  • Using automation to find bugs was inevitable and comes with some positive potential. But in the real world it makes criminals and individuals with no regard for the chaos and damage they create more dangerous. But it is what it is and only time will tell us how it works out.
    Just because you can does not mean you should!

