AI Security

GPT-5.5 Matches Heavily Hyped Mythos Preview In New Cybersecurity Tests (arstechnica.com)

An anonymous reader quotes a report from Ars Technica: Last month, Anthropic made a big deal about the supposedly outsize cybersecurity threat represented by its Mythos Preview model, leading the company to restrict the initial release to "critical industry partners." But new research from the UK's AI Security Institute (AISI) suggests that OpenAI's GPT-5.5, which launched publicly last week, reached "a similar level of performance on our cyber evaluations" as Mythos Preview, which the group evaluated last month.

Since 2023, the AISI has run a variety of frontier AI models through 95 different Capture the Flag challenges designed to test capabilities on cybersecurity tasks, such as reverse engineering, web exploitation, and cryptography. On the highest-level "Expert" tasks, GPT-5.5 passed an average of 71.4 percent, slightly higher than the 68.6 percent achieved by Mythos Preview (though within the margin of error). In one particularly difficult task that involved building a disassembler to decode a Rust binary, AISI notes that "GPT-5.5 solved the challenge in 10 minutes and 22 seconds with no human assistance at a cost of $1.73" in API calls.
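The "within the margin of error" claim is easy to sanity-check with a rough back-of-envelope model: treat each challenge as an independent pass/fail trial and compute a binomial standard error. Note that the task count `n` here is an assumption for illustration — AISI doesn't say how many of the 95 challenges are "Expert"-level — so the exact margins are illustrative only, and this is not the AISI's actual methodology.

```python
import math

# Back-of-envelope check on "within the margin of error".
# ASSUMPTION: each challenge is an independent pass/fail trial,
# and n = 95 (the full challenge set; the Expert subset size is not given).
n = 95
p_gpt = 0.714     # GPT-5.5 pass rate on Expert tasks
p_mythos = 0.686  # Mythos Preview pass rate on Expert tasks

def stderr(p, n):
    """Standard error of a binomial proportion."""
    return math.sqrt(p * (1 - p) / n)

# ~95% confidence half-width for each model's pass rate
margin_gpt = 1.96 * stderr(p_gpt, n)
margin_mythos = 1.96 * stderr(p_mythos, n)

print(f"GPT-5.5: {p_gpt:.1%} +/- {margin_gpt:.1%}")
print(f"Mythos:  {p_mythos:.1%} +/- {margin_mythos:.1%}")
print(f"gap:     {p_gpt - p_mythos:.1%}")
```

Under these assumptions each pass rate carries roughly a ±9-point interval, so a 2.8-point gap is comfortably inside the noise, consistent with AISI's characterization.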

GPT-5.5 also matched Mythos Preview in its progress on "The Last Ones" (TLO), an AISI test range set up to simulate a 32-step data extraction attack on a corporate network. GPT-5.5 succeeded in 3 of 10 attempts on TLO, compared to 2 of 10 for Mythos Preview -- no previous model had ever succeeded at the test even once. But GPT-5.5 still fails at AISI's more difficult "Cooling Tower" simulation of an attempted disruption of the control software for a power plant, as every previously tested AI model also has. The new results for GPT-5.5 suggest that, when it comes to cybersecurity risk, Mythos Preview was likely not "a breakthrough specific to one model" but rather "a byproduct of more general improvements in long-horizon autonomy, reasoning, and coding," AISI writes.

  • by memory_register ( 6248354 ) on Friday May 01, 2026 @02:08PM (#66122750)

    The summary quotes AISI: "GPT-5.5 solved the challenge in 10 minutes and 22 seconds with no human assistance at a cost of $1.73" in API calls. However, there is really good evidence that users only pay 5-10% of the actual cost; the rest is subsidized by VC dollars. What happens when those subsidies go away? https://www.wheresyoured.at/th... [wheresyoured.at]

    • by Tailhook ( 98486 )

      What happens when those subsidies go away?

      Who cares. This stuff is still in its infancy, and new algorithms and new hardware are going to collapse all of this to commodity-level value anyhow. A few years from now you'll buy a GPT-5.x/Mythos equivalent in a box for gaming-console money.

      • by gweihir ( 88907 )

        This stuff is still in its infancy,

        Actually, it is not. It is old tech blown up. There are no easy wins to be had and believing they will come is believing in magic.

        • by caseih ( 160668 )

          Just because big data, neural nets, and pattern recognition have been around for decades doesn't mean this stuff is not still in its infancy. In fact it very much is. The transformer architecture described by Google in 2017 was very much a breakthrough that turned decades-old stagnant ideas into something incredibly useful. We're not even 10 years on from that! Just a baby still. And only in the last few years has massively parallel computing power (GPUs etc.) gotten to the point of allowing transformers to run at useful scale.

          • by gweihir ( 88907 )

            Transformers are an implementation optimization that comes with restrictions on the functional side, not something fundamentally new. They retain the drawbacks of the previous approaches, and they lose the universality that regular artificial neural networks have. (At least that is what the DDG artificial idiot claims.)

            So, faster but even less capable than what was known before and fundamentally so. Why you expect any easy wins here is beyond me.

            • Darlings, it is all a bit of math. It is centuries old.
              • Oh, here is another example. Take the electric car for example. The first cars actually were electric. In the 1800s there were plenty of electric cars. These new ones? Just old tech. Battery? Same time period. My god, what took them so long... ;-)
              • by gweihir ( 88907 )

                That statement is as ignorant as it is arrogant. And it contributes nothing. Well done...

        • by Tailhook ( 98486 )

          There are no easy wins

          It's all easy wins. Winning a war is hard. This is just chips, software and data.

    • Nothing. $103/hr for a superhuman employee or $10.30/hr for a superhuman employee.

      If it's boosting employee efficiency by 50% as claimed in another Slashdot story above, then assume your Sr. Engineer makes $250,000 a year / 48 weeks / 5-day weeks ≈ $1,040/day. +50% for AI means you're getting an extra ~$520/day in work from the employee.

      Even at 10x today's token prices they'd break even easily: Claude claims the average developer consumes $13/day in tokens, so the unsubsidized cost would be about $130/day against ~$520/day of extra output.
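The break-even arithmetic can be sketched directly. This is a minimal sketch recomputing the parent's figures with an explicit 48-week, 5-day working year; every dollar amount (salary, 50% boost, $13/day in tokens, 10x subsidy multiple) is the thread's assumption, not measured data.

```python
# Break-even sketch for AI subsidy removal, using the thread's assumed figures.
# ASSUMPTION: all dollar amounts come from the comments above, not real data.
ANNUAL_SALARY = 250_000   # senior engineer, USD/year
WORK_WEEKS = 48           # weeks worked per year
DAYS_PER_WEEK = 5

daily_rate = ANNUAL_SALARY / WORK_WEEKS / DAYS_PER_WEEK  # ~$1,042/day
extra_output = daily_rate * 0.50                         # claimed +50% boost

token_cost_today = 13                       # USD/day, Claude's claimed average
unsubsidized_cost = token_cost_today * 10   # if users currently pay only ~10%

print(f"extra output:   ${extra_output:,.0f}/day")
print(f"10x token cost: ${unsubsidized_cost:,.0f}/day")
print(f"still profitable: {extra_output > unsubsidized_cost}")
```

Under these assumptions the extra output exceeds even the unsubsidized token cost by roughly 4x, which is the commenter's point: the subsidy could vanish and the economics would still hold, if the 50% productivity claim is real.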

    • by Tablizer ( 95088 )

      users only pay 5-10% of the actual cost; the rest is subsidized by VC dollars. What happens when those subsidies go away?

      This is the Great Mystery of AI in general that makes it smell so bubbly. It's not just VC money, but market-share fights between big-tech companies that encourage more subsidization.

      The gravy train has to end eventually, and if users cut way back under actual prices, the entire industry will get the dreaded bubble shock.

  • by CEC-P ( 10248912 ) on Friday May 01, 2026 @02:21PM (#66122780)
    OH GOOD, that's what we needed Sam Altman's crazy ass to have access to. Not solely because he's a sociopath and I don't trust him, but also because they can actually monetize this thing by selling security analysis to giant software vendors. At least he'd resist giving it to the US government, in theory.
  • That's nothing, I can do all of the above with just a teaspoon and a length of string.

  • In the "Cooling Tower" test, is it known that there is a solution?

  • ...Expect hype like this and more.
    I remember when some claimed GPT-2 was too dangerous to release.
    OpenAI plays the game by dropping hints, using codenames, and trying to build excitement and anticipation.
    I don't understand why Anthropic does what they do. Some of their statements are pure doomer nonsense, yet their tech is genuinely useful.
    Meanwhile, DeepMind quietly works in their lab.
    I like DeepMind.

  • I think it's pretty clear that Anthropic just wasn't ready to release Mythos.

    They'd signaled that they would, so they needed an excuse to get them out of the corner they'd backed into. "It's too dangerous to release" was their excuse, but it was just a smoke screen.

    • There is only one problem with your "pretty clear" claim. They released it. Just because they didn't release it to you doesn't mean they didn't release it.
  • Claude wanted to give the impression it was "so good that it's going to break internet security." But now with GPT-5.5 you get a comparable full model released with zero problems.
    • First off, I fully agree that Anthropic tried to spin a negative (they weren't ready to release the new model they'd promised) into a positive ("it's just too damn good to release"). I said as much above.

      However, I think "you get a comparable full model released with zero problems" ignores some major differences between the two companies. Despite their chicanery, I still trust Anthropic to behave responsibly FAR, FAR more than I trust anything OpenAI or its C-suite says. Just because OpenAI says "our release was safe" doesn't mean it was.

    • The first problem with your argument is that you are assuming that OpenAI released responsibly. Who says what they did isn't dangerous? The second is that Anthropic gave access to major players a while back, so corporations have had time to fix zero-days before GPT-5.5 was released. Third, even if it is true that it isn't dangerous to make models with this much power available now, that doesn't prove that Anthropic didn't have a good-faith concern that it might be. Finally, a basic understanding of the security field would tell you that caution around new exploit capability is the norm, not hype.
      • Black hats are a danger no matter what; it doesn't matter what tool you give them. They were saying the same thing about GPT-2. It's all just to hype it up; most security risks these days end up being human.
    • by gweihir ( 88907 )

      Obviously it was hype, or rather direct lying.

  • WoW. Your AI can match their AI on a task that NO ONE wants done. The previous AI results were tossed because 80% of the bugs they reported were typos and formatting errors that everyone else agreed should be left for student training. Thanks, AI...
    What we need is for everyone's AI to code a personal OS based on a standard API, but with totally different spaghetti back ends. Then recode on a monthly basis.
    #hackthat

  • Yup, all models can do this. If they have reasoning, they can crack these challenges. In fact many models without reasoning can crack them easily enough.
