
AI Hackers Are Coming Dangerously Close to Beating Humans (msn.com)

Stanford researchers spent much of the past year building an AI bot called Artemis that scans networks for software vulnerabilities, and when they pitted it against ten professional penetration testers on the university's own engineering network, the bot outperformed nine of them. The experiment offers a window into how rapidly AI hacking tools have improved after years of underwhelming performance.

"We thought it would probably be below average," said Justin Lin, a Stanford cybersecurity researcher. Artemis found bugs at a fraction of human cost -- just under $60 per hour compared to the $2,000 to $2,500 per day that professional pen testers typically charge. But its performance wasn't flawless. About 18% of its bug reports were false positives, and it completely missed an obvious vulnerability on a webpage that most human testers caught. In one case, Artemis found a bug on an outdated page that didn't render in standard browsers; it used a command-line tool called curl instead of Chrome or Firefox.
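The summary compares an hourly AI cost with a daily human rate. A minimal sketch puts them on the same footing, assuming (this is not stated in the article) that a pen tester's day rate covers an 8-hour day and rounding "just under $60" up to $60:

```python
# Sketch: compare the quoted AI hourly cost with the quoted human day rate.
# Assumption (not from the article): a pen tester's day rate covers 8 hours.

AI_COST_PER_HOUR = 60.0             # "just under $60 per hour", rounded up
HUMAN_DAY_RATES = (2000.0, 2500.0)  # "$2,000 to $2,500 per day"
HOURS_PER_DAY = 8.0                 # assumed billable day length


def daily_cost_ratio(human_per_day: float, ai_per_hour: float,
                     hours: float = HOURS_PER_DAY) -> float:
    """Return how many times more a human day costs than `hours` of AI time."""
    return human_per_day / (ai_per_hour * hours)


if __name__ == "__main__":
    ai_day = AI_COST_PER_HOUR * HOURS_PER_DAY
    for rate in HUMAN_DAY_RATES:
        ratio = daily_cost_ratio(rate, AI_COST_PER_HOUR)
        print(f"${rate:,.0f}/day vs ${ai_day:,.0f} of AI time: {ratio:.1f}x")
```

Under these assumptions a human day costs roughly 4 to 5 times as much as a day of AI time. Even running the AI around the clock (24 hours at $60, or $1,440) stays below the low end of the human range. Rounding the AI rate up makes these ratios slightly conservative.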

Dan Boneh, a Stanford computer science professor who advised the researchers, noted that vast amounts of software shipped without being vetted by LLMs could now be at risk. "We're in this moment of time where many actors can increase their productivity to find bugs at an extreme scale," said Jacob Klein, head of threat intelligence at Anthropic.


  • by SlashbotAgent ( 6477336 ) on Thursday December 11, 2025 @12:19PM (#65851035)

    Yet again. AI improves the world.

    Bonus: Modern script kiddies will be more powerful than ever.

    • Script kiddies won't ever find the good vulns now. APT groups are already using these systems.
      • Script kiddies won't ever find the good vulns now. APT groups are already using these systems.

        What do you mean by "now"? Script kiddies -- by definition -- never found ANY vulnerabilities, let alone "good" ones.

        • Anyone else remember page-widening Klerck, the famous troll driven to suicide by toxic slashdot mods?

          • I don't have any idea of who that is. Probably because I use Alexander Peter Kowalski's HOSTS file engine across all of my devices which blocks "troll" posts at the OS level before my browser has a chance to render them.
    • I'd be curious to see how well true black hats are faring using AI tools.

      Relevant Star Trek: "A Taste of Armageddon."

  • by liqu1d ( 4349325 ) on Thursday December 11, 2025 @12:40PM (#65851119)
    They had to give it hints as to what to find. I don't see an example of the hints, though. While it's interesting, I would love to see more information than was presented.
  • by AmazingRuss ( 555076 ) on Thursday December 11, 2025 @12:43PM (#65851131)
    "We're in this moment of time where many actors can increase their productivity to find bugs at an extreme scale," ... so subscribe now!
  • One in ten (Score:4, Interesting)

    by alleycat0 ( 232486 ) on Thursday December 11, 2025 @12:52PM (#65851169) Homepage
    If I were that one guy who beat the AI, I'd be asking for a raise right now.
  • How can AI simultaneously be a hallucination machine with no real-world credibility, as proven by math, and also a threat to the security of the systems red-star slashdot commentator gweihir boasts he's in charge of?

    • by HiThere ( 15173 )

      I'll assume you are being serious.
      1. Not all AIs are equivalent to ChatGPT.
      2. Mistaking something that isn't a vulnerability for a vulnerability is relatively low cost.
      3. Finding one vulnerability that's real can be extremely important.

      NOTE: It doesn't NEED to be perfect. If it's "good enough" then it's good enough to be useful. Things that aren't vulnerabilities are relatively cheap to check.

      P.S.: You shouldn't have needed this explanation.

  • by david.emery ( 127135 ) on Thursday December 11, 2025 @01:01PM (#65851211)

    It makes sense to me that a relatively narrowly focused/narrowly trained AI system eventually beats humans at vulnerability detection. And in this scenario a significant rate of false positives is probably acceptable, much more so than false negatives. Someone will have to work off those false positives (and presumably feed them back so the tool learns and gets better).

    But at the end of the day, the question is "How do you trust that the tool is correct?" Here, at least, you can write a reasonably testable requirement: "Must detect security vulnerabilities," plus a definition of 'security vulnerability' (which I'm probably not qualified to write :-) ). But then someone has to figure out what the verification approach will be, and how that's established and documented. Should there be a formal registry of 'trusted AI vulnerability scanners'? Certainly if we expect such tools to be used for product qualification ("Your website must be shown to contain no vulnerabilities, as inspected by this tool and set of procedures we trust."), we have to have a way to establish that trust.

    This is a good news story, but there's much more work to be done to turn this into production. And a lot of that work is not strictly technical, but managerial (probably including government participation, e.g. a NIST set of qualification criteria and maybe even a registry of tools that meet those criteria.)

  • Artemis found a bug on an outdated page that didn't render in standard browsers

    This seems to suggest that the human testers were using the browser by hand and didn't even consider viewing the source or using some traditional or AI tool to help audit the code. I don't know much about pen testing, but I have a hard time believing that is how it's done. Maybe it wasn't how good the AI was but how incompetent the human testers were.

    • by Anonymous Coward

      I would expect a guy named "Bob Butts" to know plenty about penetration testing.

  • This is a real problem. Imagine an AI computer virus swarm, with "Brain Bug" leader AIs building and releasing swarms of tailor-made virii to achieve certain hacking goals at a pace no human team of network admins can keep track of.

    Hard cryptographic, human-controlled Ident/Auth/Auth, encryption and signing are very quickly going to become a real necessity.

  • by Locke2005 ( 849178 ) on Thursday December 11, 2025 @02:59PM (#65851631)
    The only defense against a bad AI is a good AI. There has always been an arms race between the hackers and the security consultants; AI just accelerates the pace. Ultimately, we will have to rely on AI to defend us from AI. Better get to training those paranoia AIs, boys... Aren't we already at the point that we need to use an AI to detect AI-generated content?
  • This is a plot device I penned in my novel 4 years ago. Fuck.
  • I was ready to say "yay, we've finally reached the point where AIs can find code and network vulnerabilities, so we can patch them and stop worrying so much about exploits." But then I realized, on the other hand, that it might just be a battle of AI vs AI now: can the white hat AIs find the exploits faster than the black hat AIs? I wonder which will win out?

    Put another way, maybe there's an infinite supply of exploits (especially with vibed code) and the bad AIs may be able to find them faster than the

  • "Artemis found bugs at a fraction of human cost -- just under $60 per hour compared to the $2,000 to $2,500 per day that professional pen testers typically charge."

    What this means is that you could get an AI to scan your app and find vulnerabilities for you for relatively little money. This is a good thing. And then probably you could feed the results into your coding AI and get that stuff fixed with little effort. Not to say that there are no remaining vulnerabilities, but the low hanging fruit could be pr

  • Give those AI hackers broomsticks, and they'll beat every human in sight.

