AI Hackers Are Coming Dangerously Close to Beating Humans (msn.com)
Stanford researchers spent much of the past year building an AI bot called Artemis that scans networks for software vulnerabilities, and when they pitted it against ten professional penetration testers on the university's own engineering network, the bot outperformed nine of them. The experiment offers a window into how rapidly AI hacking tools have improved after years of underwhelming performance.
"We thought it would probably be below average," said Justin Lin, a Stanford cybersecurity researcher. Artemis found bugs at a fraction of human cost -- just under $60 per hour compared to the $2,000 to $2,500 per day that professional pen testers typically charge. But its performance wasn't flawless. About 18% of its bug reports were false positives, and it completely missed an obvious vulnerability on a webpage that most human testers caught. In one case, Artemis found a bug on an outdated page that didn't render in standard browsers; it used the command-line tool curl instead of Chrome or Firefox.
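Taking the article's figures at face value, the gap works out to roughly a 4-5x hourly cost advantage. A back-of-the-envelope sketch (the 8-hour billing day is an assumption, not stated in the article):

```python
# Back-of-the-envelope comparison of the article's cost figures.
# ASSUMPTION (not from the article): human testers bill an 8-hour day.

AI_HOURLY = 60.0                                      # "just under $60 per hour"
HUMAN_DAILY_LOW, HUMAN_DAILY_HIGH = 2000.0, 2500.0    # "$2,000 to $2,500 per day"
HOURS_PER_DAY = 8                                     # assumed working day

human_hourly_low = HUMAN_DAILY_LOW / HOURS_PER_DAY    # 250.0
human_hourly_high = HUMAN_DAILY_HIGH / HOURS_PER_DAY  # 312.5

print(f"Artemis: ${AI_HOURLY:.0f}/hr")
print(f"Human:   ${human_hourly_low:.0f}-${human_hourly_high:.0f}/hr")
print(f"Ratio:   {human_hourly_low / AI_HOURLY:.1f}x to "
      f"{human_hourly_high / AI_HOURLY:.1f}x cheaper")
```

Note that this ignores the cost of triaging the roughly 18% false positives, which still requires human time.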
Dan Boneh, a Stanford computer science professor who advised the researchers, noted that vast amounts of software shipped without being vetted by LLMs could now be at risk. "We're in this moment of time where many actors can increase their productivity to find bugs at an extreme scale," said Jacob Klein, head of threat intelligence at Anthropic.
Script Kiddies (Score:4, Funny)
Yet again. AI improves the world.
Bonus: Modern script kiddies will be more powerful than ever.
Vibe hacking (Score:2)
Re: (Score:3)
Script kiddies won't ever find the good vulns now. APT groups are already using these systems.
What do you mean by "now"? Script kiddies -- by definition -- never found ANY vulnerabilities, let alone "good" ones.
Re: Vibe hacking (Score:1)
Anyone else remember page-widening Klerck, the famous troll driven to suicide by toxic slashdot mods?
Re: (Score:3)
Re: Script Kiddies (Score:2)
I'd be curious to see how well true black hats are faring using AI tools.
Relevant Star Trek, A Taste of Armageddon.
Maybe (Score:3)
Re: Maybe (Score:3)
Make a problem, sell the solution (Score:3)
One in ten (Score:4, Interesting)
Why aren't the bugs all hallucinated? (Score:1)
How can AI simultaneously be a hallucination machine with no real-world credibility, as proven by math, and also a threat to the security of the systems red-star slashdot commentator gweihir boasts he's in charge of?
Re: Why aren't the bugs all hallucinated? (Score:1)
Does he make you cringe too?
Re: Why aren't the bugs all hallucinated? (Score:1)
What if consistency is as much of a mood as the perfection of circles was to epicyclists? What if AI is going to win because it does not try to ban the heresy of inconsistency, and thus models nature better? Will the Law of Non-Contradiction become as cringe as the idea that continents can't drift?
Re: (Score:2)
I'll assume you are being serious.
1. Not all AIs are equivalent to ChatGPT.
2. Mistaking something that isn't a vulnerability for a vulnerability is relatively low cost.
3. Finding one vulnerability that's real can be extremely important.
NOTE: It doesn't NEED to be perfect. If it's "good enough" then it's good enough to be useful. Things that aren't vulnerabilities are relatively cheap to check.
P.S.: You shouldn't have needed this explanation.
trust/verification of the tool (Score:3)
It makes sense to me that a relatively narrowly focused/narrowly trained AI system eventually beats humans for vulnerability detection. And significant false positives are in this scenario probably acceptable, much more than false negatives. Someone will have to work off those false positives (and presumably feed them back so the tool learns and gets better.)
But at the end of the day, the question is "How do you trust the tool is correct?" Here, at least, you can write a reasonably testable requirement -- "must detect security vulnerabilities" -- along with a definition of "security vulnerability" (which I'm probably not qualified to write :-) ). But then someone has to figure out what the verification approach will be, and how that's established and documented. Should there be a formal registry of "trusted AI vulnerability scanners"? Certainly, if we expect such tools to be used for product qualification ("Your website must be shown to contain no vulnerabilities, as inspected by this tool and set of procedures we trust."), we have to have a way to establish that trust.
This is a good news story, but there's much more work to be done to turn this into production. And a lot of that work is not strictly technical, but managerial (probably including government participation, e.g. a NIST set of qualification criteria and maybe even a registry of tools that meet those criteria.)
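One concrete way to make "must detect security vulnerabilities" testable is to score a candidate tool against a benchmark corpus of known, labeled vulnerabilities. A minimal sketch of such a scorer (the CVE-style labels and the report format are invented for illustration, not from any real registry):

```python
# Sketch: scoring a vulnerability scanner against a labeled benchmark,
# one possible basis for a "trusted tool" qualification. The labels
# below are hypothetical placeholders.

def score_scanner(reported: set[str], known: set[str]) -> dict[str, float]:
    """Compare a scanner's reported findings against ground truth."""
    true_pos = reported & known      # real vulns the tool found
    false_pos = reported - known     # reports that aren't real vulns
    missed = known - reported        # real vulns the tool missed
    return {
        "detection_rate": len(true_pos) / len(known) if known else 0.0,
        "false_positive_rate": len(false_pos) / len(reported) if reported else 0.0,
        "missed": float(len(missed)),
    }

known = {"CVE-A", "CVE-B", "CVE-C", "CVE-D"}          # benchmark ground truth
reported = {"CVE-A", "CVE-B", "CVE-C", "NOT-A-BUG"}   # hypothetical tool output

print(score_scanner(reported, known))
```

A qualification scheme could then set thresholds on these metrics (e.g. detection rate above X, false positive rate below Y) and require re-testing as the benchmark corpus grows.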
View Source (Score:2)
Artemis found a bug on an outdated page that didn't render in standard browsers
This seems to suggest the human testers were working in the browser by hand and never looked at the page source, or used any tool, traditional or AI, to help audit the code. I don't know much about pen testing, but I have a hard time believing that's how it's done. Maybe it wasn't that the AI was so good, but that the human testers were incompetent.
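For illustration, "auditing the source" can be as simple as scanning the raw HTML bytes (which curl returns whether or not a browser can render the page) for markers worth a closer look. The sample page and marker list below are invented for the example:

```python
# Sketch of auditing raw page source instead of a rendered view.
# RAW_HTML stands in for an outdated page a modern browser might fail
# to render; the raw bytes (what curl fetches) still expose everything.

RAW_HTML = """\
<html><!-- TODO: remove debug endpoint before launch -->
<form action="/cgi-bin/legacy-login.pl" method="get">
  <input type="hidden" name="admin" value="0">
</form>
</html>"""

def audit_source(html: str) -> list[str]:
    """Flag simple markers worth a closer look in raw HTML."""
    findings = []
    if "<!--" in html:
        findings.append("HTML comment (may leak notes or endpoints)")
    if "cgi-bin" in html:
        findings.append("legacy CGI endpoint")
    if 'type="hidden"' in html:
        findings.append("hidden form field")
    return findings

for finding in audit_source(RAW_HTML):
    print("-", finding)
```

A real pen tester (or tool) would go much further, but the point stands: none of this requires the page to render.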
Re: (Score:1)
I would expect a guy named "Bob Butts" to know plenty about penetration testing.
Re: (Score:2)
Yeah. IT nightmare material. (Score:2)
This is a real problem. Imagine an AI computer-virus swarm, with "Brain Bug" leader AIs building and releasing swarms of tailor-made viruses to achieve specific hacking goals at a pace no human team of network admins can keep up with.
Hard cryptographic, human-controlled identification/authentication/authorization, encryption, and signing are very quickly going to become a real necessity.
I've been saying this since day 1 (Score:3)
Oh boy (Score:1)
more safe or less safe? (Score:2)
I was ready to say "yay, we've finally reached the point where AIs can find code and network vulnerabilities and we can patch them and stop worrying so much about exploits." But then I realized, on the other hand, it might just be a battle of AI vs. AI now: can the white hat AIs find the exploits faster than the black hat AIs? I wonder which will win out?
Put another way, maybe there's an infinite supply of exploits (especially with vibed code) and the bad AIs may be able to find them faster than the
I like that (Score:2)
"Artemis found bugs at a fraction of human cost -- just under $60 per hour compared to the $2,000 to $2,500 per day that professional pen testers typically charge."
What this means is that you could get an AI to scan your app and find vulnerabilities for you for relatively little money. This is a good thing. And then probably you could feed the results into your coding AI and get that stuff fixed with little effort. Not to say that there are no remaining vulnerabilities, but the low hanging fruit could be pr
broomsticks (Score:2)
Give those AI hackers broomsticks, and they'll beat every human in sight.