AI Hackers Are Coming Dangerously Close to Beating Humans (msn.com)
Stanford researchers spent much of the past year building an AI bot called Artemis that scans networks for software vulnerabilities, and when they pitted it against ten professional penetration testers on the university's own engineering network, the bot outperformed nine of them. The experiment offers a window into how rapidly AI hacking tools have improved after years of underwhelming performance.
"We thought it would probably be below average," said Justin Lin, a Stanford cybersecurity researcher. Artemis found bugs at a fraction of human cost -- just under $60 per hour compared to the $2,000 to $2,500 per day that professional pen testers typically charge. But its performance wasn't flawless. About 18% of its bug reports were false positives, and it completely missed an obvious vulnerability on a webpage that most human testers caught. In one case, Artemis found a bug on an outdated page that didn't render in standard browsers; it used the command-line tool curl instead of Chrome or Firefox.
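Taking the article's figures at face value, the gap works out to roughly a 4-5x hourly cost advantage. A back-of-the-envelope sketch (the 8-hour billing day is an assumption, not stated in the article):

```python
# Back-of-the-envelope comparison of the article's cost figures.
# ASSUMPTION (not from the article): human testers bill an 8-hour day.

AI_HOURLY = 60.0                                      # "just under $60 per hour"
HUMAN_DAILY_LOW, HUMAN_DAILY_HIGH = 2000.0, 2500.0    # "$2,000 to $2,500 per day"
HOURS_PER_DAY = 8                                     # assumed working day

human_hourly_low = HUMAN_DAILY_LOW / HOURS_PER_DAY    # 250.0
human_hourly_high = HUMAN_DAILY_HIGH / HOURS_PER_DAY  # 312.5

print(f"Artemis: ${AI_HOURLY:.0f}/hr")
print(f"Human:   ${human_hourly_low:.0f}-${human_hourly_high:.0f}/hr")
print(f"Ratio:   {human_hourly_low / AI_HOURLY:.1f}x to "
      f"{human_hourly_high / AI_HOURLY:.1f}x cheaper")
```

Note that this ignores the cost of triaging the roughly 18% false positives, which still requires human time.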
Dan Boneh, a Stanford computer science professor who advised the researchers, noted that vast amounts of software shipped without being vetted by LLMs could now be at risk. "We're in this moment of time where many actors can increase their productivity to find bugs at an extreme scale," said Jacob Klein, head of threat intelligence at Anthropic.
Script Kiddies (Score:4, Funny)
Yet again. AI improves the world.
Bonus: Modern script kiddies will be more powerful than ever.
Vibe hacking (Score:2)
Re: (Score:3)
Script kiddies won't ever find the good vulns now. APT groups are already using these systems.
What do you mean by "now"? Script kiddies -- by definition -- never found ANY vulnerabilities, let alone "good" ones.
Re: Vibe hacking (Score:1)
Anyone else remember page-widening Klerck, the famous troll driven to suicide by toxic slashdot mods?
Re: (Score:3)
Re: Script Kiddies (Score:2)
I'd be curious to see how well true black hats are faring using AI tools.
Relevant Star Trek, A Taste of Armageddon.
Maybe (Score:3)
Re: Maybe (Score:3)
Make a problem, sell the solution (Score:3)
One in ten (Score:4, Interesting)
Why aren't the bugs all hallucinated? (Score:1)
How can AI simultaneously be a hallucination machine with no real-world credibility, as proven by math, and also a threat to the security of the systems red-star slashdot commentator gweihir boasts he's in charge of?
Re: Why aren't the bugs all hallucinated? (Score:1)
Does he make you cringe too?
Re: Why aren't the bugs all hallucinated? (Score:1)
What if consistency is as much of a mood as the perfection of circles was to epicyclists? What if AI is going to win because it does not try to ban the heresy of inconsistency, and thus models nature better? Will the Law of Non-Contradiction become as cringe as the idea that continents can't drift?
Re: (Score:2)
I'll assume you are being serious.
1. Not all AIs are equivalent to ChatGPT.
2. Mistaking something that isn't a vulnerability for a vulnerability is relatively low cost.
3. Finding one vulnerability that's real can be extremely important.
NOTE: It doesn't NEED to be perfect. If it's "good enough" then it's good enough to be useful. Things that aren't vulnerabilities are relatively cheap to check.
P.S.: You shouldn't have needed this explanation.
trust/verification of the tool (Score:3)
It makes sense to me that a relatively narrowly focused/narrowly trained AI system eventually beats humans for vulnerability detection. And significant false positives are in this scenario probably acceptable, much more than false negatives. Someone will have to work off those false positives (and presumably feed them back so the tool learns and gets better.)
But at the end of the day, the question is "How do you trust the tool is correct?" Here, at least, you can write a reasonably testable requirement -- "must detect security vulnerabilities" -- along with a definition of "security vulnerability" (which I'm probably not qualified to write :-) ). But then someone has to figure out what the verification approach will be, and how that's established and documented. Should there be a formal registry of "trusted AI vulnerability scanners"? Certainly, if we expect such tools to be used for product qualification ("Your website must be shown to contain no vulnerabilities, as inspected by this tool and set of procedures we trust."), we have to have a way to establish that trust.
This is a good news story, but there's much more work to be done to turn this into production. And a lot of that work is not strictly technical, but managerial (probably including government participation, e.g. a NIST set of qualification criteria and maybe even a registry of tools that meet those criteria.)
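One concrete way to make "must detect security vulnerabilities" testable is to score a candidate tool against a benchmark corpus of known, labeled vulnerabilities. A minimal sketch of such a scorer (the CVE-style labels and the report format are invented for illustration, not from any real registry):

```python
# Sketch: scoring a vulnerability scanner against a labeled benchmark,
# one possible basis for a "trusted tool" qualification. The labels
# below are hypothetical placeholders.

def score_scanner(reported: set[str], known: set[str]) -> dict[str, float]:
    """Compare a scanner's reported findings against ground truth."""
    true_pos = reported & known      # real vulns the tool found
    false_pos = reported - known     # reports that aren't real vulns
    missed = known - reported        # real vulns the tool missed
    return {
        "detection_rate": len(true_pos) / len(known) if known else 0.0,
        "false_positive_rate": len(false_pos) / len(reported) if reported else 0.0,
        "missed": float(len(missed)),
    }

known = {"CVE-A", "CVE-B", "CVE-C", "CVE-D"}          # benchmark ground truth
reported = {"CVE-A", "CVE-B", "CVE-C", "NOT-A-BUG"}   # hypothetical tool output

print(score_scanner(reported, known))
```

A qualification scheme could then set thresholds on these metrics (e.g. detection rate above X, false positive rate below Y) and require re-testing as the benchmark corpus grows.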
View Source (Score:2)
Artemis found a bug on an outdated page that didn't render in standard browsers
This seems to suggest the human testers were working in the browser by hand and never looked at the page source, or used any tool, traditional or AI, to help audit the code. I don't know much about pen testing, but I have a hard time believing that's how it's done. Maybe it wasn't that the AI was so good, but that the human testers were incompetent.
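For illustration, "auditing the source" can be as simple as scanning the raw HTML bytes (which curl returns whether or not a browser can render the page) for markers worth a closer look. The sample page and marker list below are invented for the example:

```python
# Sketch of auditing raw page source instead of a rendered view.
# RAW_HTML stands in for an outdated page a modern browser might fail
# to render; the raw bytes (what curl fetches) still expose everything.

RAW_HTML = """\
<html><!-- TODO: remove debug endpoint before launch -->
<form action="/cgi-bin/legacy-login.pl" method="get">
  <input type="hidden" name="admin" value="0">
</form>
</html>"""

def audit_source(html: str) -> list[str]:
    """Flag simple markers worth a closer look in raw HTML."""
    findings = []
    if "<!--" in html:
        findings.append("HTML comment (may leak notes or endpoints)")
    if "cgi-bin" in html:
        findings.append("legacy CGI endpoint")
    if 'type="hidden"' in html:
        findings.append("hidden form field")
    return findings

for finding in audit_source(RAW_HTML):
    print("-", finding)
```

A real pen tester (or tool) would go much further, but the point stands: none of this requires the page to render.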
Re: (Score:1)
I would expect a guy named "Bob Butts" to know plenty about penetration testing.
Re: (Score:2)
Yeah. IT nightmare material. (Score:2)
This is a real problem. Imagine an AI computer-virus swarm, with "Brain Bug" leader AIs building and releasing swarms of tailor-made viruses to achieve specific hacking goals at a pace no human team of network admins can keep up with.
Hard cryptographic, human-controlled identification/authentication/authorization, encryption, and signing are very quickly going to become a real necessity.
I've been saying this since day 1 (Score:3)
Oh boy (Score:1)
more safe or less safe? (Score:2)
I was ready to say "yay, we've finally reached the point where AIs can find code and network vulnerabilities and we can patch them and stop worrying so much about exploits." But then I realized, on the other hand, it might just be a battle of AI vs. AI now: can the white hat AIs find the exploits faster than the black hat AIs? I wonder which will win out?
Put another way, maybe there's an infinite supply of exploits (especially with vibed code) and the bad AIs may be able to find them faster than the
I like that (Score:2)
"Artemis found bugs at a fraction of human cost -- just under $60 per hour compared to the $2,000 to $2,500 per day that professional pen testers typically charge."
What this means is that you could get an AI to scan your app and find vulnerabilities for you for relatively little money. This is a good thing. And then probably you could feed the results into your coding AI and get that stuff fixed with little effort. Not to say that there are no remaining vulnerabilities, but the low hanging fruit could be pr
broomsticks (Score:2)
Give those AI hackers broomsticks, and they'll beat every human in sight.