How Anthropic's Claude Helped Mozilla Improve Firefox's Security (yahoo.com)
"It took Anthropic's most advanced artificial-intelligence model about 20 minutes to find its first Firefox browser bug during an internal test of its hacking prowess," reports the Wall Street Journal.
The Anthropic team submitted it, and Firefox's developers quickly wrote back: This bug was serious. Could they get on a call? "What else do you have? Send us more," said Brian Grinstead, an engineer with Mozilla, Firefox's parent organization.
Anthropic did. Over a two-week period in January, Claude Opus 4.6 found more high-severity bugs in Firefox than the rest of the world typically reports in two months, Mozilla said... In the two weeks it was scanning, Claude discovered more than 100 bugs in total, 14 of which were considered "high severity..." Last year, Firefox patched 73 bugs that it rated as either high severity or critical.
A Mozilla blog post calls Firefox "one of the most scrutinized and security-hardened codebases on the web. Open source means our code is visible, reviewable, and continuously stress-tested by a global community." So they're impressed, and also thankful Anthropic provided test cases "that allowed our security team to quickly verify and reproduce each issue." Within hours, our platform engineers began landing fixes, and we kicked off a tight collaboration with Anthropic to apply the same technique across the rest of the browser codebase... A number of the lower-severity findings were assertion failures, which overlapped with issues traditionally found through fuzzing, an automated testing technique that feeds software huge numbers of unexpected inputs to trigger crashes and bugs. However, the model also identified distinct classes of logic errors that fuzzers had not previously uncovered...
We view this as clear evidence that large-scale, AI-assisted analysis is a powerful new addition in security engineers' toolbox. Firefox has undergone some of the most extensive fuzzing, static analysis, and regular security review over decades. Despite this, the model was able to reveal many previously unknown bugs. This is analogous to the early days of fuzzing; there is likely a substantial backlog of now-discoverable bugs across widely deployed software.
"In the time it took us to validate and submit this first vulnerability to Firefox, Claude had already discovered fifty more unique crashing inputs" in 6,000 C++ files, Anthropic says in a blog post (which points out they've also used Claude Opus 4.6 to discover vulnerabilities in the Linux kernel).
Anthropic "also rolled out Claude Code Security, an automated code security testing tool, last month," reports Axios, noting the move briefly rattled cybersecurity stocks...
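To make the summary's distinction between fuzzer-findable assertion failures and logic errors concrete, here is a minimal sketch in Python. Everything in it is hypothetical and written for illustration: `parse_length_prefixed`, `clamp_percent`, and `fuzz` are toy functions, not Firefox code and not the tooling Mozilla or Anthropic used. It only shows why random inputs tend to surface crashes quickly while a silently wrong result can pass unnoticed.

```python
import random


def parse_length_prefixed(data: bytes) -> bytes:
    """Toy parser (hypothetical): first byte declares the payload length."""
    if not data:
        return b""
    length = data[0]
    payload = data[1:]
    # Assertion-style bug: the parser assumes the declared length always fits.
    if length > len(payload):
        raise AssertionError("declared length exceeds payload")
    return payload[:length]


def clamp_percent(data: bytes) -> int:
    """Toy logic error (hypothetical): meant to clamp to 0..100."""
    value = int.from_bytes(data, "big")
    # Wrong upper bound. Nothing crashes; the result is just silently wrong.
    return max(0, min(value, 1000))


def fuzz(target, trials: int, seed: int):
    """Feed random byte strings to target and record inputs that raise."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        size = rng.randrange(0, 8)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except Exception as exc:
            crashes.append((data, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    print(len(fuzz(parse_length_prefixed, 200, seed=0)), "crashing inputs in the parser")
    print(len(fuzz(clamp_percent, 200, seed=0)), "crashing inputs in the clamp")
```

A crash-only fuzzer like this one flags the parser but reports nothing for `clamp_percent`, even though `clamp_percent(b"\x01\xf4")` returns 500 rather than a value in 0..100. Catching that requires knowing what the function was supposed to do, which is the kind of gap the summary says the model helped fill.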
Seems like a good use of this technology (Score:3)
Given how these AIs are trained on massive amounts of real-world examples, it makes sense that if you give one any given code base, it's going to find some kinds of errors given what it can compare against. AI seems to be really good at pattern recognition and we should definitely lean into that.
Defensively speaking, we need all the help we can get in finding and eliminating bugs in our software. Given it's always been easier to break things than to fix them, the rate at which AI can be used to write malicious code likely outpaces our ability to find and fix the code. The never-ending game.
Re: (Score:1, Insightful)
Mozilla promotes Eich to CEO knowing he's a dick-head the entire company can't get behind.
Eich resigns after realizing he's completely unsuitable for the role.
RWNJs: "No, Mozilla is woke, and should do EXACTLY WHAT IT DID THE FIRST TIME again to rectify this!"
Idiots.
Re: (Score:2)
OK Pedo protector.
Re: (Score:2)
It's fine if you want to use AI to find errors.
You MUST NOT let the AI write the code. Whatever you submit, must be code YOU wrote. AI "vibe coding" is going to bloat and destroy a lot of products before people get told to stop doing it as their first tool in the box instead of the last.
Re: (Score:3)
You mean you must submit good code. Whether it's created by an AI model or a human, it needs to be reviewed and good quality. Letting an AI submit piles of slop code is bad, but not all AI code is slop.
Re: (Score:3)
Re: (Score:3)
They should get it to refactor the codebase. Make it easy to build and adapt, so it can be used as the basis for other browsers like Chromium is. Then document it extensively.
It's the only way to survive. AI isn't going to save them.
Re: (Score:2)
Firefox is already used as the basis of other browsers, some direct clones, some radically different.
And maybe they shouldn't be using an LLM to refactor code. Who exactly is going to maintain this slop? Or do you think Vibe-coded shit is built to last?
There are so many things wrong, on every single level, with what you just said I find it hard to believe I managed to pick just two.
Re: (Score:2)
All the Firefox based browsers are nothing more than changing the default settings. The code is a nightmare to work on.
Re: (Score:2)
Except that this is a meaningless stunt and the actual performance of the LLM does not even remotely resemble the claims made.
Re: Seems like a good use of this technology (Score:1)
Re: (Score:2)
Well, I have actual research results on this. You just have a mindless belief.
Re:Seems like a good use of this technology (Score:4, Interesting)
No, it really isn't.
This is a PR puff piece and almost certainly both overexaggerates the degree to which Claude helped and leaves out massive pieces of information suggesting that a lot of work was needed to get it to the point it could actually help.
What's happening in reality is people are using Claude to find "bugs", submitting them to bug bounty programs, and overloading the authors of software like Curl with ridiculous amounts of slop.
It's not that you can't use Claude to find bugs. It's that people who use Claude to find bugs either don't spend much time determining whether Claude was right, or have to put in so much effort that it's questionable whether Claude would have been better than, say, "grep strcpy", in terms of finding anything useful.
Re: (Score:2)
Do humans still know how to code (or even write a paper anymore)? Why can't they sit down and go through the code and find holes and fix them?
All LLM-AI is going to do is teach everyone they don't need to read anything themselves, research anything, code anything, or even do anything themselves... all we have to do is ask 'Clod' to flush the toilet or "put a bag of popcorn in the microwave" or "summon the car".
Congrats... we're on our way to becoming the blobs from WALL-E!
Re: (Score:2, Troll)
> and overloading the authors of software like Curl
Got any other irrelevant and off topic opinions on a story that is about *checks notes* not Curl?
Re: (Score:3)
I think you failed to read the summary.
First of all, when researchers use Claude for security research, they basically always have the LLM not only find the bug but also validate it and even produce PoC exploit code. All of which the LLM can and does do, and far faster than humans.
Second, the Mozilla team definitely did determine that Claude was right and found its output far more useful than grep strcpy.
Good! (Score:2)
Cool AI hype post, too bad reality is here. (Score:5, Informative)
"omfg Claude found a ton of bugs! critical! high! buy our credits so you can find bugs from your own software! STOCK VALUE PUMP!"
Reality:
Success Rate: Claude attempted to write exploit code for these bugs. It produced 2 working exploits.
Real-World Viability: Zero. Anthropic’s own Red Team lead (Logan Graham) admitted these exploits only worked on a "test version" of the browser.
Mostly "assertion failures" (code that doesn't follow its own rules and crashes) and "logic errors" that traditional fuzzing (automated random input testing) had missed.
I wish Slashdot still had editors that actually understood what they're copy pasting.
Re: (Score:2)
Oh yeah and the sandbox was turned off.
Such stocks! much wow!
Re: (Score:2)
Indeed. And if the editors would recognize meaningless stunts, that would also be nice.
Re:Cool AI hype post, too bad reality is here. (Score:4, Interesting)
Maybe it wasn't great at writing exploit code, but so what? Claude "found more high-severity bugs in Firefox than the rest of the world typically reports in two months, Mozilla said."
Re: (Score:2)
Re: (Score:2)
Yeah, and bugreports without exploits are useless...
Most developers even manage to fix bugs without a reproducing example if the description is good enough.
Re: (Score:2)
> Real-World Viability: Zero. Anthropic’s own Red Team lead (Logan Graham) admitted these exploits only worked on a "test version" of the browser.
So you say there's no real-world impact to finding bugs in test software? Great. Let's just stop beta testing altogether and ship everyone including LTS users the latest nightly build from whatever some 15 year old cobbled together.
Okay ... sidenote ... I was about to insult your intelligence by referencing you directly, but ... you actually chose the username derplord? Like you picked that yourself? If this was an effort to defuse discussion on the internet by stopping people from saying "derp derp" to you
Re: (Score:2)
> Anthropic’s own Red Team lead (Logan Graham) admitted these exploits only worked on a "test version" of the browser.
Citation?
Neither the article from Mozilla [mozilla.org] nor the one from Anthropic [anthropic.com] mentions anything about a "test version of the browser"; instead, it specifically states the current/latest version of Firefox...
So we tasked Claude with finding novel vulnerabilities in the current version of Firefox—bugs that by definition can’t have been reported before. We focused first on Firefox’s JavaScript engine but then expanded to other areas of the browser.
The article goes on to state:
After just twenty minutes of exploration, Claude Opus 4.6 reported that it had identified a Use After Free (a type of memory vulnerability that could allow attackers to overwrite data with arbitrary malicious content) in the JavaScript engine. One of our researchers validated this bug in an independent virtual machine with the latest Firefox release, then forwarded it to two other Anthropic researchers, who also validated the bug.
Here's the list of all fixed vulnerabilities in Firefox 148 [mozilla.org], as found by Claude Opus 4.6.
Mozilla themselves state:
AI-assisted bug reports have a mixed track record, and skepticism is earned. Too many submissions have meant false positives and an extra burden for open source projects. What we received from the Frontier Red Team at Anthropic was different.
Re: (Score:2)
> Neither the article from Mozilla [mozilla.org] nor the one from Anthropic [anthropic.com] mentions anything about a "test version of the browser"; instead, it specifically states the current/latest version of Firefox...
Right, those articles don't.
Claude found the bugs, and then tried to exploit 2 of them but...
"Anthropic’s team also asked Claude to build exploit code – the kind of tool a hacker would use to actually attack someone through a discovered vulnerability. While Claude did write two working
Good luck with that (Score:2)
LLMs are absolute shit at spotting vulnerabilities and at proposing fixes. Yes, they find some things. But the more serious the vulnerabilities, the worse they perform. At the CVE level they find almost nothing. And that is the level that counts.
Re: (Score:3)
There is you and your claims. And then there is Mozilla, with '14 of which were considered "high severity..."'.
So who to believe? You, in the peanut gallery, or Mozilla, using Anthropic's best models and documenting their results?
I know who I'm going with.
Also, what is the point of Rust now? LLMs run down defects at low cost. Do we need new languages that attempt to prevent defects by design? Perhaps we're better off with simpler languages, and simpler compilers that are fast and portable. We ma
Re: (Score:3)
It has already been established by others that Mozilla is lying about the "high severity".
Re: (Score:2)
This is false.
Mozilla themselves state in the article:
"AI-assisted bug reports have a mixed track record, and skepticism is earned. Too many submissions have meant false positives and an extra burden for open source projects. What we received from the Frontier Red Team at Anthropic was different."
Regardless, you can see various high-severity security issues found by Claude Opus 4.6 patched in the latest version of Firefox (v148) here [mozilla.org].
Re: (Score:3)
Who the fuck told you that? Mozilla's source code is still 98% C++ + JavaScript.
Only small parts of the code were rewritten in Rust (like the CSS parser).
The C++ code is mostly the same baroque, old-style C++ code from 25 years ago, but with a lot of ugly cruft and cargo-cult accumulated during failed refactorings and trial-and-error "fixing" performed by various ignorant imbeciles.
Re: (Score:2)
Meanwhile (Score:2)
They can't be bothered to fix (or even acknowledge!) serious bugs reported by humans, even those including patches.
Good! (Score:2)
Now can they set Claude loose on Firefox's ever-deteriorating and increasingly-difficult-to-configure-sensibly user interface?