
AI Can Find Hundreds of Software Bugs -- Fixing Them Is Another Story (theregister.com)

Anthropic last week promoted Claude Code Security, a research preview capability that uses its Claude Opus 4.6 model to hunt for software vulnerabilities, claiming its red team had surfaced over 500 bugs in production open-source codebases -- but security researchers say the real bottleneck was never discovery.

Guy Azari, a former security researcher at Microsoft and Palo Alto Networks, told The Register that only two to three of those 500 vulnerabilities have been fixed and none have received CVE assignments. The National Vulnerability Database already carried a backlog of roughly 30,000 CVE entries awaiting analysis in 2025, and nearly two-thirds of reported open-source vulnerabilities lacked an NVD severity score.

The curl project closed its bug bounty program because maintainers could no longer handle the flood of poorly crafted reports from AI tools and humans alike. Feross Aboukhadijeh, CEO of security firm Socket, said discovery is becoming dramatically cheaper but validating findings, coordinating with maintainers, and developing architecture-aligned patches remains slow, human-intensive work.
  • by Mr. Dollar Ton ( 5495648 ) on Wednesday February 25, 2026 @11:39PM (#66010796)

    Checking everything you get from it appropriately is and will remain more work than actually doing it yourself. As it gets "smarter" it will only require more work to figure out where it fails. At the expense of your environment, your quality of life and your future and the future of your kids.

• FWIW, I evaluated an "AI code review" tool - and it was actually very good. It really did find things which on their own weren't necessarily a problem, but were when taken in the context of the rest of the code base. What's more, it did have LLM-produced suggestions for how to fix them - but get this, the "AI company" peddling this tool hid that away in a pull-down, which had lots of warnings about using it directly without human review. I found most of the suggestions pretty good, but I preferred to cut and paste

    • We've also really just got their word to go on so far, "we ran our tool on some projects and it found some issues that we're not going to tell you about". Of the several similar press releases from AI companies I haven't found any yet that will actually let you try their fine product out on actual code so you can see whether it's any good or not. I've signed up for waitlists on several of them but never heard anything back.
  • by Joe_Dragon ( 2206452 ) on Wednesday February 25, 2026 @11:40PM (#66010800)

just code to pass automation checks even if the UI shows a clear error!

  • Ergh (Score:5, Interesting)

    by liqu1d ( 4349325 ) on Wednesday February 25, 2026 @11:50PM (#66010808)
This is a result of shitty humans wasting money on a subject they don't understand in the hopes of making a few dollars. AI cannot solve this in its current form, nor probably any future ones in the LLM vein. It's a huge shame curl had to do this to combat the shite. I wonder how long until a critical flaw is discovered in important systems but not fixed because maintainers have to wade through all the vibing going on.
  • by SubmergedInTech ( 7710960 ) on Thursday February 26, 2026 @12:35AM (#66010820)

    We've had decades to write poorly-tested poorly-reviewed code. But that's ok; as long as it kinda worked, we insisted on shipping it.

    AI is now good enough to show what I told my managers for years: That technical debt builds up, and at some point the bill will come due.

    The bill is now due.

    Thanks to AI, it's now easy to find bugs. And relatively easy to confirm they're exploitable. But thanks to all the rest of the technical debt, much harder to fix the bugs. AI isn't good enough to fix the bugs yet, either, at least not without creating new ones just as fast. So it's a target-rich environment for hackers.

    I'd say "I told you so", but I got out of that rat race a few years ago.

    • Not saying this is you, but having written software for many years then managed software teams for many more, one of the most annoying developers to work with is the one guy who always insists things be done "the right way". I actually think your last line describes them perfectly: they are no doubt waiting for a set of very specific conditions to come to fruition, usually causing disaster, whereby their suggestions are proven useful or right and which would have prevented said disaster. They have an unheal
      • It's not helpful to describe software as "the right way" or "the wrong way."

        Instead, focus on finding "something that works" and avoiding "things that don't work" (and of course, "working" can include the requirement of code being readable, flexible, etc). There is never exactly "one right way." There are always several ways that work and are fine.
      • There's rarely a right way to write code, though there are definitely plenty of wrong ways. :)

        Every SWE goes through phase 1, which is writing yourself into a corner by not thinking about maintainability and interfaces. And then phase 2, where you write such a complex interface that it collapses under its own weight. Only then do you get to the point where you're even aware of the need to balance interface complexity and upgradeability vs. code velocity.

        Usually at that point you mock some stuff up quickly

• It's not just that... At my employer, who may or may not be a CPU provider for Samsung phones, we have multiple homebrewed tools based on Claude. They all found tons of issues, but 90% of them are false positives and the remaining 10% are really insignificant. Not because it hallucinates, but because it lacks a finer understanding of things. But if you forgot to use a mutex somewhere, it will most likely catch it.
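    The missing-mutex case the commenter mentions is easy to show in miniature. Below is a hypothetical sketch (the names and counts are illustrative, not from the poster's codebase) of the read-modify-write pattern such a tool tends to flag:

    ```python
    import threading

    # Several threads do a read-modify-write on shared state. Without
    # the lock, increments can interleave and updates are lost; with
    # the lock, the final count is deterministic.

    class Counter:
        def __init__(self):
            self.value = 0
            self.lock = threading.Lock()

        def unsafe_increment(self):
            self.value += 1              # load, add, store: not atomic

        def safe_increment(self):
            with self.lock:              # mutex serializes the update
                self.value += 1

    def run(method, n=50_000, workers=4):
        c = Counter()
        threads = [
            threading.Thread(
                target=lambda: [getattr(c, method)() for _ in range(n)])
            for _ in range(workers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return c.value

    # run("unsafe_increment") may come up short on some runs;
    # run("safe_increment") always returns workers * n.
    print(run("safe_increment"))
    ```

    This is exactly the kind of shallow-but-real defect a pattern-matching reviewer can catch reliably, even when it misses deeper architectural issues.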

  • by physicsphairy ( 720718 ) on Thursday February 26, 2026 @12:47AM (#66010828)

If there is a backlog of 30k vulnerabilities, I don't see how AI is even relevant here. Linus Torvalds himself could have entered a fugue state and churned out 500 reports over the weekend, and surely his reports would still be stuck in the queue.

The real concern is that while the blue team is stuck on its trusty human-in-the-loop evaluation system, the red team is 1000Xing exploits. (I don't think they will mind too much if 90% are false positives or implemented with crummy code.) In that case, be as skeptical as you want of AI code vs. human code quality; acceleration in kind is the only alternative the blue team has to giving up and airgapping everything.

    • Semi-related. I remember at least twice in the last year (fuzzy on dates), being alerted to a 9+ rated vulnerability that allowed for unauthenticated remote code execution, on a fucking firewall (basically the primary security device for countless business networks). While there are other guilty parties, I'm looking at you Fortinet. Oh lookie, another one from this month: https://fieldeffect.com/blog/f... [fieldeffect.com]

      Even fucking worse, the patch for one of these critical remote code execution flaws, didn't actually [esentire.com]
• I do coding with various AI models almost every day. Claude 4.6 when I need help with something difficult; otherwise some of the less expensive models will do. They have helped me find quite a few subtle bugs in existing code. They won't do it all on their own, of course, but they will write debug code for you if you know how to ask. They will explain exception messages and flag issues in error logs. I've had them spot and fix vulnerabilities, misconfigurations, race conditions. Really very helpful.

  • by phantomfive ( 622387 ) on Thursday February 26, 2026 @03:56AM (#66010880) Journal
    At work this is no big deal.

    Just create a Jira ticket for each bug found, assign the minimum of one story point to each ticket, then spend ten minutes (or an hour) checking it out. If it's a real bug, create another Jira ticket for fixing it (you need to have a meeting to discuss priorities, of course, so don't fix it right away). If it's not a real bug, close it and Voila, a free story point. If it's a duplicate, close it anyway (not marked duplicate, that will take extra work and waste your time) and Voila, another free story point. (If you do need to verify that it's a duplicate, be sure to create a Jira ticket for verifying that it's a duplicate. Don't do any work that is not logged!)

    I figure that including meetings, you can get around 60 story points a week just by looking at AI tickets. Adjust the number based on how many you need that sprint. Your manager will be happy, velocity is skyrocketing because of AI!

    Of course in the real world, this strategy isn't great.
  • by Qbertino ( 265505 ) <moiraNO@SPAMmodparlor.com> on Thursday February 26, 2026 @05:13AM (#66010908)

Looking through thousands of code files and detecting errors and antipatterns is one of those things AI has really surprised me with. Especially with FOSS code and APIs it knows really well. Another one of those definite AI game-changers.

  • by ledow ( 319597 ) on Thursday February 26, 2026 @06:25AM (#66010936) Homepage

    So where you do brute-force pattern matching, AI is useful.

    Where it needs to think, it's worthless.

    Gotcha.

    If only we'd known this before...

• This is a game changer for how vulnerabilities and programming errors can be fixed at the source. Very interesting; it will change the cybersecurity industry.
• Sure, I agree identification alone isn't the end result we need or want. But fixing an issue often begins with noticing it, right?

What might improve the identification-only problem? Asking the same AI for fixes doesn't sound impossible. Just more expensive.

    What if AI companies donated cloud time to "worthy" open source projects to use? Maybe they could use AI to read through and organize all those bug reports (even if they don't try to use AI to fix them).

    Or start charging to receive the bug reports.
