Google's Big Sleep LLM Agent Discovers Exploitable Bug In SQLite (scworld.com) 36

Posted by msmash on Tuesday November 05, 2024 @11:05AM from the moving-forward dept.

spatwei writes: Google has used a large language model (LLM) agent called "Big Sleep" to discover a previously unknown, exploitable memory flaw in a widely used software for the first time, the company announced Friday.

The stack buffer underflow vulnerability in a development version of the popular open-source database engine SQLite was found through variant analysis by Big Sleep, which is a collaboration between Google Project Zero and Google DeepMind.

Big Sleep is an evolution of Project Zero's Naptime project, which is a framework announced in June that enables LLMs to autonomously perform basic vulnerability research. The framework provides LLMs with tools to test software for potential flaws in a human-like workflow, including a code browser, debugger, reporter tool and sandbox environment for running Python scripts and recording outputs.

The researchers gave the Gemini 1.5 Pro-driven AI agent a previous SQLite vulnerability as a starting point, providing context for Big Sleep to search for similar vulnerabilities in newer versions of the software. The agent was presented with recent commit messages and diff changes and asked to review the SQLite repository for unresolved issues.

Google's Big Sleep ultimately identified a flaw in which the function "seriesBestIndex" mishandles the special sentinel value -1 in the iColumn field. Because this field is normally non-negative, all code that interacts with it must handle this special case explicitly; seriesBestIndex fails to do so, which leads to a stack buffer underflow.
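The failure mode described above can be illustrated with a small C sketch. This is not SQLite's seriesBestIndex code: the constants, the column layout, and the helper names (column_to_slot, collect_constraints) are hypothetical. The one detail borrowed from SQLite is that a virtual-table constraint's iColumn is -1 when the constraint refers to the rowid. The sketch shows why subtracting a column offset from that sentinel produces a negative index into a stack array, and how rejecting it before indexing avoids the out-of-bounds write.

```c
#include <assert.h>

/* Illustrative constants only; these are NOT SQLite's real definitions. */
#define ROWID_SENTINEL   (-1)  /* iColumn value SQLite uses to mean "the rowid" */
#define FIRST_ARG_COLUMN 1     /* hypothetical: first column tracked in aIdx[] */
#define N_ARG_COLUMNS    3     /* hypothetical: e.g. start, stop, step */

/* Map a constraint's iColumn to a slot in a fixed-size aIdx[] array, or
** return -1 for anything that is not a tracked column. Without these checks,
** iColumn == -1 would give slot -1 - FIRST_ARG_COLUMN == -2, and a later
** aIdx[slot] = ... would write below the start of the stack array. */
static int column_to_slot(int iColumn){
  /* Explicit for readability; the range check below would also reject it. */
  if( iColumn == ROWID_SENTINEL ) return -1;
  /* Range-check before subtracting so no arithmetic can overflow. */
  if( iColumn < FIRST_ARG_COLUMN
   || iColumn >= FIRST_ARG_COLUMN + N_ARG_COLUMNS ){
    return -1;
  }
  return iColumn - FIRST_ARG_COLUMN;
}

/* Record which constraint (by position) applies to each tracked column.
** Returns the number of constraints that were recorded. */
static int collect_constraints(const int *aiColumn, int nConstraint,
                               int aIdx[N_ARG_COLUMNS]){
  int i, nUsed = 0;
  for(i = 0; i < N_ARG_COLUMNS; i++) aIdx[i] = -1;
  for(i = 0; i < nConstraint; i++){
    int slot = column_to_slot(aiColumn[i]);
    if( slot < 0 ) continue;      /* rowid sentinel or untracked column */
    /* assert() compiles away under NDEBUG, so it documents the invariant
    ** but cannot stand in for the runtime check above. */
    assert( slot < N_ARG_COLUMNS );
    aIdx[slot] = i;
    nUsed++;
  }
  return nUsed;
}
```

With constraints on columns { -1, 1, 3, 0, 7 }, only the entries for columns 1 and 3 are recorded; the rowid sentinel and the untracked columns are skipped instead of being turned into negative or too-large indexes.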

This discussion has been archived. No new comments can be posted.

Nice find, but... (Score:2)

by iAmWaySmarterThanYou ( 10095012 ) writes:

It was looking at recent commits.
Did it find something other current tools would not have found?
- Re: (Score:2)
  
  by UnknowingFool ( 672806 ) writes:
  
  xkcd [xkcd.com] found it years ago.
  - Re: (Score:2)
    
    by iAmWaySmarterThanYou ( 10095012 ) writes:
    
    Oh damn it, I didn't think to go there. Good catch.
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  It just goes to show, even when you use the wrong tool for the job, it sometimes ends up working.
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
  - Re: (Score:2)
    
    by smooth wombat ( 796938 ) writes:
    
    A human probably could have found it if they were looking for it, but how long would it take? If this LLM found it in an hour or two while a human would have taken a week or two, then it's money well spent.
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
- Re: (Score:2)
  
  by arglebargle_xiv ( 2212710 ) writes:
  
  It follows the usual pattern for these tools:
  
  1. Here is a cool clickbaity name.
  2. Here is at least one success case where it found something, possibly something that two beers' worth of analysis would also have turned up.
  3. Here is a recording of crickets that you can play when you ask for a copy of the tool or details of what else it's achieved.

Risky to disclose (Score:5, Informative)

by idontusenumbers ( 1367883 ) writes: on Tuesday November 05, 2024 @11:36AM (#64921371)

This seems risky to disclose considering the nature of sqlite being embedded and how many things that use SQL don't use a shared library or get updated often, if ever.

- Re: (Score:3)
  
  by Mononymous ( 6156676 ) writes:
  
  You're saying the disclosure is the risky thing, and not using the software in that way?
  - Re: (Score:1)
    
    by Anonymous Coward writes:
    
    You're saying the disclosure is the risky thing, and not using the software in that way?
    Fortunately, we found this issue before it appeared in an official release, so SQLite users were not impacted [blogspot.com]
- Re: (Score:3)
  
  by Fly Swatter ( 30498 ) writes:
  
  ... "vulnerability in a development version" makes me think this is not an old bug in existing releases.
  
  My tinfoil hat thinks the bug was committed just to sell the idea of AI 'finding' it.
  - Re: Risky to disclose (Score:1)
    
    by home-electro.com ( 1284676 ) writes:
    
    It most certainly looks like a deliberately introduced vulnerability to test the AI.
- Re:Risky to disclose (Score:5, Informative)
  
  by swillden ( 191260 ) writes: <shawn-ds@willden.org> on Tuesday November 05, 2024 @01:13PM (#64921619) Journal
  
  This seems risky to disclose considering the nature of sqlite being embedded and how many things that use SQL don't use a shared library or get updated often, if ever.
  Embedded sqlite libraries (and lots of other stuff) not being updated often is the problem there, not the disclosure. It's a broad and deep problem, and a serious one, but holding back disclosures is not how you fix or mitigate it. Holding back disclosure just ensures that more devices/systems are vulnerable for more time.
  In cases where it's feasible to notify developers who use a vulnerable library before public disclosure, that's the right way to do it, but for widely-used open source libraries like sqlite, there's no way. Any notification to developers using sqlite is a public notification. The best you can do with sqlite is to let the sqlite team notify all paid support contract holders, and it seems likely that was done since the sqlite team was notified a month ago and the public announcement was last week.
  
  - Re: (Score:2)
    
    by reanjr ( 588767 ) writes:
    
    "All historical vulnerabilities reported against SQLite require at least one of these preconditions:
    The attacker can submit and run arbitrary SQL statements.
    The attacker can submit a maliciously crafted database file to the application that the application will then open and query."
    I can't think of anyone using SQLite in a way that would actually present a risk.
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
      - Re: Risky to disclose (Score:2)
        
        by reanjr ( 588767 ) writes:
        
        Yeah, I've always felt like CVEs were mostly security theater. As a system administrator, I am going to concentrate on layered security and keeping everything patched. There's almost never any reasonable action I can take on a CVE. That shit is upstream. But then the bosses want to take up my time proving why each CVE doesn't apply. And even when it does, they're not willing to let me shut down the server while upstream patches it, so WTF am I gonna do with that? Maybe once in a blue moon there's a config to
        
        - Re: (Score:2)
          
          by account_deleted ( 4530225 ) writes:
          
          Comment removed based on user account deletion
    - Re: (Score:2)
      
      by swillden ( 191260 ) writes:
      
      I can't think of anyone using SQLite in a way that would actually present a risk.
      We can hope.
      That said, I agree that sqlite is an impressively high-quality piece of software, and its APIs don't encourage app developers to write code in ways that enable arbitrary SQL injection, so... maybe.
      I also want to plug paid sqlite support (honestly the main reason I decided to reply). I don't know what it costs, but I've had to use it twice and Dr. Hipp and his team are fantastic. Very responsive and extremely capable. No "Did you try this list of obvious things" after your carefully-written
- Re: (Score:2)
  
  by bill_mcgonigle ( 4333 ) * writes:
  
  "in a development version"
  Caught before release according to TFS.
- Re: (Score:2)
  
  by arglebargle_xiv ( 2212710 ) writes:
  
  It's actually pretty damn difficult, if not impossible, to exploit in most cases, particularly for the very embedded uses you mention, so the risk appears to be pretty minimal.
sentinel (Score:2)

by groobly ( 6155920 ) writes:

Seems that sentinel values are a bad idea in the first place.
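One common alternative to the in-band sentinel criticized above, sketched in C: convert a legacy column number (where -1 means "rowid") once, at the boundary, into a tagged value, so the rowid case becomes its own kind instead of an int that can be used as an array index by accident. The type and function names are hypothetical, and the assumption that -1 is the only negative value a caller will pass is labeled in the code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tagged representation: "rowid" is its own kind, not a
** magic number stored in the same int as a column index. */
typedef enum { COL_ROWID, COL_NAMED } ColumnKind;

typedef struct {
  ColumnKind kind;
  int index;          /* meaningful only when kind == COL_NAMED */
} ColumnRef;

/* Convert a legacy iColumn (where -1 means rowid) once, at the boundary. */
static ColumnRef column_from_legacy(int iColumn){
  ColumnRef ref;
  /* Assumption: -1 is the only negative value callers ever pass. */
  assert( iColumn >= -1 );
  if( iColumn == -1 ){
    ref.kind = COL_ROWID;
    ref.index = 0;      /* unused for rowid; set so it is never garbage */
  }else{
    ref.kind = COL_NAMED;
    ref.index = iColumn;
  }
  return ref;
}

/* Only a named, in-range column yields an array slot. The bool result
** gives the rowid case an explicit "no slot" path instead of an index. */
static bool column_slot(ColumnRef ref, int nSlots, int *pSlot){
  if( ref.kind != COL_NAMED ) return false;
  if( ref.index < 0 || ref.index >= nSlots ) return false;
  *pSlot = ref.index;
  return true;
}
```

The design choice is to pay the cost of one extra field so that every consumer has to look at `kind` before it can get an index, rather than relying on each consumer to remember that -1 is special.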
Interesting possibility (Score:3)

by gillbates ( 106458 ) writes: on Tuesday November 05, 2024 @11:59AM (#64921419) Homepage Journal

Maybe in the near future, software engineers will rely on AI to write the test cases and run the tests... Because, let's face it, how many software engineers like writing test cases?

- Re: (Score:3)
  
  by suutar ( 1860506 ) writes:
  
  I have seen some AI-generated test cases - trivial stuff, like "that method returns a string. Verify that the string is equal to " but (a) good for coverage numbers and (b) good for catching accidental modifications to user-visible text.
  I look forward to when it can do more sophisticated checking of the code flow and build test cases that exercise different paths. I don't expect it to be perfect but it'd be a nice starting point.
  - Re: (Score:3)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
- Re: (Score:2)
  
  by reanjr ( 588767 ) writes:
  
  I find writing tests to be relaxing, as long as the code being tested is functional. Object-oriented software test cases are a fucking nightmare.
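To make the distinction in suutar's comment concrete, here is a hypothetical C example of both kinds of test: one that simply pins a user-visible string, and one that exercises each path through a small function, including its boundaries. Both functions under test (welcome_message, clamp_percent) are made up for illustration and are not taken from any tool's output.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical function under test: returns user-visible text. */
static const char *welcome_message(void){
  return "Welcome back!";
}

/* Hypothetical function with three code paths. */
static int clamp_percent(int value){
  if( value < 0 ) return 0;
  if( value > 100 ) return 100;
  return value;
}

/* The "trivial" style: pin the exact string, so an accidental edit to
** user-visible text makes the test fail. */
static void test_welcome_message(void){
  assert( strcmp(welcome_message(), "Welcome back!") == 0 );
}

/* The more useful style: at least one input per branch, plus boundaries. */
static void test_clamp_percent_paths(void){
  assert( clamp_percent(-5) == 0 );     /* value < 0 branch */
  assert( clamp_percent(0) == 0 );      /* lower boundary, pass-through path */
  assert( clamp_percent(42) == 42 );    /* in-range pass-through path */
  assert( clamp_percent(100) == 100 );  /* upper boundary */
  assert( clamp_percent(250) == 100 );  /* value > 100 branch */
}
```

The first test raises coverage numbers but only guards one literal; the second is the kind that actually walks the function's control flow.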
Big Sleep? (Score:3)

by Tim the Gecko ( 745081 ) writes: on Tuesday November 05, 2024 @12:59PM (#64921591)

I wonder if the LLM is named after the movie [nytimes.com].
The Big Sleep is one of those pictures in which so many cryptic things occur amid so much involved and devious plotting that the mind becomes utterly confused. And, to make it more aggravating, the brilliant detective in the case is continuously making shrewd deductions which he stubbornly keeps to himself. What with two interlocking mysteries and a great many characters involved, the complex of blackmail and murder soon becomes a web of utter bafflement. Unfortunately, the cunning script-writers have done little to clear it at the end.

- Re: (Score:2)
  
  by 93 Escort Wagon ( 326346 ) writes:
  
  That 1946 reviewer doesn't seem to have liked the picture! It's one of my favorites, though.
- Re: (Score:1)
  
  by xgarb ( 660153 ) writes:
  
  https://www.youtube.com/watch?... [youtube.com]
A good use of AI (Score:2)

by Larsrc ( 1285062 ) writes:

An area with lots of mostly-automatable work where the result can be checked by humans and false positives are no big deal. Perfect use case for AI.
Grammar (Score:2)

by jabberw0k ( 62554 ) writes:

text should be "...a widely used software *package*" ... there is no such thing as "a software" just as you do not have "one information." You have a piece of software. Grammar. *sigh*
train AI to discover flaws in source code instead (Score:1)

by cyrilc ( 126593 ) writes:

While it is good news that LLMs are being used to discover potential 0-days, it would be much better if AI could be trained to spot such flaws directly in the code instead of just getting better at running fuzzers against binaries.
Discovering exploits by analyzing the source code would not only be a real breakthrough, but also major progress toward a more secure code base.
Not AI (Score:2)

by ledow ( 319597 ) writes:

"However, the team emphasized that Big Sleep remains “highly experimental” and that they believe a target-specific fuzzer “would be at least as effective” at detecting vulnerabilities as the AI agent in its current state."
It was also only a bug in recently-committed development code, never pushed to release, and there's nothing to say it wouldn't have been caught before then.
Sorry, but this is more AI hyperbole even as the authors literally say "Yeah, you could also find this with a
- Re: (Score:2)
  
  by Currently_Defacating ( 10122078 ) writes:
  
  ^^^this 100%
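For readers unfamiliar with the "target-specific fuzzer" mentioned in the quote in the "Not AI" comment: it is a small harness written around one piece of code, which a fuzzing engine then calls repeatedly with mutated inputs. Below is a minimal libFuzzer-style sketch in C. LLVMFuzzerTestOneInput is libFuzzer's real entry-point name and signature; everything else (slot_for_column, N_SLOTS, the file name in the build comment) is hypothetical and does not come from SQLite or Google's write-up. Because the harness reads raw bytes as 32-bit column numbers, negative values such as -1 get tried without anyone having to think of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define N_SLOTS 3  /* hypothetical size of the array being indexed */

/* Hypothetical function under test: maps a column number to a slot in an
** N_SLOTS-entry array, or returns -1 for columns it must not touch
** (including a -1 "rowid" sentinel). Range-check before subtracting so
** extreme inputs such as INT32_MIN cannot overflow. */
static int slot_for_column(int iColumn){
  if( iColumn < 1 || iColumn > N_SLOTS ) return -1;
  return iColumn - 1;
}

/* libFuzzer entry point; build with e.g.
**   clang -g -fsanitize=fuzzer,address harness.c
** The fuzzer mutates raw bytes; this harness reads them as a sequence of
** 32-bit column numbers and checks the slot invariant for each one. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size){
  int aIdx[N_SLOTS] = { -1, -1, -1 };
  size_t off;
  assert( size == 0 || data != NULL );
  for(off = 0; off + sizeof(int32_t) <= size; off += sizeof(int32_t)){
    int32_t iColumn;
    int slot;
    memcpy(&iColumn, data + off, sizeof iColumn);  /* avoid unaligned loads */
    slot = slot_for_column((int)iColumn);
    if( slot == -1 ) continue;                     /* rejected input */
    if( slot < 0 || slot >= N_SLOTS ) abort();     /* invariant broken: crash */
    /* If the check above were missing, AddressSanitizer would flag a bad
    ** index on this write as a stack buffer overflow or underflow. */
    aIdx[slot] = (int)(off / sizeof(int32_t));
  }
  return 0;
}
```

Under libFuzzer, a crash (the abort() or a sanitizer report) is saved as a reproducing input, which is the sense in which such a harness "finds" a bug like the one in the story.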
