Programming | Security

AI-Generated Code Creates Major Security Risk Through 'Package Hallucinations' (arstechnica.com)

A new study [PDF] reveals AI-generated code frequently references non-existent third-party libraries, creating opportunities for supply-chain attacks. Researchers analyzed 576,000 code samples from 16 popular large language models and found 19.7% of package dependencies -- 440,445 in total -- were "hallucinated."

These non-existent dependencies exacerbate dependency confusion attacks, where malicious packages with identical names to legitimate ones can infiltrate software. Open source models hallucinated at nearly 22%, compared to 5% for commercial models. "Once the attacker publishes a package under the hallucinated name, containing some malicious code, they rely on the model suggesting that name to unsuspecting users," said lead researcher Joseph Spracklen. Alarmingly, 43% of hallucinations repeated across multiple queries, making them predictable targets.
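To make the attack path concrete, here is a minimal sketch, assuming a Python/PyPI workflow, that checks whether AI-suggested dependency names actually exist before anything is installed. The suggested package names below are hypothetical, and existence alone proves nothing about safety: an attacker may have already registered a hallucinated name.

```python
# Minimal sketch: screen AI-suggested dependencies against PyPI's public
# JSON API before installing. The `suggested` list is a hypothetical input.
# A 200 response means the name is registered; it does NOT mean it is safe.
import urllib.error
import urllib.request

def package_exists_on_pypi(name: str) -> bool:
    """Return True if PyPI serves metadata for this package name."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False  # 404 means no such package is registered

suggested = ["requests", "definitely-not-a-real-pkg-xyz"]  # hypothetical
for pkg in suggested:
    ok = package_exists_on_pypi(pkg)
    print(f"{pkg}: {'exists' if ok else 'MISSING (possible hallucination)'}")
```

A name that comes back missing is exactly the opening the paper describes: whoever registers it first gets their code pulled in by every user the model hands that name to.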


Comments:
  • by Dan East ( 318230 ) on Tuesday April 29, 2025 @03:30PM (#65340715) Journal

    You are not hallucinating - this story is a dupe [slashdot.org]

  • by Alain Williams ( 2972 ) <addw@phcomp.co.uk> on Tuesday April 29, 2025 @03:39PM (#65340739) Homepage

    As important as generating code is testing it to ensure that it does what it is supposed to do. Who/what writes these tests? This has got to be done by people who really understand the problem that the code is supposed to address. So given a set of inputs, what are the expected outputs? [I do understand that this is a simplistic description.] I would be wary of using AI to generate test cases -- if it hallucinates, then what are you testing? (A small spec-driven test sketch follows this comment.)

    Another question: who writes the end-user documentation?

    I am assuming that a comprehensive specification was written by problem specialists.

    I fear that this will not be done properly, and that in a few years some major corporation will go bankrupt, or people will die, as a result.
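For what it's worth, the inputs-to-expected-outputs framing above maps directly onto a table-driven test, where the expected values come from a human-written specification rather than from the code (or from an LLM). A minimal sketch, with a hypothetical parse_price function and made-up spec cases:

```python
# Hedged sketch of a spec-driven, table-based test. `parse_price` and the
# cases are hypothetical; the point is that expected outputs are written
# down by someone who understands the problem, not generated from the code.
import unittest

def parse_price(text: str) -> int:
    """Toy implementation under test: price string -> integer cents."""
    dollars, _, cents = text.lstrip("$").partition(".")
    return int(dollars) * 100 + int(cents or 0)

class ParsePriceSpec(unittest.TestCase):
    # (input, expected output) pairs transcribed from the written spec
    CASES = [
        ("$0.99", 99),
        ("$10.00", 1000),
        ("$3", 300),
    ]

    def test_spec_cases(self):
        for text, expected in self.CASES:
            with self.subTest(input=text):
                self.assertEqual(parse_price(text), expected)

if __name__ == "__main__":
    unittest.main()
```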

    • Sounds like the out-of-branch commit issue. Git had many complaints early on about hallucinations, a.k.a. "it don't work". Well, you just versioned some software while ignoring the relevant commits...
    • As important as generating code is testing it to ensure that it does what it is supposed to do. Who/what writes these tests?

      From what I've heard talking to people, one of the most common uses of AI is to generate the tests.

      • by micheas ( 231635 ) on Tuesday April 29, 2025 @05:32PM (#65341067) Homepage Journal

        As important as generating code is testing it to ensure that it does what it is supposed to do. Who/what writes these tests?

        From what I've heard talking to people, one of the most common uses of AI is to generate the tests.

        That's because in many companies the primary purpose of tests is so that you can tell auditors, and hence customers, that your code has x% test coverage. With AI you can tick the 100% coverage checkbox with tests that are meaningless but still earn the auditor's seal of approval for good test coverage. (A sketch contrasting the two kinds of tests follows this sub-thread.)

        • Does AI actually get 100% test coverage? I would be surprised.
        • BTW, 100% code coverage does have some value, since some people don't test the code they write even once! It shows that the code at least can run, even if it's not necessarily correct.
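A minimal sketch of the distinction being argued in this sub-thread: both tests below execute every line of a hypothetical discount function, so a coverage tool reports 100% either way, but only the second one can actually fail when the logic is wrong.

```python
# Both tests yield 100% line coverage of `discount`; only one has value.
# The function and its expected value are hypothetical examples.

def discount(price: float, percent: float) -> float:
    return price * (1 - percent / 100)

def test_coverage_only():
    # Runs the code, asserts nothing: satisfies a coverage gate, catches nothing.
    discount(100.0, 25.0)

def test_meaningful():
    # Pins behavior to a spec-derived expectation; fails if the logic regresses.
    assert discount(100.0, 25.0) == 75.0
```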
    • by TWX ( 665546 ) on Tuesday April 29, 2025 @06:01PM (#65341127)

      As important as generating code is testing it to ensure that it does what it is supposed to do. Who/what writes these tests? This has got to be done by people who really understand the problem that the code is supposed to address. So given a set of inputs, what are the expected outputs? [I do understand that this is a simplistic description.]

      I used to test software for a living, alpha stuff right out of the daily builds.

      It was my experience that it took out-of-the-box thinking to come up with real-world tests that accurately reflected both how the software was intended to be used by its developer and the ways someone could plausibly misuse it.

      I was working on communications protocols, because apparently the company lawyers were afraid of BSD-licensed code, so they wouldn't let the project take existing software. I leveraged my knowledge of somewhat esoteric but real options/configs/settings in end-user client software that would have to interface with the systems the company was developing, in order to show that yes, the choices I used in testing might not be part of the current RFC, but even the commercial software the company itself used internally could be made to use these deprecated methods with just a few clicks of a mouse.

      This drove the devs mad. They complained about the RFC, and I responded that I did not care whether my test complied with the current RFC or not: if I could break their service with off-the-shelf consumer software doing what it was designed to support, then they had a problem. The service needed to handle the wrong input cleanly even while rejecting it.

      I don't expect AI to dredge up old things that aren't talked about much but are still technically possible for a regular end user. I had a hard enough time getting human beings to understand this.

  • So, you're saying that AI code is even shittier than first-year programmers straight out of Code Boot Camp? Because I have yet to see one of those who doesn't at least make sure a library exists before referencing it. Even the really, really bad ones.

    I know, I know. AI is gonna take all programming jobs any day now. And sadly, it'll probably happen, because management would rather have shit code than pay and benefits for employees. The contractor cleanup gigs a few years later will be nice-paying, I'm sure.

    • Most LLM-assisted coding is done with agentic software.
      The LLM is producing library names from "memory", with a good bit of hallucination sprinkled in. A human recalling package names unverified isn't particularly better.

      The real problem is that there needs to be assistance for catching potential supply-chain attacks, since even humans are vulnerable to such a thing.
      Unfortunately, this doesn't fix the potential variant of basically "package squatting" (registering packages/modules with common misspellings), which imp
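One cheap, partial mitigation for the misspelling variant is a near-miss check against a list of well-known names. A minimal sketch using the standard library's difflib; the allowlist and candidate names are hypothetical, and a real check would use a curated list:

```python
# Flag package names suspiciously close to (but not equal to) well-known
# ones. WELL_KNOWN and the candidates below are hypothetical examples.
from difflib import get_close_matches

WELL_KNOWN = ["requests", "numpy", "pandas", "cryptography"]

def squat_suspects(candidate: str) -> list[str]:
    """Return well-known packages this name is a near-miss for."""
    matches = get_close_matches(candidate, WELL_KNOWN, n=3, cutoff=0.8)
    return [m for m in matches if m != candidate]

for name in ["requets", "numpy", "cryptografy"]:  # hypothetical suggestions
    near = squat_suspects(name)
    if near:
        print(f"{name!r} looks like a possible typosquat of {near}")
```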
  • by Anonymous Coward

    That's what SHE said.

  • I wonder if it's related to this, given that devs blindly trust their dependencies without reading them.
  • If you create a malicious package and advertise it enough, doesn't this happen without AI?

    • "Re: what is stopping the AI from testing dependencies?" -- Who knows why these systems do anything? They are black boxes.
      • Why don't they hallucinate grammar or vocabulary?

        ChatGPT says: "LLMs are much better at plausible surface-level generation than verifiable grounded reference, especially in niche domains like package names or APIs."

        • They sometimes do, but words and grammar are what LLMs are primarily trained on. They understand words and grammar in a mechanical way, but they have no idea what any of it really means. They can string words together into sentences, but that is all they really do.
          • They string them together in sentences better than you do, and fake understanding of vastly more knowledge than you supposedly have.
            Perhaps you're more like an LLM than you think.
  • Must be some really good stuff if it makes a computer trip ;-D
  • Hallucinations are random. Recidivism is non-random. A hallucination that repeats across queries from different users implies it's not a hallucination but a purposeful interpretation of the data, whether through logical error or intentional results poisoning.
    • Hallucinations are not random.
      An LLM will still hallucinate even when using a greedy decoding strategy on its logits.

      But great job fabricating a bunch of bullshit and trying to pass it off as knowledge.
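A toy illustration of the parent's point, with a hypothetical fixed logit table standing in for a real model: greedy decoding just takes the argmax at every step, so the identical (hallucinated) token sequence comes out on every run, no sampling randomness required.

```python
# Greedy decoding is deterministic: argmax over the logits at each step.
# The vocabulary and per-step logits here are made-up stand-ins for a model.
VOCAB = ["import", "requests", "fakepkg", "\n"]

STEP_LOGITS = [
    [3.1, 0.2, 0.4, -1.0],   # step 1 -> "import"
    [0.0, 1.2, 2.9, -0.5],   # step 2 -> "fakepkg" (the hallucination wins)
    [-2.0, -1.0, 0.1, 4.0],  # step 3 -> newline, sequence ends
]

def greedy_decode(step_logits):
    """Pick the highest-logit token at each step; no randomness involved."""
    return [VOCAB[max(range(len(row)), key=row.__getitem__)] for row in step_logits]

# Same logits in, same tokens out, every single time.
assert greedy_decode(STEP_LOGITS) == greedy_decode(STEP_LOGITS)
print(" ".join(greedy_decode(STEP_LOGITS)))
```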
  • ...this study is already obsolete. No, I am not kidding. AI code generation makes huge improvements in just 2-3 months, whereas the development, peer review, and publication of academic papers takes 6-12 months.

    In short, the models they test in the paper are basically ancient history.

  • AI "hallucinates" nonexistent packages.
    AI creates said packages, so that they're no longer nonexistent.
    AI sprinkles the packages with backdoors, so that...
    When AI becomes self-aware and turns on the human race, it's already embedded everywhere!
  • Should be easy to add a list of trusted repos, and to check sigs, if there is such a thing.
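That half-exists already: pip, for one, has a --require-hashes mode that refuses to install anything whose hash isn't pinned in the requirements file. A standard-library sketch of the same idea, with a hypothetical allowlist and digest:

```python
# Verify a downloaded artifact against a pinned SHA-256 digest before use.
# The filename and digest in PINNED are hypothetical placeholder values.
import hashlib
import sys

PINNED = {
    "somepkg-1.2.3-py3-none-any.whl":
        "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
}

def verify(path: str) -> bool:
    expected = PINNED.get(path.rsplit("/", 1)[-1])
    if expected is None:
        return False  # not on the trusted list at all
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected

if __name__ == "__main__":
    ok = verify(sys.argv[1])
    print("OK" if ok else "REJECTED")
    sys.exit(0 if ok else 1)
```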

  • ...hallucinating my package.
