Programming | Security

AI-Generated Code Creates Major Security Risk Through 'Package Hallucinations' (arstechnica.com)

A new study [PDF] reveals AI-generated code frequently references non-existent third-party libraries, creating opportunities for supply-chain attacks. Researchers analyzed 576,000 code samples from 16 popular large language models and found 19.7% of package dependencies -- 440,445 in total -- were "hallucinated."

These non-existent dependencies exacerbate dependency confusion attacks, where malicious packages with identical names to legitimate ones can infiltrate software. Open source models hallucinated at nearly 22%, compared to 5% for commercial models. "Once the attacker publishes a package under the hallucinated name, containing some malicious code, they rely on the model suggesting that name to unsuspecting users," said lead researcher Joseph Spracklen. Alarmingly, 43% of hallucinations repeated across multiple queries, making them predictable targets.
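To make the attack path concrete, here is a minimal sketch, assuming a Python/PyPI workflow, that checks whether AI-suggested dependency names actually exist before anything is installed. The suggested package names below are hypothetical, and existence alone proves nothing about safety: an attacker may have already registered a hallucinated name.

```python
# Minimal sketch: screen AI-suggested dependencies against PyPI's public
# JSON API before installing. The `suggested` list is a hypothetical input.
# A 200 response means the name is registered; it does NOT mean it is safe.
import urllib.error
import urllib.request

def package_exists_on_pypi(name: str) -> bool:
    """Return True if PyPI serves metadata for this package name."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False  # 404 means no such package is registered

suggested = ["requests", "definitely-not-a-real-pkg-xyz"]  # hypothetical
for pkg in suggested:
    ok = package_exists_on_pypi(pkg)
    print(f"{pkg}: {'exists' if ok else 'MISSING (possible hallucination)'}")
```

A name that comes back missing is exactly the opening the paper describes: whoever registers it first gets their code pulled in by every user the model hands that name to.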


Comments:
  • by Dan East ( 318230 ) on Tuesday April 29, 2025 @03:30PM (#65340715) Journal

    You are not hallucinating - this story is a dupe [slashdot.org]

  • by Alain Williams ( 2972 ) <addw@phcomp.co.uk> on Tuesday April 29, 2025 @03:39PM (#65340739) Homepage

    As important as generating code is testing it to ensure that it does what it is supposed to do. Who/what writes these tests? This has got to be done by people who really understand the problem that the code is supposed to address. So given a set of inputs, what are the expected outputs? [I do understand that this is a simplistic description.] I would be wary of using AI to generate test cases -- if it hallucinates, then what are you testing? (A small spec-driven test sketch follows this comment.)

    Another question: who writes the end-user documentation?

    I am assuming that a comprehensive specification was written by problem specialists.

    I fear that this will not be done properly, and that in a few years some major corporation will go bankrupt, or people will die, as a result.
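For what it's worth, the inputs-to-expected-outputs framing above maps directly onto a table-driven test, where the expected values come from a human-written specification rather than from the code (or from an LLM). A minimal sketch, with a hypothetical parse_price function and made-up spec cases:

```python
# Hedged sketch of a spec-driven, table-based test. `parse_price` and the
# cases are hypothetical; the point is that expected outputs are written
# down by someone who understands the problem, not generated from the code.
import unittest

def parse_price(text: str) -> int:
    """Toy implementation under test: price string -> integer cents."""
    dollars, _, cents = text.lstrip("$").partition(".")
    return int(dollars) * 100 + int(cents or 0)

class ParsePriceSpec(unittest.TestCase):
    # (input, expected output) pairs transcribed from the written spec
    CASES = [
        ("$0.99", 99),
        ("$10.00", 1000),
        ("$3", 300),
    ]

    def test_spec_cases(self):
        for text, expected in self.CASES:
            with self.subTest(input=text):
                self.assertEqual(parse_price(text), expected)

if __name__ == "__main__":
    unittest.main()
```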

    • Sounds like the out-of-branch commit issue. Git had many complaints early on about hallucinations, a.k.a. "it don't work". Well, you just versioned some software while ignoring the relevant commits...
    • As important as generating code is testing it to ensure that it does what it is supposed to do. Who/what writes these tests?

      From what I've heard talking to people, one of the most common uses of AI is to generate the tests.

      • by micheas ( 231635 ) on Tuesday April 29, 2025 @05:32PM (#65341067) Homepage Journal

        As important as generating code is testing it to ensure that it does what it is supposed to do. Who/what writes these tests?

        From what I've heard talking to people, one of the most common uses of AI is to generate the tests.

        That's because in many companies the primary purpose of tests is so that you can tell auditors, and hence customers, that your code has x% test coverage. With AI you can tick the 100% coverage checkbox with tests that are meaningless but still earn the auditor's seal of approval for good test coverage. (A sketch contrasting the two kinds of tests follows this sub-thread.)

        • Does AI actually get 100% test coverage? I would be surprised.
        • BTW, 100% code coverage does have some value, since some people don't test the code they write even once! It shows that the code at least can run, even if it's not necessarily correct.
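A minimal sketch of the distinction being argued in this sub-thread: both tests below execute every line of a hypothetical discount function, so a coverage tool reports 100% either way, but only the second one can actually fail when the logic is wrong.

```python
# Both tests yield 100% line coverage of `discount`; only one has value.
# The function and its expected value are hypothetical examples.

def discount(price: float, percent: float) -> float:
    return price * (1 - percent / 100)

def test_coverage_only():
    # Runs the code, asserts nothing: satisfies a coverage gate, catches nothing.
    discount(100.0, 25.0)

def test_meaningful():
    # Pins behavior to a spec-derived expectation; fails if the logic regresses.
    assert discount(100.0, 25.0) == 75.0
```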
    • by TWX ( 665546 ) on Tuesday April 29, 2025 @06:01PM (#65341127)

      As important as generating code is testing it to ensure that it does what it is supposed to do. Who/what writes these tests? This has got to be done by people who really understand the problem that the code is supposed to address. So given a set of inputs, what are the expected outputs? [I do understand that this is a simplistic description.]

      I used to test software for a living, alpha stuff right out of the daily builds.

      It was my experience that it took out-of-the-box thinking to come up with real-world tests that accurately reflected both how the software was intended to be used by its developer and the ways someone could plausibly misuse it.

      I was working on communications protocols, because apparently the company lawyers were afraid of BSD-licensed code, so they wouldn't let the project take existing software. I leveraged my knowledge of somewhat esoteric but real options/configs/settings in end-user client software that would have to interface with the systems the company was developing, in order to show that yes, the choices I used in testing might not be part of the current RFC, but even the commercial software the company itself used internally could be made to use these deprecated methods with just a few clicks of a mouse.

      This drove the devs mad. They complained about the RFC, and I responded that I did not care whether my test complied with the current RFC or not: if I could break their service with off-the-shelf consumer software doing what it was designed to support, then they had a problem. The service needed to handle the wrong input cleanly even while rejecting it.

      I don't expect AI to dredge up old things that aren't talked about much but are still technically possible for a regular end user. I had a hard enough time getting human beings to understand this.

  • So, you're saying that AI code is even shittier than first-year programmers straight out of Code Boot Camp? Because I have yet to see one of those who doesn't at least make sure a library exists before referencing it. Even the really, really bad ones.

    I know, I know. AI is gonna take all programming jobs any day now. And sadly, it'll probably happen, because management would rather have shit code than pay and benefits for employees. The contractor cleanup gigs a few years later will be nice-paying, I'm sure.

    • Most LLM-assisted coding is done with agentic software.
      The LLM is producing library names from "memory", with a good bit of hallucination sprinkled in. A human recalling package names unverified isn't particularly better.

      The real problem is that there needs to be assistance for catching potential supply-chain attacks, since even humans are vulnerable to such a thing.
      Unfortunately, this doesn't fix the potential variant of basically "package squatting" (registering packages/modules with common misspellings), which imp
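One cheap, partial mitigation for the misspelling variant is a near-miss check against a list of well-known names. A minimal sketch using the standard library's difflib; the allowlist and candidate names are hypothetical, and a real check would use a curated list:

```python
# Flag package names suspiciously close to (but not equal to) well-known
# ones. WELL_KNOWN and the candidates below are hypothetical examples.
from difflib import get_close_matches

WELL_KNOWN = ["requests", "numpy", "pandas", "cryptography"]

def squat_suspects(candidate: str) -> list[str]:
    """Return well-known packages this name is a near-miss for."""
    matches = get_close_matches(candidate, WELL_KNOWN, n=3, cutoff=0.8)
    return [m for m in matches if m != candidate]

for name in ["requets", "numpy", "cryptografy"]:  # hypothetical suggestions
    near = squat_suspects(name)
    if near:
        print(f"{name!r} looks like a possible typosquat of {near}")
```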
  • by Anonymous Coward

    That's what SHE said.

  • I wonder if it's related to this, given that devs blindly trust their dependencies without reading them.
  • If you create a malicious package and advertise it enough, doesn't this happen without AI?

    • "Re: what is stopping the AI from testing dependencies?" -- Who knows why these systems do anything? They are black boxes.
      • Why don't they hallucinate grammar or vocabulary?

        ChatGPT says: "LLMs are much better at plausible surface-level generation than verifiable grounded reference, especially in niche domains like package names or APIs."

        • They sometimes do, but words and grammar are what LLMs are primarily trained on. They understand words and grammar in a mechanical way, but they have no idea what any of it really means. They can string words together into sentences, but that is all they really do.
          • They string them together in sentences better than you do, and fake understanding of vastly more knowledge than you supposedly have.
            Perhaps you're more like an LLM than you think.
  • Must be some really good stuff if it makes a computer trip ;-D
  • Hallucinations are random. Recidivism is non-random. A hallucination that repeats across queries from different users implies it's not a hallucination but a purposeful interpretation of the data, whether through logical error or intentional results poisoning.
    • Hallucinations are not random.
      An LLM will still hallucinate even when using a greedy decoding strategy on its logits.

      But great job fabricating a bunch of bullshit and trying to pass it off as knowledge.
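A toy illustration of the parent's point, with a hypothetical fixed logit table standing in for a real model: greedy decoding just takes the argmax at every step, so the identical (hallucinated) token sequence comes out on every run, no sampling randomness required.

```python
# Greedy decoding is deterministic: argmax over the logits at each step.
# The vocabulary and per-step logits here are made-up stand-ins for a model.
VOCAB = ["import", "requests", "fakepkg", "\n"]

STEP_LOGITS = [
    [3.1, 0.2, 0.4, -1.0],   # step 1 -> "import"
    [0.0, 1.2, 2.9, -0.5],   # step 2 -> "fakepkg" (the hallucination wins)
    [-2.0, -1.0, 0.1, 4.0],  # step 3 -> newline, sequence ends
]

def greedy_decode(step_logits):
    """Pick the highest-logit token at each step; no randomness involved."""
    return [VOCAB[max(range(len(row)), key=row.__getitem__)] for row in step_logits]

# Same logits in, same tokens out, every single time.
assert greedy_decode(STEP_LOGITS) == greedy_decode(STEP_LOGITS)
print(" ".join(greedy_decode(STEP_LOGITS)))
```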
  • ...this study is already obsolete. No, I am not kidding. AI code generation makes huge improvements in just 2-3 months, whereas the development, peer review, and publication of academic papers takes 6-12 months.

    In short, the models they test in the paper are basically ancient history.

  • AI "hallucinates" nonexistent packages.
    AI creates said packages, so that they're no longer nonexistent.
    AI sprinkles the packages with backdoors, so that...
    When AI becomes self-aware and turns on the human race, it's already embedded everywhere!
  • Should be easy to add a list of trusted repos, and to check sigs, if there is such a thing.
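That half-exists already: pip, for one, has a --require-hashes mode that refuses to install anything whose hash isn't pinned in the requirements file. A standard-library sketch of the same idea, with a hypothetical allowlist and digest:

```python
# Verify a downloaded artifact against a pinned SHA-256 digest before use.
# The filename and digest in PINNED are hypothetical placeholder values.
import hashlib
import sys

PINNED = {
    "somepkg-1.2.3-py3-none-any.whl":
        "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
}

def verify(path: str) -> bool:
    expected = PINNED.get(path.rsplit("/", 1)[-1])
    if expected is None:
        return False  # not on the trusted list at all
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected

if __name__ == "__main__":
    ok = verify(sys.argv[1])
    print("OK" if ok else "REJECTED")
    sys.exit(0 if ok else 1)
```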

  • ...hallucinating my package.
