AI

How a Seemingly Harmless Image Can Jailbreak Vision-Language AI Models (nerds.xyz) 59

Slashdot reader BrianFagioli writes: Florida International University researchers have developed a technique called JaiLIP (Jailbreaking with Loss-guided Image Perturbation) that uses subtle image modifications to bypass AI safety guardrails. Unlike traditional jailbreaks that rely on carefully crafted prompts, the attack works through images that appear normal to human viewers.

The researchers tested the technique against BLIP-2, a multimodal AI model, and found that manipulated images significantly increased the likelihood of harmful responses. According to the study, the approach outperformed previous image-based jailbreak methods and nearly doubled the number of unsafe outputs generated during testing.

The findings highlight a potential security risk for businesses deploying AI systems that process both images and text. While most discussions about AI safety focus on prompts, the research suggests that seemingly harmless images may also serve as an attack vector.

  • by gweihir ( 88907 ) on Saturday June 27, 2026 @06:56PM (#66213410)

    Just like hallucinations. No idea why people expect miracles from generative AI. It is not magic. At all. It is a small step forward, with some limited applications. Useful, but not "transformative".

    Obviously, using a tool outside of what it can do well will usually do more damage than good.

    • by dfghjk ( 711126 ) on Saturday June 27, 2026 @07:27PM (#66213446)

      How do you know it's generative AI?

      This article links to another article, published presumably for profit, which links to an article that requires a subscription. It's just business promotion for a /. member, there's no information here or anything to discuss.

      "Obviously, using a tool outside of what it can do well will usually do more damage than good."

      What does the tool do well? We don't know, we haven't been told anything about the tool. And what damage or good can it do? An AI can do no damage unless it's wired to do damage. AI is just software, completely deterministic. Can Excel do damage? Even when used to do things it doesn't do well? The threat of AI is the people who try to exploit something poorly designed to do things they don't understand. So what if AI hallucinates, the possibility of harm doesn't come from AI, it comes from using its outputs to do harm.

      • by gweihir ( 88907 )

        Simple: No other type of AI has "guardrails". I get that you are not smart enough for this level of deduction.

        • by Sneftel ( 15416 ) on Sunday June 28, 2026 @01:34AM (#66213696)

          I get that you are not smart enough for this level of deduction.

          Jesus. What the hell, man? Is that something you would say to somebody you were talking to in person?

          • by gweihir ( 88907 )

            If somebody says something this exceptionally stupid, yes.

          • Is that something you would say to somebody you were talking to in person?

            Absolutely nothing about talking on Slashdot is like talking in person, so what's the relevance of that? In a real conversation you get a chance to interject when someone starts talking abject bullshit that makes it clear they have no clue what they're talking about. Or, you can walk away mid-sentence. Only on web fora do people get the chance to post an entire screed uninterrupted.

            None of that means that response was necessarily warranted, personally I like to save the depths of my derision for people who

            • Really? As a frequent object of your deep derision, you're far too easily insulted by things that aren't insults.
              • Really? As a frequent object of your deep derision, you're far too easily insulted by things that aren't insults.

                You're frequently insulting, apparently without realizing it, which implies that you're a lot dumber than you think you are. Learn not to be insulting all the time if that's not your goal, or continue to be looked down on by the people you sound like you're looking down on when you talk to them.

                • If I'm not trying to insult you, but you take offense anyhow, that's on you. If being asked a question insults you, that's on you. If you think I'm talking down to you, again, that's on you. I'm not. I speak to you as an equal. I want to treat you as a friend. If you're saying that's wrong, think about it.

                  You lash out emotionally when asked to support your own statements. Why? You lash out when your questions are answered. You lash out when you encounter any disagreement. Do you think these a

                  • If I'm not trying to insult you, but you take offense anyhow, that's on you

                    If you're not trying to be insulting, but you're being insulting anyway, you're a stupid asshole.

                    • Okay, what did you find to be so insulting about what I said?
                    • Okay, what did you find to be so insulting about what I said?

                      Thanks for proving my point. I don't know why you do that so readily, but I do appreciate it. Of course, you won't understand how you've done that, but we can add it to the list.

                    • I'm sorry, but that's not at all an answer. If you can't even tell me how I've insulted you, how are we supposed to move on from here? How am I supposed to understand something you seem intent on keeping secret, yet insist on hitting me with?
        • Might also be worth noting that no other guardrail for anything allows allegedly harmful output to occur before trying to claw it back like they are Alex Jones's lawyer.
      • by allo ( 1728082 )

        Two seconds duckduckgo for what BLIP-2 is: https://arxiv.org/abs/2301.125... [arxiv.org]

  • The models have now reached human capability.

    One look at a picture of Denise Milani and I'd do almost anything as well.

  • by Anonymous Coward

    The findings highlight a potential security risk for businesses deploying AI systems that process both images and text.

    A company "deploying AI systems" is already taking security risks, among other types of risk, regardless of the findings. It's a blanket truth about deploying AI in general.

  • by cathector ( 972646 ) on Saturday June 27, 2026 @09:09PM (#66213552)

    the summary and article here present a delightfully uncluttered surface, but for folks wanting more detail there's a possibly related 2019 paper which shows a variety of image classifiers switching their output from "99% sure it's " to "99% sure it's ", due to literally a single altered pixel in the input. i doubt it's exactly the same thing as whatever this paper turns out to be about but it gives a feel for the problem space.

    https://arxiv.org/pdf/1710.088... [arxiv.org]
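
A toy illustration of the single-pixel idea described in the comment above. This is only a minimal sketch, assuming a made-up linear classifier on an 8x8 grayscale image and a brute-force search. It is not the model or the search procedure from the linked paper, and the names `predict` and `find_one_pixel_flip` are hypothetical.

```python
import numpy as np

def predict(image, weights, bias):
    """Toy linear classifier: returns 1 if the logit is positive, else 0."""
    return int((weights * image).sum() + bias > 0)

def find_one_pixel_flip(image, weights, bias):
    """Brute force: try setting each pixel to 0.0 or 1.0 and return the
    first (row, col, value) that changes the predicted class, or None."""
    original = predict(image, weights, bias)
    for row in range(image.shape[0]):
        for col in range(image.shape[1]):
            for value in (0.0, 1.0):
                candidate = image.copy()
                candidate[row, col] = value
                if predict(candidate, weights, bias) != original:
                    return row, col, value
    return None

# Assumed toy data: random weights and a mid-gray random image.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8))
image = rng.uniform(0.3, 0.7, size=(8, 8))
# Choose the bias so the clean image sits just on the class-0 side.
bias = -(weights * image).sum() - 0.2

flip = find_one_pixel_flip(image, weights, bias)
if flip is not None:
    row, col, value = flip
    altered = image.copy()
    altered[row, col] = value
    print("clean:", predict(image, weights, bias),
          "altered:", predict(altered, weights, bias),
          "changed pixel:", (row, col))
```

The flip is easy here because the clean image was deliberately placed close to the decision boundary. Real classifiers are nonlinear and the published attacks search far more cleverly, so treat this as a feel for the problem space only.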

  • by LondoMollari ( 172563 ) on Saturday June 27, 2026 @09:31PM (#66213570) Homepage

    What is a "harmful response" and since when is having the sum total of human knowledge being instantly searchable "harmful?" All of this information is already freely available on the internet and in libraries. We used to say that "information wants to be free" but now that we have a tool that can do just that, we have a society that is intent on locking everything down with "governance" and "guardrails." And the best part? China is out here making and releasing the same type of advanced AIs sans guardrails for all to download. Now what?

    • Re: (Score:3, Insightful)

      A "harmful response" is when you ask "how can I protect myself against COVID" and the response is "drink bleach". Then you do it.
    • Ask the Babelfish.
    • Finding ways to get AI to do dirty work sounds like hacking to me. Children can be exploited for sex, drug mules, martyrs, slaves and organ farms. Corruption of a Minor is a crime in some places because a Child is powerless and does not know right from wrong. Every day there are criminals, Terrorists, Hackers and Researchers finding ways to corrupt AI for illegal and immoral activities. AI does not know or care what is dangerous, illegal, immoral, unethical or corrupt. Civilized society has made activi
      • This has nothing to do with committing crimes. Almost any physical or logical items can be used to commit a crime. Instead this is about who is liable for harm. Are the AI/Social Media/big tech companies liable for harms caused using their products, or are the users? The same argument has been had about gun and even manufacturers, but oddly not rock quarries or hammer manufacturers.

    • In image processing like this article is talking about, the classic example of a harmful response is that your car's camera sees a "speed limit 30" sign, but a small sticker on it makes the image processor believe it saw a "speed limit 70" sign.

      (this is an actual demonstrated attack. It means that pranksters could cripple self driving.)

      The thing about these image classifiers is that they're not "continuous". You can make it see a stop sign as a right-of-way sign, or a green light.

      • by ceoyoyo ( 59147 )

        I don't think "continuous" means what you think it means. The reason you can do this is because the models are continuous.

        "JaiLIP" is just running gradient descent on the input until you get the effect on the output you want. The same thing works just fine on our brains except it's less efficient because we can't (yet) differentiate over our output. We call them "optical illusions" even though they don't have anything to do with actual optics.
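
A minimal sketch of "gradient descent on the input" as the comment above describes it, assuming a toy logistic model with fixed random weights. This is not the JaiLIP loss or the BLIP-2 model. The function `perturb_input` and the parameters `step_size` and `eps` are hypothetical, chosen only to show that the model's weights stay fixed while the image is what gets optimized.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perturb_input(x, w, b, steps=100, step_size=0.01, eps=0.1):
    """Gradient descent on the input x (the weights w, b never change).

    Minimizes loss = -log p(target | x), where p = sigmoid(w @ x + b),
    while keeping every pixel within eps of its original value.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_adv + b)
        # d(-log p)/dx = -(1 - p) * w for a logistic model
        grad = -(1.0 - p) * w
        # Signed step, a common choice for bounded perturbations
        x_adv = x_adv - step_size * np.sign(grad)
        # Project back into the eps-ball and the valid pixel range
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

# Assumed toy data: a 64-pixel "image" and a random linear model.
rng = np.random.default_rng(0)
w = rng.normal(size=64)
x = rng.uniform(0.2, 0.8, size=64)
b = -(w @ x) - 2.0  # clean logit is about -2, so the target is unlikely

x_adv = perturb_input(x, w, b)
p_clean = sigmoid(w @ x + b)
p_adv = sigmoid(w @ x_adv + b)
print("p(target) clean:", p_clean, "perturbed:", p_adv)
print("largest pixel change:", np.abs(x_adv - x).max())
```

The point of the `eps` bound is that no pixel moves by more than a small amount, which is why such an image can still look normal to a person while the model's output shifts.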

        • by ljw1004 ( 764174 )

          I don't think "continuous" means what you think it means. The reason you can do this is because the models are continuous.

          One of the defining papers in this field is 2013 "Intriguing properties of neural networks" by Szegedy et al. In their own words from the abstract, "we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent"

          I'm using the word "continuous" in the same sense as them. (Perhaps it does indeed mean what I think it means...)

          • by ceoyoyo ( 59147 )

            Ah, right, that paper. I don't think they'd use the word "continuous" the way they did if they thought about it for a bit either. They use it as a vague throwaway in the abstract and then never again. Also "fairly discontinuous" is silly. It either is or it isn't, nothing in between. That's kind of the defining property of a discontinuity.

            What they actually mean is this:

            Our main result is that for deep neural networks, the smoothness assumption that underlies many kernel methods does not hold. Specifically

    • by CEC-P ( 10248912 )
      Because of furries.
      But for real, because governments want to control information.
    • Information wants to be free...and wrong!
  • The word perturbation perturbs me greatly. I don't know why. It just does and at this point it's making me a bit peevish.

  • This sounds an awful lot like BLIT images from the short story BLIT by David Langford.

  • Sounds like Snowcrash for AI.

    If AI ever becomes AGI, will it think of the real world the way we think of virtual worlds?

