How a Seemingly Harmless Image Can Jailbreak Vision-Language AI Models (nerds.xyz) 59

Posted by EditorDavid on Saturday June 27, 2026 @06:52PM from the double-visions dept.

Slashdot reader BrianFagioli writes: Florida International University researchers have developed a technique called JaiLIP (Jailbreaking with Loss-guided Image Perturbation) that uses subtle image modifications to bypass AI safety guardrails. Unlike traditional jailbreaks that rely on carefully crafted prompts, the attack works through images that appear normal to human viewers.

The researchers tested the technique against BLIP-2, a multimodal AI model, and found that manipulated images significantly increased the likelihood of harmful responses. According to the study, the approach outperformed previous image-based jailbreak methods and nearly doubled the number of unsafe outputs generated during testing.

The findings highlight a potential security risk for businesses deploying AI systems that process both images and text. While most discussions about AI safety focus on prompts, the research suggests that seemingly harmless images may also serve as an attack vector.

Jailbreaking will never get fixed (Score:4, Insightful)

by gweihir ( 88907 ) writes: on Saturday June 27, 2026 @06:56PM (#66213410)

Just like hallucinations. No idea why people expect miracles from generative AI. It is not magic. At all. It is a small step forward, with some limited applications. Useful, but not "transformative".
Obviously, using a tool outside of what it can do well will usually do more damage than good.

- Re:Jailbreaking will never get fixed (Score:5, Interesting)
  
  by dfghjk ( 711126 ) writes: on Saturday June 27, 2026 @07:27PM (#66213446)
  
  How do you know it's generative AI?
  This article links to another article, published presumably for profit, which links to an article that requires a subscription. It's just business promotion for a /. member, there's no information here or anything to discuss.
  "Obviously, using a tool outside of what it can do well will usually do more damage than good."
  What does the tool do well? We don't know, we haven't been told anything about the tool. And what damage or good can it do? An AI can do no damage unless it's wired to do damage. AI is just software, completely deterministic. Can Excel do damage? Even when used to do things it doesn't do well? The threat of AI is the people who try to exploit something poorly designed to do things they don't understand. So what if AI hallucinates, the possibility of harm doesn't come from AI, it comes from using its outputs to do harm.
  
  - Re: (Score:1)
    
    by gweihir ( 88907 ) writes:
    
    Simple: No other type of AI has "guardrails". I get that you are not smart enough for this level of deduction.
    - Re: Jailbreaking will never get fixed (Score:4)
      
      by Sneftel ( 15416 ) writes: on Sunday June 28, 2026 @01:34AM (#66213696)
      
      I get that you are not smart enough for this level of deduction.
      Jesus. What the hell, man? Is that something you would say to somebody you were talking to in person?
      
      - Re: (Score:1)
        
        by gweihir ( 88907 ) writes:
        
        If somebody says something this exceptionally stupid, yes.
      - Re: (Score:2)
        
        by drinkypoo ( 153816 ) writes:
        
        Is that something you would say to somebody you were talking to in person?
        Absolutely nothing about talking on Slashdot is like talking in person, so what's the relevance of that? In a real conversation you get a chance to interject when someone starts talking abject bullshit that makes it clear they have no clue what they're talking about. Or, you can walk away mid-sentence. Only on web fora do people get the chance to post an entire screed uninterrupted.
        None of that means that response was necessarily warranted, personally I like to save the depths of my derision for people who
        
        Re: (Score:2)
        
        by sabbede ( 2678435 ) writes:
        
        Really? As a frequent object of your deep derision, you're far too easily insulted by things that aren't insults.
        
        Re: (Score:2)
        
        by drinkypoo ( 153816 ) writes:
        
        Really? As a frequent object of your deep derision, you're far too easily insulted by things that aren't insults.
        You're frequently insulting, apparently without realizing it, which implies that you're a lot dumber than you think you are. Learn not to be insulting all the time if that's not your goal, or continue to be looked down on by people you sound like you're looking down on when you talk to them.
        
        Re: (Score:2)
        
        by sabbede ( 2678435 ) writes:
        
        If I'm not trying to insult you, but you take offense anyhow, that's on you. If being asked a question insults you, that's on you. If you think I'm talking down to you, again, that's on you. I'm not. I speak to you as an equal. I want to treat you as a friend. If you're saying that's wrong, think about it.
        You lash out emotionally when asked to support your own statements. Why? You lash out when your questions are answered. You lash out when you encounter any disagreement. Do you think these a
        
        Re: (Score:2)
        
        by drinkypoo ( 153816 ) writes:
        
        If I'm not trying to insult you, but you take offense anyhow, that's on you
        If you're not trying to be insulting, but you're being insulting anyway, you're a stupid asshole.
        
        Re: (Score:2)
        
        by sabbede ( 2678435 ) writes:
        
        Okay, what did you find to be so insulting about what I said?
        
        Re: (Score:2)
        
        by drinkypoo ( 153816 ) writes:
        
        Okay, what did you find to be so insulting about what I said?
        Thanks for proving my point. I don't know why you do that so readily, but I do appreciate it. Of course, you won't understand how you've done that, but we can add it to the list.
        
        Re: (Score:2)
        
        by sabbede ( 2678435 ) writes:
        
        I'm sorry, but that's not at all an answer. If you can't even tell me how I've insulted you, how are we supposed to move on from here? How am I supposed to understand something you seem intent on keeping secret, yet insist on hitting me with?
    - Re: Jailbreaking will never get fixed (Score:2)
      
      by Anonymous Cward ( 10374574 ) writes:
      
      Might also be worth noting that no other guardrail for anything allows allegedly harmful output to occur before trying to claw it back like they are Alex Jones's lawyer.
  - Re: (Score:2)
    
    by allo ( 1728082 ) writes:
    
    Two seconds duckduckgo for what BLIP-2 is: https://arxiv.org/abs/2301.125... [arxiv.org]
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    No. That function is not AI driven. But you can drive people into bankruptcy: https://genai.owasp.org/llmris... [owasp.org]
Turing Test Completed (Score:2)

by PPH ( 736903 ) writes:

The models have now reached human capability.
One look at a picture of Denise Milani and I'd do almost anything as well.
- Re: (Score:2)
  
  by TheMiddleRoad ( 1153113 ) writes:
  
  She does fit into that AI generated uncanny valley with that fake chest.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  No. Well, yes, but only for incapable and insightless humans, such as you.
- Re: (Score:2)
  
  by allo ( 1728082 ) writes:
  
  What happens with pics of the hypno toad?
Any business deploying AI is taking a risk (Score:1)

by Anonymous Coward writes:

The findings highlight a potential security risk for businesses deploying AI systems that process both images and text.
A company "deploying AI systems" is already taking security risks, among other types of risk, regardless of the findings. It's a blanket truth about deploying AI in general.
single pixel attacks (Score:3)

by cathector ( 972646 ) writes: on Saturday June 27, 2026 @09:09PM (#66213552)

the summary and article here present a delightfully uncluttered surface, but for folks wanting more detail there's a possibly related 2019 paper which shows a variety of image classifiers switching their output from "99% sure it's " to "99% sure it's ", due to literally a single altered pixel in the input. i doubt it's exactly the same thing as whatever this paper turns out to be about but it gives a feel for the problem space.
https://arxiv.org/pdf/1710.088... [arxiv.org]

- Re: single pixel attacks (Score:5)
  
  by cathector ( 972646 ) writes: on Saturday June 27, 2026 @09:12PM (#66213554)
  
  wups, /. filtered out my angle-brackets.
  should read: ... from "99% sure it's (the right thing)" to "99% sure it's (something not even close to the right thing)" ...
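To give a feel for the single-pixel idea described above, here is a minimal sketch under stated assumptions. It is not the linked paper's method: published one-pixel attacks typically use a black-box optimizer such as differential evolution against real image classifiers, while this swaps in an exhaustive search against a hypothetical two-class linear model. The names `predict` and `one_pixel_search`, the 4x4 image, and the weights are all made up for illustration, and the weights are deliberately built so that one pixel is highly sensitive.

```python
import numpy as np

def predict(image, weights, bias):
    """Toy linear classifier: class scores for a flattened image."""
    return weights @ image.ravel() + bias

def one_pixel_search(image, weights, bias, values=(0.0, 1.0)):
    """Try every (pixel, value) pair; return the first single-pixel
    change that flips the predicted label, or None if none does."""
    original = int(np.argmax(predict(image, weights, bias)))
    for idx in np.ndindex(image.shape):
        for v in values:
            candidate = image.copy()
            candidate[idx] = v  # change exactly one pixel
            label = int(np.argmax(predict(candidate, weights, bias)))
            if label != original:
                return idx, v, original, label
    return None

# Hypothetical setup: mid-gray 4x4 image with one dark pixel.
image = np.full((4, 4), 0.5)
image[2, 3] = 0.0

# Class 0 weighs every pixel a little; class 1 depends heavily on
# pixel (2, 3) only. This sensitivity is an assumption of the sketch.
weights = np.zeros((2, 16))
weights[0, :] = 0.1
weights[1, 2 * 4 + 3] = 2.0
bias = np.zeros(2)

result = one_pixel_search(image, weights, bias)
print(result)
```

Exhaustive search is only feasible here because the image is tiny. On real images the search space is far larger, which is why a stochastic optimizer is the usual choice.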
  
- Re: (Score:2)
  
  by TheMiddleRoad ( 1153113 ) writes:
  
  They're just statistical engines. This is not a surprise.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    Indeed. They can make guesses, but they cannot verify those and the guess may be so far off as to be utterly ridiculous.
  - Re: (Score:2)
    
    by allo ( 1728082 ) writes:
    
    The statement is as useful as saying your science book is just letters (and implying it could not contain useful science because it is just a collection of letters).
    - Re: (Score:2)
      
      by ceoyoyo ( 59147 ) writes:
      
      What are letters? Books are just splotches of dark ink on light paper.
    - Re: (Score:2)
      
      by TheMiddleRoad ( 1153113 ) writes:
      
      Sorry, but your AI agent is not actually your friend, no matter how great it says you are.
      - Re: (Score:2)
        
        by allo ( 1728082 ) writes:
        
        Posted in the wrong thread?
What is a "harmful response?" (Score:5, Insightful)

by LondoMollari ( 172563 ) writes: on Saturday June 27, 2026 @09:31PM (#66213570) Homepage

What is a "harmful response" and since when is having the sum total of human knowledge being instantly searchable "harmful?" All of this information is already freely available on the internet and in libraries. We used to say that "information wants to be free" but now that we have a tool that can do just that, we have a society that is intent on locking everything down with "governance" and "guardrails." And the best part? China is out here making and releasing the same type of advanced AIs sans guardrails for all to download. Now what?

- Re: (Score:3, Insightful)
  
  by martin-boundary ( 547041 ) writes:
  
  A "harmful response" is when you ask "how can I protect myself against COVID" and the response is "drink bleach". Then you do it.
  - Re: What is a "harmful response?" (Score:1)
    
    by RightwingNutjob ( 1302813 ) writes:
    
    If you do, it isn't harm. It's curating the gene pool.
  - Re: (Score:2)
    
    by LondoMollari ( 172563 ) writes:
    
    Then the user is pretty dumb, aren't they?
    - Re: (Score:2)
      
      by martin-boundary ( 547041 ) writes:
      
      Then the user is pretty dumb, aren't they?
      
      What if the user is 7 years old?
      - Re: (Score:1)
        
        by tilk ( 637557 ) writes:
        
        Then the parents are pretty dumb, aren't they?
  - Re: (Score:1)
    
    by cascadingstylesheet ( 140919 ) writes:
    
    Then you do it.
    I think we've identified the harmful bit right there.
  - Re: (Score:1)
    
    by Tablizer ( 95088 ) writes:
    
    That's FoxGPT
- Re: (Score:2)
  
  by TheMiddleRoad ( 1153113 ) writes:
  
  Ask the Babelfish.
- Re: (Score:3)
  
  by classiclantern ( 2737961 ) writes:
  
  Finding ways to get AI to do dirty work sounds like hacking to me. Children can be exploited for sex, drug mules, martyrs, slaves and organ farms. Corruption of a Minor is a crime in some places because a Child is powerless and does not know right from wrong. Every day there are criminals, Terrorists, Hackers and Researchers finding ways to corrupt AI for illegal and immoral activities. AI does not know or care what is dangerous, illegal, immoral, unethical or corrupt. Civilized society has made activi
  - Re: (Score:2)
    
    by kwelch007 ( 197081 ) writes:
    
    This has nothing to do with committing crimes. Almost any physical or logical items can be used to commit a crime. Instead this is about who is liable for harm. Are the AI/Social Media/big tech companies liable for harms caused using their products, or are the users? The same argument has been had about gun and even manufacturers, but oddly not rock quarries or hammer manufacturers.
- Re: What is a "harmful response?" (Score:3)
  
  by ljw1004 ( 764174 ) writes:
  
  In image processing like this article is talking about, the classic example of a harmful response is that your car's camera sees a "speed limit 30" sign, but a small sticker on it makes the image processor believe it saw a "speed limit 70" sign.
  (this is an actual demonstrated attack. It means that pranksters could cripple self driving.)
  The thing about these image classifiers is that they're not "continuous". You can make it see a stop sign as a right-of-way sign, or a green light.
  - Re: (Score:2)
    
    by ceoyoyo ( 59147 ) writes:
    
    I don't think "continuous" means what you think it means. The reason you can do this is because the models are continuous.
    "JaiLIP" is just running gradient descent on the input until you get the effect on the output you want. The same thing works just fine on our brains except it's less efficient because we can't (yet) differentiate over our output. We call them "optical illusions" even though they don't have anything to do with actual optics.
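The "gradient descent on the input" idea in the comment above can be sketched roughly as follows. This is a toy illustration, not JaiLIP itself: the paper's actual loss and its BLIP-2 setup are not described in this thread, so the sketch assumes a frozen logistic model with hypothetical weights `w` and bias `b`, a signed-gradient step, and a clip that keeps every input element within `eps` of the original (the "subtle" part). All names and constants are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perturb_input(x, w, b, target, eps, step=0.005, iters=50):
    """Gradient descent on the input of a frozen logistic model.

    Only the input changes; w and b are never updated. After each
    step the input is clipped back to within eps of the original.
    """
    x_adv = x.copy()
    for _ in range(iters):
        p = sigmoid(w @ x_adv + b)
        # Gradient of binary cross-entropy w.r.t. the input.
        grad = (p - target) * w
        # Signed step toward the target output.
        x_adv = x_adv - step * np.sign(grad)
        # Keep the perturbation small: at most eps per element.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Hypothetical frozen model and a stand-in for a flattened image.
w = np.linspace(-1.0, 1.0, 1000)
b = 0.0
x = 0.01 * np.sign(w)
eps = 0.05

x_adv = perturb_input(x, w, b, target=0.0, eps=eps)

# Model output before and after the perturbation.
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))
```

The point of the sketch is that many tiny, aligned changes across a high-dimensional input add up to a large change in the output, even though no single element moves by more than `eps`.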
    - Re: (Score:2)
      
      by ljw1004 ( 764174 ) writes:
      
      I don't think "continuous" means what you think it means. The reason you can do this is because the models are continuous.
      One of the defining papers in this field is 2013 "Intriguing properties of neural networks" by Szegedy et al. In their own words from the abstract, "we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent"
      I'm using the word "continuous" in the same sense as them. (Perhaps it does indeed mean what I think it means...)
      - Re: (Score:2)
        
        by ceoyoyo ( 59147 ) writes:
        
        Ah, right, that paper. I don't think they'd use the word "continuous" the way they did if they thought about it for a bit either. They use it as a vague throwaway in the abstract and then never again. Also "fairly discontinuous" is silly. It either is or it isn't, nothing in between. That's kind of the defining property of a discontinuity.
        What they actually mean is this:
        Our main result is that for deep neural networks, the smoothness assumption that underlies many kernel methods does not hold. Specifically
- Re: (Score:2)
  
  by CEC-P ( 10248912 ) writes:
  
  Because of furries.
  But for real, because governments want to control information.
- Re: (Score:2)
  
  by frenchgates ( 531731 ) writes:
  
  Information wants to be free...and wrong!
Language (Score:2)

by SlashbotAgent ( 6477336 ) writes:

The word perturbation perturbs me greatly. I don't know why. It just does and at this point it's making me a bit peevish.
BLIT by David Langford (Score:1)

by LaughingRadish ( 2694765 ) writes:

This sounds an awful lot like BLIT images from the short story BLIT by David Langford.
Neal Stephenson would be proud (Score:1)

by vivian ( 156520 ) writes:

Sounds like Snow Crash for AI.
If AI ever becomes AGI, will it think of the real world the way we think of virtual worlds?
