Microsoft AI Engineer Says Company Thwarted Attempt To Expose DALL-E 3 Safety Problems (geekwire.com) 78
Todd Bishop reports via GeekWire: A Microsoft AI engineering leader says he discovered vulnerabilities in OpenAI's DALL-E 3 image generator in early December allowing users to bypass safety guardrails to create violent and explicit images, and that the company impeded his previous attempt to bring public attention to the issue. The emergence of explicit deepfake images of Taylor Swift last week "is an example of the type of abuse I was concerned about and the reason why I urged OpenAI to remove DALL-E 3 from public use and reported my concerns to Microsoft," writes Shane Jones, a Microsoft principal software engineering lead, in a letter Tuesday to Washington state's attorney general and Congressional representatives.
404 Media reported last week that the fake explicit images of Swift originated in a "specific Telegram group dedicated to abusive images of women," noting that at least one of the AI tools commonly used by the group is Microsoft Designer, which is based in part on technology from OpenAI's DALL-E 3. "The vulnerabilities in DALL-E 3, and products like Microsoft Designer that use DALL-E 3, makes it easier for people to abuse AI in generating harmful images," Jones writes in the letter to U.S. Sens. Patty Murray and Maria Cantwell, Rep. Adam Smith, and Attorney General Bob Ferguson, which was obtained by GeekWire. He adds, "Microsoft was aware of these vulnerabilities and the potential for abuse."
Jones writes that he discovered the vulnerability independently in early December. He reported the vulnerability to Microsoft, according to the letter, and was instructed to report the issue to OpenAI, the Redmond company's close partner, whose technology powers products including Microsoft Designer. He writes that he did report it to OpenAI. "As I continued to research the risks associated with this specific vulnerability, I became aware of the capacity DALL-E 3 has to generate violent and disturbing harmful images," he writes. "Based on my understanding of how the model was trained, and the security vulnerabilities I discovered, I reached the conclusion that DALL-E 3 posed a public safety risk and should be removed from public use until OpenAI could address the risks associated with this model."
On Dec. 14, he writes, he posted publicly on LinkedIn urging OpenAI's non-profit board to withdraw DALL-E 3 from the market. He informed his Microsoft leadership team of the post, according to the letter, and was quickly contacted by his manager, saying that Microsoft's legal department was demanding that he delete the post immediately, and would follow up with an explanation or justification. He agreed to delete the post on that basis but never heard from Microsoft legal, he writes. "Over the following month, I repeatedly requested an explanation for why I was told to delete my letter," he writes. "I also offered to share information that could assist with fixing the specific vulnerability I had discovered and provide ideas for making AI image generation technology safer. Microsoft's legal department has still not responded or communicated directly with me." "Artificial intelligence is advancing at an unprecedented pace. I understand it will take time for legislation to be enacted to ensure AI public safety," he adds. "At the same time, we need to hold companies accountable for the safety of their products and their responsibility to disclose known risks to the public. Concerned employees, like myself, should not be intimidated into staying silent." The full text of Jones' letter can be read here (PDF).
"Public Safety Risk" (Score:5, Insightful)
Re: "Public Safety Risk" (Score:1)
Re: (Score:2)
But does she?
Exactly nobody thinks these are real pictures of her, so can it really be defamatory?
They are not actual photos taken of her by anyone, so where is the copyright violation?
Asserting personal image rights, right of publicity etc usually has to have some commercial component. As far as I am aware nobody is selling these images, nobody is using them to promote any event or product.
Harassment and Assault of any kind on the criminal side of the legal-house also seem like a stretch here. You can't
Re: (Score:2)
I'd be willing to bet that the actual Swift "face" was from an actual photo, used to face-swap onto the generated image.
So far as I've explored with AI image generation, that seems to still be the most accurate way to get the face to look real, and be repeatable in general.
Re: (Score:3)
The phrase "Public Safety Risk" seems to be redefined from the usual meaning. Here it does not mean anybody is killed, maimed, or physically injured, but apparently is being used to mean "The AI tool can be made to generate pornographic pictures."
Public safety risk -> Corporate embarrassment risk
Seriously, most corporations would put their own desire to not be publicly embarrassed above pretty much anything, including public safety. Just look at Boeing. I'm surprised Microsoft hasn't started the hatchet job to destroy this guy's reputation yet for daring to speak out.
Re: "Public Safety Risk" (Score:3)
Re: (Score:2)
Indeed, but what can you expect from a Microsoft engine man. (train analogy)
Re: (Score:2)
He really has no basis to request they shut it down. This doesn't happen for other bugs either. He found a hole and they should get 90 days to fix it before he discloses.
Anybody can request anything. They're free to roll their eyes and say "No", of course.
Re: (Score:2)
He found a hole and they should get 90 days to fix it before he discloses.
Indeed, it seems he found several holes.
Re:"Public Safety Risk" (Score:5, Insightful)
Some people who've never experienced violence call mild annoyances "violence".
It really demeans the experiences many people have lived through including genocide and worse. I think they have no Theory of Mind.
But it's also super weird to market an art tool then freak out when the art tool can make nudes. Not only do paint brushes not engage in thought policing, all adult art students do a course in nudes. The democratization seems to be bugging them quite a bit.
I guess it's time for an ole ( o ) ( o ) to really get them going.
Re: (Score:1)
are they, really?
Re:"Public Safety Risk" (Score:4, Insightful)
It's kind of like the commonly cited statistics such as "Nationwide, 81% of women and 43% of men reported experiencing some form of sexual harassment and/or assault in their lifetime." https://www.nsvrc.org/statisti... [nsvrc.org] Lumping together sexual harassment with sexual assault makes no sense, other than to inflate the statistics and make things sound a whole lot worse than they are.
Re: (Score:2)
43% of men?
Hard to believe that one..as the old saying goes:
"You can't rape the willing"
(unless it was a gay guy raping men I guess... that didn't really use to be a consideration a few years ago in conversations like these... hetero was always assumed).
Re: (Score:2)
Re: (Score:2)
Really? At what point should a woman start worrying: when you're whistling at her, or when you start following her down the street?
Re: (Score:2)
I'd say that following is much more serious than whistling, for sure. Both are inappropriate, but both are not sexual assault.
In real life, it's much more common for people to playfully banter at work. Some people like it, but others consider it sexual harassment. Even a comment such as "You look gorgeous today!" can be construed as sexual harassment, and can be included in that 81%.
Worrying behavior does not equal violence.
Re: (Score:2)
Re: (Score:2)
At no point, when that woman is Taylor Swift with her own security that will kick your head in if you actually tried anything.
This is I think a significant aspect to this story. As far as celebrity status, she is about as big as you can get. Her experience with this isn't like that of the rest of the population.
Re:"Public Safety Risk" (Score:4, Informative)
If I broke your arm, calling that an act of violence wouldn't "demean the experiences" of people who have been tortured to death.
That's a silly argument. Intentionally breaking someone's arm is clearly violence.
Publishing a nude photo is not violence.
It might be rude, offensive, harassing, and illegal. But it is not "violence".
Re: (Score:3)
Anything not fluffy clouds and funny cats is "violence" nowadays.
Re: (Score:2)
Are all right-wingers illiterate?
Intentionally breaking someone's arm is clearly violence.
Yes, it is. That's the point.
Re: (Score:2)
If I broke your arm, calling that an act of violence wouldn't "demean the experiences" of people who have been tortured to death.
This is a dishonest argument. Breaking somebody's arm is an act of violence, and calling it so is correct. But the discussion is about nude photos, not arm breaking. Calling nude photos "acts of violence" is demeaning.
The equivalent fallacy would be for you to hyperbolically call your arm breaking "genocide" or "a crime against humanity". That would be demeaning to real victims of genocide.
Re: (Score:2)
Breaking somebody's arm is is an act of violence, and calling it so is correct.
Yes. That's the point.
Did you even read the parent's post? He's not making the argument you think he's making.
This is a dishonest argument.
The only thing dishonest here is your characterization of my post and the parent's post.
The equivalent fallacy
Oh, you're one of those. Ugh... read the link in my .sig.
Re: (Score:2)
The models in art classes are consenting.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
There's a difference between saying you fuck kids and saying you have a naked body. Hell, even an image of Taylor Swift fucking a 4 year old may or may not be defamatory, because a statement has to be believable in order to be libelous. Also, as a public figure, the bar is much higher for Taylor Swift than it would be for a random person off the street.
Re: "Public Safety Risk" (Score:3, Insightful)
Anyone who thinks they can "remove DALL-E 3 to save the public" is an idiot who hasn't heard of offline Stable Diffusion models. The genie is out of the bottle.
Re: (Score:2)
There are some out there who generate fake pornography with someone else's face on it, and threaten to send it to the victim's family/colleagues/friendly local law enforcement officer/etc. If the victim pays a small sum, the fake media are (maybe) not sent.
It's called blackmail, and if you think victims will just shrug it off as a prank you're seriously mistaken. Sometimes, a picture, even if known to be fake, can sow the seeds of fatal doubt in a relationship.
Re: (Score:2)
What you are describing, though, is what you called it: blackmail. It might also be forgery or uttering; but what is relevant is that there is intent to deceive and extort.
What we have here is a "Hi I made nude drawing of Taylor Swift on some computers, wanna see!"
I am not saying I think it is good. I don't know with any certainty whether it's currently a violation of law or of some civil construct like image rights, but I know what it isn't: blackmail, rape, *-assault, or many of the other things people are running around calling it.
Re: (Score:2)
Believe it or not, that's a clear public safety risk, ie threatening all people going about their lives.
"Safety risk" is an inherently unfalsifiable term. There is nothing that cannot be construed as a safety risk.
There are some out there who generate fake pornography with someone else's face on it, and threaten to send it to the victim's family/colleagues/friendly local law enforcement officer/etc. If the victim pays a small sum, the fake media are (maybe) not sent.
It's called blackmail, and if you think victims will just shrug it off as a prank you're seriously mistaken. Sometimes, a picture, even if known to be fake, can sow the seeds of fatal doubt in a relationship.
You don't even need to have fake photos. You can merely assert you do and the scenario described would still be valid. Language itself is a "safety risk".
Like if your boss sees visual "proof" that you're a pedo. He might not believe it himself, but he'll still fire you because he can't afford to explain to every potential customer "no, our employee XXongo is not a pedo, probably, with high likelihood, so you can absolutely do business with us". Or if your insecure girlfriend is shown anonymously what you're doing every wednesday night when you claim to be working late.
Defamation is already illegal. Ditto for whacking people with baseball bats and telling the judge you were merely trying to repel zombies.
Re: (Score:2)
So, you're one of those generating deepfakes? I guess you can't even pay a prostitute, none of them want you.
Fine (Score:1)
Fix it yourself.
Re: (Score:2)
How dare he complain about someone else's criminal behavior, amirite? Don't like the Holocaust? Fix it yourself!
Re: (Score:2)
I don't think he called their behavior criminal. But hey, I actually read the open letter so what do I know.
Re: (Score:2)
Feel free to change my mind. You could start by reading the article, follow it up by reading the letter, and then posting why you hold your beliefs.
ugh (Score:3)
Does Shane Jones think the company needs to try and stop me from drawing unsafe images in Windows Paint? What an idiot.
It's called prompt hacking (Score:1)
Re:It's called prompt hacking (Score:5, Insightful)
No need to take the tool offline, lots of people use DALLE for valid purposes. Fix that particular hack, ban the account that did it and move on.
It can't be fixed, any more than you could stop people from doing this with Photoshop before AI existed. Guardrails, right LOL. AI will totally have guardrails that nothing else does. You can't keep the internet from being used for bad things. You can't keep kitchen knives from being used to kill people, or spray paint from being used for graffiti. It is a tool, and will be used for good and bad. The best defense is simply understanding that, and we are a long way from that.
Re: (Score:2)
"only a good guy with ai can stop a bad guy with ai"
You can't keep kitchen knives from being used to kill people
but in a way we literally have [statista.com]
Not doubting the stats but I don't think better guardrails on kitchen knives were really part of that.
Re: (Score:1)
Didn't stop us from seeing Taylor Swift naked (Score:2)
Re: (Score:2, Funny)
How did they do the Taylor fakes?
So far I've found those images difficult to accidentally stumble across, but for safety reasons, could you please tell me where they're located?
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
That's one way of doing it...but one could always use the open source Stable Diffusion models on a local computer.
Heck, you can set it up on a Google python lab site and run it....no guard rails at all.
I'm guessing likely whatever they used, they first generated the female bodies in compromising positions and then face swapped Swift in from real photos found most anywhere on the internet.
Re: (Score:1)
Re: (Score:2)
How did they do the Taylor fakes? Are there now private instances of stuff like Dall-E and Midjourney that have more or less equal power to create imagery? I thought it still took a data center full of AI transputers.
Using open source SD models, creating a LoRA of a person requires just a dozen or so images and, at worst, a few hours on an old GPU. With a high end gaming GPU, training takes minutes.
I can go out to lunch and have literally hundreds of images waiting for me when I get back on a single workstation. System requirements for the image generators are far lower than LLMs due to relatively small model size in the 2 - 6 GB range and correspondingly low VRAM usage.
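For a sense of scale, here is a minimal local text-to-image sketch using the open-source diffusers library (the model id, VRAM figures, and prompt below are illustrative assumptions, not anything from the parent post):

    # Minimal sketch: load an open Stable Diffusion checkpoint locally and
    # generate one image. Assumes the "diffusers" and "torch" packages and a
    # consumer GPU with a few GB of VRAM; the model id is just an example.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1",  # a few GB of weights, fits a gaming GPU
        torch_dtype=torch.float16,           # half precision roughly halves VRAM use
    )
    pipe = pipe.to("cuda")

    image = pipe("a barn owl perched on a branch, watercolor").images[0]
    image.save("owl.png")

Nothing in that snippet phones home or enforces a policy, which is the commenter's point: once the weights are downloaded, the only guardrails are whatever the local user chooses to run.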
Safe Space (Score:5, Insightful)
Safety seems to now mean being in a padded room where no one can say anything you might find mildly troubling.
meh (Score:2)
Bill Gates pictures, no, Taylor Swift naked yes? (Score:2, Interesting)
Just a few weeks ago, I asked Bing AI to draw me a painting of Bill Gates sitting on a park bench eating an ice cream cone, and it told me that it refused to draw the image because it "might be used for harm".
I find it amusing that Microsoft had protections in place to protect its former Chairman and CEO from being used in a meme, but apparently nothing for user-generated Taylor Swift porn. Kinda shows where their priorities lie.
Re:Bill Gates pictures, no, Taylor Swift naked yes (Score:5, Informative)
Get Over It (Score:1)
Re:Get Over It (Score:5, Insightful)
I've found that the people who complain that other people need to "grow thicker skin" just want to justify or excuse their own shitty behavior. They're also the ones who cry the loudest when they're on the receiving end.
Lets focus on things that really matter
Just because something doesn't matter to you doesn't mean it doesn't matter.
We've been down this road countless times before. The typical conservative response is to trivialize the problem until they're personally affected, then cry like a toddler. It's predictable and boring.
Re: (Score:2)
Re: (Score:1)
Re: (Score:3)
I've found that the people who complain that other people need to "grow thicker skin" just want to justify or excuse their own shitty behavior. They're also the ones who cry the loudest when they're on the receiving end.
Both of you over-generalize.
Just because something doesn't matter to you doesn't mean it doesn't matter.
Again, both of you over-generalize as well as paint pictures in either white, or black.
We've been down this road countless times before. The typical conservative response is to trivialize the problem until they're personally affected, then cry like a toddler.
Um, isn't this exactly what is happening in this case? Deepfakes have existed for a long time, and good computer-generated deepfakes have been around for at least 5 years, with hundreds and hundreds of celebrities' images and videos being used to generate fake porn or whatever.
When it was Natalie Portman, or that actress who plays Rey in Star Wars, there was no official reaction. But now it's a
Re: (Score:2)
Safety problems... (Score:3)
In one of his (multiple) autobiographies, Leonard Nimoy talks about looking at Star Trek fan art and seeing some "extremely realistic" paintings of himself cavorting in the nude. Usually with Captain Kirk. He thought it was pretty amusing.
Ain't nothing new here, except that the requirement for artistic talent has been removed.
Yes, I do get it that this sort of thing might seem a whole lot less amusing to the subject, when the subject is a woman. But to describe it as a "safety" issue? Eff off.
So MS is the Boeing of AI? (Score:2)
Just use it to rig up a vid of the top MS executives having hot monkey sex with each other, and they'll finally crack down (no pun intended).
Re: (Score:1)
They host it, they get the blame.
Pencils are next! (Score:2)
I had a child safety issue actually (Score:5, Interesting)
My initial reaction was that the engineer sounded like he was losing his mind, and what a nightmare for the company. I didn't read the post but it sounds more like "evil users can make evil drawings" kind of thing, which is true for pencil and paper too.
But, I also remembered an experience I had myself. At risk of a "think of the children" instance, I think it raises a valid point but not one to be legally enforced.
My niece, about 7 years old or so, loves owls, and I generated owl images with her using one of the AI image generation sites (DALL-E or something similar where you type in a phrase and it displays some images). Everything was going fine until she demanded to be able to type. What could go wrong, I thought. Immediately she pounded the keyboard with a mischievous grin, producing a string of maybe 50 characters that looked like a long nonsense word, lots of consonants I think.
The resulting image looked like the complete opposite of anything wholesome, showing a severely cut up, damaged, bleeding corpse that caused her to shriek and duck her head. Basically nightmare fuel. It made me wonder how it was caused. Was there an underlying algorithmic design issue that resulted in abnormal images being hidden in spaces not accessible with anything but nonsense strings? Maybe a string that was disallowed as proper input ended up gathering lots of negative descriptors?
I think it would be a good idea to warn users that such an input can cause this kind of output, and to recommend that children not use these tools without supervision. I don't know if this problem still exists, as this was maybe a year or two ago, but it has probably happened to other people. It still doesn't warrant pulling the code / service off the web.
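One plausible piece of the mechanism, sketched with the public CLIP tokenizer used by open Stable-Diffusion-style models (an assumption for illustration; nobody outside OpenAI knows exactly how DALL-E 3's frontend handles this): keyboard-mash input is not rejected, it is silently split into subword tokens and embedded like any other prompt, so the model ends up sampling from an essentially arbitrary region of its space.

    # Sketch, assuming the "transformers" package; the tokenizer shown is the
    # public CLIP one used by open Stable Diffusion checkpoints, not necessarily
    # the one behind DALL-E 3.
    from transformers import CLIPTokenizer

    tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

    # An ordinary prompt tokenizes into familiar whole-word pieces.
    print(tok.tokenize("a cute owl in a forest"))

    # Keyboard mash is not rejected: it is chopped into arbitrary subword
    # fragments, which the text encoder still maps to some embedding that the
    # image model will dutifully try to render.
    print(tok.tokenize("qwklrjtzzvnmplkajsdhfgqwpeorituy"))

Whether that arbitrary embedding lands somewhere disturbing depends on the training data and any safety filtering, which is exactly the part users cannot inspect.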
Re: (Score:2)
A pencil (Score:2)
A pencil creates art or is used to stab someone in the eye. BAN PENCILS!
Can I still imagine what she looks like nude? (Score:2)