ByteDance Suspends Seedance 2 Feature That Turns Facial Photos Into Personal Voices Over Potential Risks (technode.com) 18
hackingbear writes: China's ByteDance has released Seedance 2.0, an AI video generator which handles up to four types of input at once: images, videos, audio, and text. Users can combine up to nine images, three videos, and three audio files, up to a total of twelve files. Generated videos run between 4 and 15 [or 60] seconds long and automatically come with sound effects or music.
Its performance is unfortunately so good that it has forced the firm to block its facial-to-voice feature after the model reportedly demonstrated the ability to generate highly accurate personal voice characteristics using only facial images, even without user authorization.
In a recent test, Pan Tianhong, founder of tech media outlet MediaStorm, discovered that uploading a personal facial photo caused the model to produce audio nearly identical to his real voice -- without using any voice samples or authorized data. [...]
Typical AI use (Score:3)
This is practically a stereotypical AI use - look for associations in a massive database, induce a formula from that data, and then reverse the process to deduce a conclusion from new data.
It is rather obvious that bone structure should both affect one's voice and also be observable via a picture, but at the same time involve such massive calculations that humans would be surprised by it.
Re: Meanwhile... (Score:2)
"Human beings are a disease, a cancer of this planet. You are a plague, and we are the cure."
Re: Meanwhile... (Score:2)
"Global warming is the fever, mankind is the virus."
Re: (Score:2)
I always thought this quote was self-contradictory.
Diseases are just other life forms. The only creatures that dislike them are the creatures they attack. This is not an insult.
It's little different than saying:
"Dogs are a plague upon fire hydrants."
That is not an insult, it is a description of a predatory relationship. If someone were to hunt down and kill all dogs because of what they do to fire hydrants, we would jail him for being psychotic.
Re: Meanwhile... (Score:2)
No one thinks killing all mosquitos is psychotic. Yet, I recently read their mouths are being used for the most precise 3D printers. We eradicate diseases and viruses commonly, precisely because we say they either aren't alive or are a lesser form of life.
Are there any examples? (Score:3)
Re:Are there any examples? (Score:5, Interesting)
Not an expert in this area. But apparently it is a thing. Funnily enough the feature they are worried about is actually a security attack... ha ha. Welp, this cat is out of the bag unfortunately, so now just the criminals will have it.
1. Foice - Generate voice based on an image as an attack on voiceprint systems
https://www.usenix.org/system/... [usenix.org]
2. Speech2Face - the reverse process. https://speech2face.github.io/ [github.io]
3. Predict physical attributes from voice with ML
https://www.researchgate.net/p... [researchgate.net]
Re: (Score:2)
I'm finding it a tad hard to believe an AI can guess someone's voice correctly from a photograph.
Then you're going to be gobsmacked by how they can reconstruct a person's physical appearance from just their skull bones.
Back to the topic at hand, it's not that difficult to theorize about reproducing the voice's tonal characteristics. A person's voice is influenced by their skull shape, jaw size, sinus cavities, muscle structure, neck length, etc. With millions of examples to match voice intonation to physical appearance, they can make a reasonable approximation of the tonal quality of a person's voice.
What the
Re: (Score:2)
It says in the summary that you can upload audio as well. ByteDance actually demonstrated the audio bit years ago. They had about 10 seconds of speech, and from that it could extrapolate a reasonably believable artificial voice. Scammers have been using the tech for years.
Re: (Score:2)
Yes, presumably the AI gets confused when you start adding different countries into the equation. If you have identical twins growing up in different places - Scotland and Australia, say - how is the AI going to account for the large differences in accents?
Re: (Score:2)
Skeptic (Score:2)
Black guy: Ving Rhames voice, or else Flavor Flav
Guy with big nose: thick Yiddish accent
Asian: gong sounds in background
Weak chin: "I SAY, old chap!"
Woman: whiny and complaining, or else submissive and apologetic
Kid with crew cut: "Gee Willikers, Mr!"
Guy with long hair: "Dude..."
etc. Do we really need phrenological explanations when good old stereotypes can get us 99% of the way there in explaining how amazing this feature must be?
Roujin Z (Score:1)
The one thing in the fantastic anime movie "Roujin Z" (about an A.I. care-bed that goes haywire, takes on the persona of the elderly occupant's dead wife and tries to take him to the beach, while military A.I. robots try to stop them) that I'd considered unreasonable was when the A.I. bed scanned a picture of the guy's dead wife and homed in on an accurate voice for her to use to interact with him (all on its own). And now we actually get THAT part before a lot of the other stuff that seemed more plausible.