China Privacy

ByteDance Suspends Seedance 2 Feature That Turns Facial Photos Into Personal Voices Over Potential Risks (technode.com)

hackingbear writes: China's ByteDance has released Seedance 2.0, an AI video generator that handles up to four types of input at once: images, videos, audio, and text. Users can combine up to nine images, three videos, and three audio files, for a maximum of twelve files in total. Generated videos run between 4 and 15 [or 60] seconds long and automatically come with sound effects or music.

Its performance is unfortunately so good that it has forced the firm to block its facial-to-voice feature after the model reportedly demonstrated the ability to generate highly accurate personal voice characteristics using only facial images, even without user authorization.

In a recent test, Pan Tianhong, founder of tech media outlet MediaStorm, discovered that uploading a personal facial photo caused the model to produce audio nearly identical to his real voice -- without using any voice samples or authorized data. [...]


  • by gurps_npc ( 621217 ) on Tuesday February 10, 2026 @11:25PM (#65981554) Homepage

    This is practically a stereotypical AI use: look for associations in a massive database, induce a formula from that data, and then reverse the process to deduce a conclusion from new data.

    It is rather obvious that bone structure should both affect one's voice and be observable in a picture, yet working out the link involves such massive calculations that humans are surprised by the result.

  • by liqu1d ( 4349325 ) on Wednesday February 11, 2026 @12:52AM (#65981624)
    I'm finding it a tad hard to believe an AI can guess someone's voice correctly from a photograph.
    • by mattr ( 78516 ) on Wednesday February 11, 2026 @03:01AM (#65981690) Homepage Journal

      Not an expert in this area. But apparently it is a thing. Funnily enough the feature they are worried about is actually a security attack... ha ha. Welp, this cat is out of the bag unfortunately, so now just the criminals will have it.
      1. Foice - Generate voice based on an image as an attack on voiceprint systems
      https://www.usenix.org/system/... [usenix.org]
      2. Speech2Face - the reverse process. https://speech2face.github.io/ [github.io]
      3. Predict physical attributes from voice with ML
      https://www.researchgate.net/p... [researchgate.net]

    • I'm finding it a tad hard to believe an AI can guess someone's voice correctly from a photograph.

      Then you're going to be gobsmacked by how they can reconstruct a person's physical appearance from just their skull bones.

      Back to the topic at hand, it's not that difficult to theorize about reproducing a voice's tonal characteristics. A person's voice is influenced by their skull shape, jaw size, sinus cavities, muscle structure, neck length, etc. With millions of examples matching voice intonation to physical appearance, they can make a reasonable approximation of the tonal quality of a person's voice.

      What the

    • by AmiMoJo ( 196126 )

      It says in the summary that you can upload audio as well. ByteDance actually demonstrated the audio bit years ago. They had about 10 seconds of speech, and from that it could extrapolate a reasonably believable artificial voice. Scammers have been using the tech for years.

    • by ac22 ( 7754550 )

      Yes, presumably the AI gets confused when you start adding different countries into the equation. If you have identical twins growing up in different places - Scotland and Australia, say - how is the AI going to account for the large differences in accents?

    • Feed something enough data, and as another commenter on here put it: stereotypes emerge. If it looks like a duck, it probably quacks like a duck
  • Black guy: Ving Rhames voice, or else Flavor Flav
    Guy with big nose: thick Yiddish accent
    Asian: gong sounds in background
    Weak chin: "I SAY, old chap!"
    Woman: whiny and complaining, or else submissive and apologetic
    Kid with crew cut: "Gee Willikers, Mr!"
    Guy with long hair: "Dude..."
    etc. Do we really need phrenological explanations when good old stereotypes can get us 99% of the way there in explaining how amazing this feature must be?

  • The one thing in the fantastic anime movie "Roujin Z" (about an A.I. care-bed that goes haywire, takes on the persona of the elderly occupant's dead wife and tries to take him to the beach, while military A.I. robots try to stop them) that I'd considered unreasonable was when the A.I. bed scanned a picture of the guy's dead wife and homed in on an accurate voice for her to use to interact with him (all on its own). And now we actually get THAT part before a lot of the other stuff that seemed more plausible.

