Software Technology

Lyrebird Claims It Can Recreate Anyone's Voice Based On Just a 1 Minute Sample (theverge.com) 121

Artem Tashkinov writes: Today, a Canadian artificial intelligence startup named Lyrebird unveiled its voice imitation deep learning algorithm that can mimic a person's voice and have it read any text with a given emotion, based on the analysis of just a few dozen seconds of audio recording. The website features samples using the recreated voices of Donald Trump, Barack Obama and Hillary Clinton. A similar technology was created by Adobe around a year ago but it requires over 20 minutes of recorded speech. The company sets to open its APIs to the public, while the computing for the task will be performed in the cloud.

  • by Anonymous Coward on Monday April 24, 2017 @05:43PM (#54294927)
    Goodbye, voice actors.
    Film actors, you're next.
    • by Anonymous Coward on Monday April 24, 2017 @06:01PM (#54295057)

      Not yet, at least. They currently sound like Robama and the Trumpinator.

      Word verification: rejector

      • Yep, and even if they didn't sound hopelessly artificial and robotic, they don't really sound *that* close to the people they're supposed to be anyway. Somewhat close, but hardly professional impersonator close.
      • by grumling ( 94709 )

        I'm sure there's a way to "coach" the AI to get a better performance out of it. HTML for voice overs, if you will.

      • Not yet, at least. They currently sound like Robama and the Trumpinator.

        Give it ten minutes of speech and I'll bet it's a hell of a lot better (more like the real voice).

        This will only get better and better, and I'd hazard a guess that before long most of us won't be able to tell the real voice from the synthetic one.

      • by hey! ( 33014 )

        Except in a real movie, you wouldn't just take the audio stream straight from the algorithm; you'd have some kind of highly skilled specialist tweaking it to get the exact effect the director wanted.

        A combination of art and science will eventually be able to produce completely convincing audio forgeries, very likely long before science alone will be able to.

        • by Vastad ( 1299101 )

          I really really hope to see this become an affordable tool for game developer studios. You would be able to have the scriptwriting depth and flexibility of say the old Fallout, Planescape or Baldurs Gate games and not have to worry about getting VAs for every damn random NPC in the game or having to re-record lines.

          It doesn't necessarily mean the end of traditional voice-acting; star power and emotional reach would still be a draw for key roles, but you could finally get away from every damn Nord in Skyri

        • A combination of art and science will eventually be able to produce completely convincing audio forgeries

          "completely convincing" against what level of sceptical and detailed investigation?

          (I'll admit that in contravention of my normal habits, I haven't RTFA, or even tried to find the original source. But since I've got two hearing aids in as I type, and have never in my life understood why people waste their time with music, I doubt there'd be any point in listening to any sounds in the report. I often can

    • by MightyYar ( 622222 ) on Monday April 24, 2017 @06:08PM (#54295097)

      I don't think so. Stars still have legal rights over their likeness. I think you'd have a lot of trouble getting away with saying something like "Starring... a voice like Paul Rudd's, a voice like Carrie Fisher's, etc...".

      Star power isn't going anywhere. There's really no logical reason that famous film stars are also billed prominently for animation, and yet that's what we have.

      • by OzPeter ( 195038 )

        I don't think so. Stars still have legal rights over their likeness. I think you'd have a lot of trouble getting away with saying something like "Starring... ... a voice like Carrie Fisher's, etc...".

        Star power isn't going anywhere. There's really no logical reason that famous film stars are also billed prominently for animation, and yet that's what we have.

        Spoiler alert. I don't think that Carrie is going to be complaining about it much*.

        * Too soon?

        • I used a dead star on purpose to illustrate that even resurrecting dead star voices won't be possible without paying the estate.

      • You just create "artificial" stars. Hatsune Miku already works in Japan, it'll work in the rest of the world too.
        • Yeah... cause what works in Japan definitely will work in the rest of the world.

          • Artificial stars work here, too. The Archies had some hits, and Gorillaz did too (admittedly they have a real voice, but that can be switched out easily enough).
            • Artificial stars work here, too. The Archies had some hits, and Gorillaz did too (admittedly they have a real voice, but that can be switched out easily enough).

              Just look at Hollywood, they're about as artificial as you can get while still being somewhat real.

      • There's really no logical reason that famous film stars are also billed prominently for animation, and yet that's what we have.

        The vocal performance and personality of the actor shapes and defines the animation of the character.

        Disney understood that from the beginning, which is why three generations of stars from film, radio, television and theater have recorded for Disney. Try imagining the animated Aladdin without the manic improvisation of Robin Williams.

        For bonus points, try re-casting the voice of Rocket Raccoon and see if you still have a CGI and motion capture character that audiences will actually give a damn about,

        • Try imagining the animated Aladdin without the manic improvisation of Robin Williams.

          Wouldn't that be the far less inspired version featured in the rest of the films and TV series? We don't need to imagine it; we've seen it.

        • Removing the concrete voice from "general" voice performance could do the same thing that dubbing did to voices and looks, and what stunts did for actors and action scenes: you could get extra freedom from picking the two independently.
        • Try imagining the animated Aladdin without the manic improvisation of Robin Williams.

          They're ahead of you on that one, guess who.

          http://www.nme.com/news/film/w... [nme.com]

          At least it's not Samuel L Jackson. Not that I have a problem with him or he's not good or anything, but he is seriously in everything these days. You can barely see a sanitary pad advert without his mug popping up.

        • The vocal performance and personality of the actor shapes and defines the animation of the character.

          No argument from me - but there is no logical reason that the best voice actors in the world also happen to be people who have the qualities necessary to be an on-screen star. The use of so many screen actors for voice-only roles implies existing star power as the prime motivation.

      • As long as they don't mention the person's name they're fine.

        The Bible reading at the start of "Number Of The Beast" by Iron Maiden isn't Vincent Price; he wanted too much money.

        • Exactly - but would Robin Williams fans go to see Aladdin if it was just an un-advertised sound-alike? You'd probably even have the opposite reaction. It's the name and star power of Robin Williams* that drove that casting decision.

          * Williams's raw talent was evident in the movie and its success, but I doubt it drove the casting more so than his existing star power. Even if I'm wrong about this movie, there simply aren't examples of high-grossing cartoons starring only otherwise-unknown voice actors. They al

      • I don't think so. Stars still have legal rights over their likeness. I think you'd have a lot of trouble getting away with saying something like "Starring... a voice like Paul Rudd's, a voice like Carrie Fisher's, etc...".

        Star power isn't going anywhere. There's really no logical reason that famous film stars are also billed prominently for animation, and yet that's what we have.

        No one, except maybe Morgan Freeman, gets hired for their voice*

        *doesn't include voice actors and mr movie phones, obviously.

        • Some less known actors get many voice actor roles because of their distinctive voices. Example: Elias Toufexis

          • Yeah, I was thinking more big name actors like Brad Pitt or George Clooney or whoever and only in big films, except they do all the time for voice over and narration etc. In hindsight it wasn't a well thought out statement lol.
      • ...so voice-over artists are first to go then. Just train the computer with $actor's voice and then make it speak French/German/Swedish/Elbonian or whatever.

        I'll miss watching dubbed stuff when on holiday. I think BA in the A-Team was my favourite dubbed voice ;-)

      • Star power? No. But if a star can license their likeness to these film companies without doing anything at all, that's a different story. Every voiceover can now star the exact same people who fit the bill of being known and also willing to license their likeness for the least amount. Good luck breaking into this sort of market, pretty much forever.
        • That's true - if the technology ever gets to the point where it is cheaper and as effective to use than having a person speak into a microphone. I'm not really concerned with how easy or hard it is to "break in" to Hollywood - it's already insanely hard. I'd suggest doing something more useful with your life, but now I sound like an asshole. Hopefully this asshole just saved someone from a barista job.

      • Who really cares if it's only 90% similar to actual actor's voices; if it's close enough or just sounds good/appropriate for the video content, that's enough for most people. Would I be so put-off by not having a specific actor's voice that I wouldn't watch a movie? Can't see that happening, except for some die-hard supporters or fan-boys of specific actors.
        • I'm not making that argument. Let's say the technology is perfect and produces a result exactly like the actor's. It still won't matter, because people want to go see a movie starring _insert_celebrity_, not some robot. To use the celebrity's name in any promotional material, you'll need to pay the celebrity (or the celebrity's estate). So this is no threat to celebrities, because they will get paid either to do the actual voice work, or to have their name associated with the film.

    • They're not going anywhere. The point is that they're 'real' people. I suppose it might cost second stringers their jobs, but then who'll rise through the ranks? It takes time to build star power.
    • would be a nice future, would bring the price of making a movie way way down and perhaps make the studios more adventurous with what they are willing to try.
    • by Anonymous Coward

      They are unionized. They'll be fine.

      If programmers were half as smart as they claim to be, they'd unionize too.

    • by grumling ( 94709 ) on Monday April 24, 2017 @08:24PM (#54295687) Homepage

      People don't realize the amount of effort people are willing to put into CGI. Same thing will happen with voices. Photorealistic actors are already here; we see them all the time but don't realize it. Just about every action movie made in the 2000s has heavy doses of CGI, oftentimes in surprising scenes where one wouldn't expect to see it.

      Hollywood bean counters will love it because it means higher profits. Cable networks will love it because they can crank out cheap product. Producers and directors will love it because they can program actors like they program CGI. Actors will love it because they can get back on the stage and forget about that movie stuff. Viewers will love it because we really just want to look at pretty pictures and are happy to suspend our disbelief if the face is pretty enough.

    • Soon, the AI industry will be the only industry where people still can get a job.
      As soon as that passes, however, we will all be gone.
      What an intelligent way to eliminate one's own species...
      • Soon, the AI industry will be the only industry where people still can get a job. As soon as that passes, however, we will all be gone. What an intelligent way to eliminate one's own species...

        And when that goes you can get a job as a foot soldier against our robot overlords.

        • Not a job, you'll be a volunteer, a partisan, a freedom fighter, a guerrilla. But of course you'll always be called a terrorist.
    • Look on the bright side: All new episodes of Gilligan's Island and Hogan's Heroes.
    • Good bye singing in the Human species. Once they _get all real naturally born singers and voices, they can place any yeller (aka African) and pretend it is Pavarotti himself then forget about the issue eventually, as it is well known Asia does not sing save very few bass voices...
  • Didn't know anybody still used that. Hosers!

  • by Anonymous Coward

    Perhaps now we'll need more verification and proof before information is accepted, leading to more accountability

    • by Anonymous Coward

      There will probably be an initial (long) period of blind (deaf?) acceptance of what is heard, and massive amounts of media coverage and lawsuits revolving around fake shit.

      CAPTCHA: dreadful

  • by UnknownSoldier ( 67820 ) on Monday April 24, 2017 @05:49PM (#54294977)

    If this is true I imagine Hollywood would jump on this -- they now have one less reason to be inconvenienced when a (popular) actor dies.

    If someone uses a reconstruction of someone else's popular, but now dead, voice as a marketing ploy -- much like Natalie Cole hijacked her father's song -- are we going to have lawsuits over unauthorized sound-a-likes now?

    I also imagine the music industry would go crazy over it as well. First came their Auto-Tune shenanigans; now I'm waiting for the inevitable "Auto-Sing" -- "we can recreate the voice of any dead singer!"

    • by The Raven ( 30575 ) on Monday April 24, 2017 @06:13PM (#54295115) Homepage

      This is true in the same way that auto-tune removes the need for musical singing ability. Sure, you can force a certain note, but it sounds artificial. Similarly this tool can replicate a voice at standard timbres and emotions well enough to be recognizable, but not well enough to be undetectable as a digital emulation.

      It's not until it's undetectable (such as some of the best modern CGI) that we'll actually have made actors obsolete. Except... amazingly, CGI costs more than the actors, it's less flexible, and slower. I think it will be quite a while before we have something that is both on-par for quality and cheaper than a skilled live human.

      • >Sure, you can force a certain note, but it sounds artificial.

        But it doesn't need to. They don't have to do auto-tune in discrete steps following a set scale, it could be (as far as the human ear is concerned) done in an analog fashion.

        The technology will improve until you don't even notice it. It may already have done so, with the only auto-tune you notice today being deliberately worse than necessary for effect or simply the result of cut-rate sound engineering.

        Which makes me wonder... can you get a m
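
To make the "discrete steps vs. analog" distinction above concrete, here is a minimal sketch of pitch correction with an adjustable strength. It is not how any real Auto-Tune product works; the function names, the 440 Hz reference, and the `strength` parameter are all assumptions made for illustration. With `strength=1.0` the pitch snaps to the nearest equal-tempered semitone (the audible "hard" effect); smaller values only nudge it.

```python
import math

A4_HZ = 440.0  # reference pitch; an assumption of this sketch


def nearest_semitone_hz(freq_hz):
    """Return the equal-tempered pitch closest to freq_hz."""
    semitones = round(12 * math.log2(freq_hz / A4_HZ))
    return A4_HZ * 2 ** (semitones / 12)


def correct_pitch(freq_hz, strength=1.0):
    """Move freq_hz toward the nearest semitone.

    strength=1.0 snaps fully to the scale; strength=0.0 leaves the
    pitch untouched. Interpolation is done in log-frequency so the
    correction is proportional in musical (cents) terms.
    """
    target = nearest_semitone_hz(freq_hz)
    log_in = math.log2(freq_hz)
    log_out = log_in + strength * (math.log2(target) - log_in)
    return 2 ** log_out


if __name__ == "__main__":
    sung = 450.0  # slightly sharp of A4
    for s in (0.0, 0.5, 1.0):
        print(s, round(correct_pitch(sung, s), 2))
```

A real corrector would also have to track pitch over time and smooth the transitions, which is where most of the audible artifacts come from.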

      • Also, I think people are underestimating the creative input that a performer puts into a voice performance. They can put a lot of subtle emphasis and emotion into speech. Even if AI can perfectly replicate someone's voice, will it know when to emphasize a word, when to change the pitch of its voice, and when to insert a dramatic pause?

    • It's already happened [youtube.com]. Here's another one [youtube.com].
    • by kiviQr ( 3443687 )
      No, estates will abuse it till they can get all the money there is. Expect actors who would never lower themselves to a certain level to be featured in ads - b/c family gets an extra buck!
  • by saikou ( 211301 ) on Monday April 24, 2017 @05:49PM (#54294979) Homepage

    So far every sample (including the titular one with Robo Donald Trump) sounds like a mangled Stephen Hawking voice-bot :(
    If I heard that voice from behind the door asking if I were John Connor, I'd say I'm a meat popsicle.

    • by Anonymous Coward

      No kidding. I would almost be willing to bet money that they're simply doing what Texas Instruments was doing way back in the late '70s for their various TMS5xx0 speech-synthesizer chips. They'd analyze the spoken words and turn them into various predictive-coding data that the chips use to play back the words.

      You could even do things like adjust intonation with speech-synthesizer ICs from 30-40 years ago, and it sounds for all the world like they're doing it the same way with Lyrebird - separating out the fo
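
For anyone who hasn't met the predictive-coding idea mentioned above: the analysis step of linear predictive coding (LPC) fits a short filter that predicts each sample from the few before it. Below is a minimal pure-Python sketch of that step using the autocorrelation method and the Levinson-Durbin recursion. It says nothing about what Lyrebird actually does, and it is not the TI chips' data format; the function names and the test signal are made up for illustration.

```python
import math


def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(signal, order):
    """Levinson-Durbin recursion.

    Returns (coeffs, error) where coeffs[k-1] is a_k in the predictor
    signal[n] ~= sum(a_k * signal[n - k] for k in 1..order) and error
    is the remaining prediction-error energy.
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this stage.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], error


if __name__ == "__main__":
    # A pure sinusoid obeys x[n] = 2*cos(w)*x[n-1] - x[n-2],
    # so an order-2 fit should land close to those two values.
    w = 1.0
    tone = [math.sin(w * n) for n in range(2000)]
    coeffs, err = lpc_coefficients(tone, 2)
    print(coeffs, 2 * math.cos(w))
```

A synthesizer then replays speech by driving that filter with a buzz or noise source, which is why pitch and intonation can be changed separately from the filter itself.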

  • by Anonymous Coward on Monday April 24, 2017 @05:51PM (#54294999)

    gets their hands on this. With photorealistic CGI and manufactured voices, they can manufacture any recorded situation and evidence they want, and pass it off as real.

    I think we will eventually reach a point in the world where every person of notability has a private encryption key, and any statement or appearance they make will be signed so people know what is real and what is not.
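
As a rough illustration of the "sign every statement" idea above, here is a toy Lamport one-time signature built only on SHA-256. It is swapped in purely because it fits in a few lines of standard-library Python; nothing in the comment says which scheme would be used, and a real deployment would use an established signature library instead. All function names here are hypothetical, and each key pair in this scheme may safely sign exactly one message.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """256 pairs of random secrets; the public key is their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message: bytes):
    """Reveal one secret per digest bit. Use each key for ONE message only."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Check that every revealed secret hashes to the published value."""
    if len(signature) != 256:
        return False
    return all(_h(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, _bits(message))))


if __name__ == "__main__":
    private_key, public_key = generate_keypair()
    statement = b"I said X"
    sig = sign(private_key, statement)
    print(verify(public_key, statement, sig))
    print(verify(public_key, b"I never said X", sig))
```

Note that this only proves a statement was signed by whoever holds the key. It does nothing about the replies' concerns of stolen keys, or about a speaker who simply declines to sign.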

    • Either:

      The encryption scheme you're using is flawed by design due to their moles influencing its design, allowing them to break it rapidly, or they know of practical flaws that they did not put in there but that they have also chosen to hide from the public.

      They surreptitiously steal your private key.

      They have quantum computing capabilities advanced enough to run practical attacks on the encryption scheme you're using.

      Very few encryption schemes are mathematically proven to be secure, and they typicall

    • by pushing-robot ( 1037830 ) on Monday April 24, 2017 @07:25PM (#54295423)

      I suspect more the reverse; it will be a convenient way to deny anything inconvenient.

      1. Leader: 'X'
      2. Leader: 'I never said X'
      3. Opposition: 'But hundreds of people heard you say X'
      4. Leader: 'Either they are my enemies, in which case they are liars, or they are my supporters, and know in their heart I didn't say X'
      5. Opposition: 'We even have a video of you saying X'
      6. Leader: 'And you just made that up, with your computers and things! Enemies! Off with your heads!'

      There seems to be a global current these days, away from the principles of Enlightenment and Absolutism, back toward Authoritarianism and the denial of objectivity. When facts become subjective, all viewpoints are equally valid and 'truth' can be determined by vote or decree. Quite Nineteen Eighty-Four (although it predated Orwell by thousands of years).

      • I'm pretty sure we've already seen 1 through 5 with Trump. At this point I wouldn't be too surprised if 6 happened.

        How far are we really from denial of objectivity? How many Americans are religious? How many of those believe that their religion is the one true truth?

      • by AmiMoJo ( 196126 )

        In a post-truth world, the only way to win is to have better narratives. Tell better stories, don't worry about facts.

  • I would love for a "personal digital assistant" to have Majel Barrett's voice or John Forsythe's voice. Hell, if nothing else we could continue to produce TV programs or movies where their voices are important.

  • I guess it's better than Festival but it's proprietary technology while Festival is free.

  • by Anonymous Coward

    I wouldn't class myself as a technophobe, but this leaves us all open to the creation of a "confession" for something we have not done. Scary shit in my opinion. And no, I don't trust some law enforcement agencies, or in fact some government agencies, not to do just that. (I'll put on my tinfoil hat)

  • by Anonymous Coward

    Is it just me who still hears Microsoft Sam under all of this? While the likeness is there, it's still pretty obvious it's generated.

  • by Anonymous Coward

    This will be great! Now I'll be able to order stuff with anyone's Alexa!

  • by davecb ( 6526 ) <davecb@spamcop.net> on Monday April 24, 2017 @06:26PM (#54295171) Homepage Journal
    The folks at University of Montréal aren't to be sneezed at. https://lyrebird.ai/ethics [lyrebird.ai] makes a nice bilingual joke.
  • I give it about a month before there will be a decent open source clone. Progress in AI is crazy fast.

  • Can it do ... (Score:4, Interesting)

    by PPH ( 736903 ) on Monday April 24, 2017 @06:47PM (#54295245)

    ... Fran Drescher?

  • by Anonymous Coward on Monday April 24, 2017 @06:56PM (#54295295)

    While this technology does a decent job capturing some of the voice characteristics, it still sounds like a damn generated voice. I'm no sound expert, but it's the reverb or something like that in the generated voice that makes it sound just like all other generated voices. Hell, if you didn't tell me that was Obama I might not even have put 2 and 2 together - sounds like a drunk (lacking enunciation) Obama I suppose. The Hillary is barely even recognizable as her. Sorry, but I can't hear past the "robot" voice attenuation, which is what plagues all generated voices.

    • by Anonymous Coward

      I'm with you man. These sure do have a long way to go! Call me when there is actually something worth listening to.

    • Yeah, it's impressive for what it is, but they don't sound human. The Trump voice was the closest, but then Trump doesn't sound like any other human I've ever heard.
    • Re: (Score:3, Insightful)

      The point of this isn't that they can recreate 100% believable audio yet, but that they can get really close, and that it's going to happen relatively soon, so we should stop relying on audio recording as authentic.
      • by Anonymous Coward

        But it's NOT "really close". It's not even REMOTELY close. How the hell did this comment get modded "Insightful"?

      • by narcc ( 412956 )

        but that they can get really close

        I'm not so sure about that. Those samples, if they're the best we can manage, seem to indicate that we're a long way off from 'really close'.

        it's going to happen relatively soon

        In the geologic sense, I suppose.

        so we should stop relying on audio recording as authentic

        That's a bit premature. Synthesized voice isn't even tolerable yet; listening to it is almost painful. I don't think we'll need to worry about computer generated impersonations ruining our lives for a long, long time.

  • by Z80a ( 971949 ) on Monday April 24, 2017 @07:52PM (#54295549)

    Hello, i am the system administrator. My voice is my password, verify me.

  • scammers rejoice (Score:4, Informative)

    by liquid_schwartz ( 530085 ) on Monday April 24, 2017 @11:23PM (#54296229)
    Now you don't even have to trick your victim into saying yes, you can just keep them talking for a minute. If you're unfamiliar with the scam, here's a description:

    http://fortune.com/2017/03/28/... [fortune.com]

  • Queen to knights level 3
    Computer, verified

  • The true goal of AI is to destroy encryption while digitally fingerprinting all of us for those that use SSL and VPN, or whatever comes next. If an AI can recreate your voice, then it can definitely know who is typing what on the Internet. Uploading biometric data to social networks isn't helping much either. Cloud computing was designed from open source software at the start to make better use of mobile devices. But now it is utilized by corporations to destroy the freedoms of the desktop and the privacy of software users, and to remove control. This does not sit well with most Linux people, and the irony is that most cloud servers are running Linux. This allows companies to "love" open source and actually mean it, but it's really a kick in the nuts for anyone that loves FOSS and a huge financial advantage for not paying for licenses, ergo using server-based open source to destroy its desktop competition. I can get access to your API? O'lordy sir. Thankya fors ya scraps. Fuck APIs. Cloud computing is just an excuse to get people who will buy mobile devices but not new laptops stuck in something they have to pay for and have no control over. They could try to standardize a new architecture like they did in the late 2000s to get people to buy tech, but the cloud way is cheaper, and they make more money and save more by not having the demand to improve hardware. I saw a new laptop the other day for $400 and it only sports 1.2GHz and 4GB of RAM. WTF is this shit? Y'all need to wake up, because the millennial "It's 1984, oh well" syndrome is going to put us into something we average consumers can't get out of.
  • You may fire when ready.
  • ...then Queen can start touring again!
  • "I can hear him barking." Seems like that's not too far away. Should make some interesting robocalls!
  • This is a fun thing, but the voices still sound very very artificial.

