Lyrebird Claims It Can Recreate Anyone's Voice Based On Just a 1 Minute Sample (theverge.com) 107
Artem Tashkinov writes: Today, a Canadian artificial intelligence startup named Lyrebird unveiled its voice imitation deep learning algorithm that can mimic a person's voice and have it read any text with a given emotion, based on the analysis of just a few dozen seconds of audio recording. The website features samples using the recreated voices of Donald Trump, Barack Obama and Hillary Clinton. A similar technology was created by Adobe around a year ago but it requires over 20 minutes of recorded speech. The company is set to open its APIs to the public, while the computing for the task will be performed in the cloud.
Film actors, you're next.
Not yet, at least. They currently sound like Robama and the Trumpinator.
I'm sure there's a way to "coach" the AI to get a better performance out of it. HTML for voice overs, if you will.
Give it ten minutes of speech and I'll bet it's a hell of a lot better (more like the real voice).
This will only get better and better, and I'd hazard a guess that before long most of us won't be able to tell the real voice from the synthetic one.
Except in a real movie, you wouldn't just take the audio stream straight from the algorithm; you'd have some kind of highly skilled specialist tweaking it to get the exact effect the director wanted.
A combination of art and science will eventually be able to produce completely convincing audio forgeries, very likely long before science alone will be able to.
Re:AI killing industry (Score:4, Interesting)
I don't think so. Stars still have legal rights over their likeness. I think you'd have a lot of trouble getting away with saying something like "Starring... a voice like Paul Rudd's, a voice like Carrie Fisher's, etc...".
Star power isn't going anywhere. There's really no logical reason that famous film stars are also billed prominently for animation, and yet that's what we have.
Spoiler alert. I don't think that Carrie is going to be complaining about it much*.
* Too soon?
I used a dead star on purpose to illustrate that even resurrecting dead star voices won't be possible without paying the estate.
Yeah... cause what works in Japan definitely will work in the rest of the world.
Artificial stars work here, too. The Archies had some hits, and Gorillaz did too (admittedly they have a real voice, but that can be switched out easily enough).
Just look at Hollywood, they're about as artificial as you can get while still being somewhat real.
There is always a reason. (Score:2)
There's really no logical reason that famous film stars are also billed prominently for animation, and yet that's what we have.
The vocal performance and personality of the actor shapes and defines the animation of the character.
Disney understood that from the beginning, which is why three generations of stars from film, radio, television and theater have recorded for Disney. Try imagining the animated Aladdin without the manic improvisation of Robin Williams.
For bonus points, try re-casting the voice of Rocket Raccoon and see if you still have a CGI and motion capture character that audiences will actually give a damn about.
Wouldn't that be the far less inspired version featured in the rest of the films and TV series? We don't need to imagine it; we've seen it.
They're ahead of you on that one, guess who.
http://www.nme.com/news/film/w... [nme.com]
At least it's not Samuel L Jackson. Not that I have a problem with him, or that he's not good or anything, but he is in seriously everything these days. You can barely see a sanitary pad advert without his mug popping up.
No argument from me - but there is no logical reason that the best voice actors in the world also happen to be people who have the qualities necessary to be an on-screen star. The use of so many screen actors for voice-only roles implies existing star power as the prime motivation.
As long as they don't mention the person's name they're fine.
The Bible reading at the start of "Number Of The Beast" by Iron Maiden isn't Vincent Price; he wanted too much money.
Exactly - but would Robin Williams fans go to see Aladdin if it was just an un-advertised sound-alike? You'd probably even have the opposite reaction. It's the name and star power of Robin Williams* that drove that casting decision.
Williams's raw talent was evident in the movie and its success, but I doubt it drove the casting more so than his existing star power. Even if I'm wrong about this movie, there simply aren't examples of high-grossing cartoons starring only otherwise-unknown voice actors.
You are refuting a point that I didn't make.
I'm not saying that they won't be able to use a rip off of John Wayne's voice... they can do that right now with human impressionists.
I'm saying that cartoons today use already-famous on-screen actors as big names on the marquee - not for their voices. This won't change. If they want to use a John Wayne sound-alike (computer generated or otherwise), they can go right ahead. If they want to SAY they are using a John Wayne sound-alike for marketing reasons, they'll need to pay the estate.
No one, except maybe Morgan Freeman, gets hired for their voice*
*doesn't include voice actors and mr movie phones, obviously.
...so voice-over artists are first to go then. Just train the computer with $actor's voice and then make it speak French/German/Swedish/Elbonian or whatever.
I'll miss watching dubbed stuff when on holiday. I think BA in the A-Team was my favourite dubbed voice
;-)
Voice Actors are pop idols (Score:2)
The Little Mermaid made real?
They are unionized. They'll be fine.
If programmers were half as smart as they claim to be, they'd unionize too.
Re:AI killing industry (Score:4, Insightful)
People don't realize the amount of effort people are willing to put into CGI. Same thing will happen with voices. Photorealistic actors are already here, we see them all the time but don't realize it. Just about every action movie made in the 2000's has heavy doses of CGI, often times in surprising scenes where one wouldn't expect to see it.
Hollywood bean counters will love it because it means higher profits. Cable networks will love it because they can crank out cheap product. Producers and directors will love it because they can program actors like they program CGI. Actors will love it because they can get back on the stage and forget about that movie stuff. Viewers will love it because we really just want to look at pretty pictures and are happy to suspend our disbelief if the face is pretty enough.
As soon as that passes, however, we will all be gone.
What an intelligent way to eliminate one's own species...
Soon, the AI industry will be the only industry where people still can get a job. As soon as that passes, however, we will all be gone. What an intelligent way to eliminate one's own species...
And when that goes you can get a job as a foot soldier against our robot overlords.
Maybe it's the other way around; ya never know these days. The hair may simply be a flattened orange turban.
Adobe Flash Player, eh? (Score:2)
Didn't know anybody still used that. Hosers!
Good (Score:1)
Perhaps now we'll need more verification and proof before information is accepted, leading to more accountability
There will probably be an initial (long) period of blind (deaf?) acceptance of what is heard, and massive amounts of media coverage and lawsuits revolving around fake shit.
CAPTCHA: dreadful
How long before estates of dead entertainers sue ? (Score:5, Interesting)
If this is true I imagine Hollywood would jump on this -- they now have one less reason to be inconvenienced when a (popular) actor dies.
Someone uses a reconstruction of someone else's popular, but now dead, voice as a marketing ploy -- much like Natalie Cole hijacked her father's song -- are we going to have lawsuits over unauthorized sound-a-likes now?
I also imagine the music industry would go crazy over it as well. First came their Auto-Tune shenanigans; now I'm waiting for the inevitable "Auto-Sing" -- "we can recreate the voice of any dead singer!"
Re:How long before estates of dead entertainers su (Score:5, Insightful)
This is true in the same way that auto-tune removes the need for musical singing ability. Sure, you can force a certain note, but it sounds artificial. Similarly this tool can replicate a voice at standard timbres and emotions well enough to be recognizable, but not well enough to be undetectable as a digital emulation.
It's not until it's undetectable (such as some of the best modern CGI) that we'll actually have made actors obsolete. Except... amazingly, CGI costs more than the actors, it's less flexible, and slower. I think it will be quite a while before we have something that is both on-par for quality and cheaper than a skilled live human.
But it doesn't need to. They don't have to do auto-tune in discrete steps following a set scale, it could be (as far as the human ear is concerned) done in an analog fashion.
The technology will improve until you don't even notice it. It may already have done so, with the only auto-tune you notice today being deliberately worse than necessary for effect or simply the result of cut-rate sound engineering.
Which makes me wonder... can you get a m
Also, I think people are underestimating the creative input that a performer puts into a voice performance. They can put in a lot of subtle emphasis and emotion into speech. Even if AI can perfectly replicate someone's voice, will it know when to emphasize a word, when to change the pitch of its voice, and when to insert a dramatic pause?
Re: (Score:3)
Um... needs work. A lot of work (Score:3)
So far every sample (including the titular one with Robo Donald Trump) sounds like a mangled Stephen Hawking voice-bot
:(
If I heard that voice from behind the door asking if I were John Connor, I'd say I'm a meat popsicle.
No kidding. I would almost be willing to bet money that they're simply doing what Texas Instruments was doing way back in the late 70's for their various TMS5xx0 speech-synthesizer chips. They'd analyze the spoken words and turn them into the predictive-coding data that the chips use to play back the words.
You could even do things like adjust intonation with speech-synthesizer ICs from 30-40 years ago, and it sounds for all the world like they're doing it the same way with Lyrebird - separating out the formants and pitch, and then playing them back.
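For anyone who hasn't played with those chips: the analysis step being described is linear predictive coding (LPC), where each sample is modeled as a weighted sum of the few samples before it and only the weights (plus pitch and energy) are stored. Below is a minimal sketch of that classic technique in plain Python. It is not Lyrebird's method, which the summary says is deep learning; the function names and the synthetic test signal are made up for illustration.

```python
import random


def autocorrelation(signal, max_lag):
    """Return autocorrelation values r[0..max_lag] of the signal."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for predictor weights a[1..order] so that
    x[n] is approximated by sum(a[k] * x[n-k]).
    Returns (weights, residual_energy)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this model order.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def synth_ar2(a1, a2, n, seed=0):
    """Hypothetical stand-in for a voiced sound: noise pushed through
    a two-weight resonant filter (one 'formant')."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


# Analyze a signal whose true weights are known and see what comes back.
signal = synth_ar2(1.3, -0.6, 20000)
coeffs, residual = levinson_durbin(autocorrelation(signal, 2), 2)
print("estimated weights:", coeffs)
```

The estimated weights should land close to the 1.3 and -0.6 used to build the signal. A real speech coder would run this on short frames with around ten weights per frame and drive the filter with a pitch pulse train or noise on playback.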
Imagine when the dishonest and corrupt CIA (Score:5, Interesting)
gets their hands on this. With photorealistic CGI and manufactured voices, they can manufacture any recorded situation and evidence they want, and pass it off as real.
I think we will eventually reach a point in the world where every person of notability has a private encryption key, and any statement or appearance they make will be signed so people know what is real and what is not.
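A rough sketch of what signing a statement would look like, using textbook RSA over a SHA-256 hash. Everything here is a toy assumption for illustration: the key is built from two small, publicly known Mersenne primes, there is no padding, and the function names are invented. A real deployment would use a vetted library and a proper signature scheme.

```python
import hashlib

# TOY KEY, NOT SECURE: both primes are well-known Mersenne primes.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                             # public modulus
E = 65537                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))     # private exponent (Python 3.8+)


def digest(statement: str) -> int:
    """Hash the statement and reduce it into the range of the modulus."""
    h = hashlib.sha256(statement.encode("utf-8")).digest()
    return int.from_bytes(h, "big") % N


def sign(statement: str) -> int:
    """Only the holder of D can produce this value."""
    return pow(digest(statement), D, N)


def verify(statement: str, signature: int) -> bool:
    """Anyone with the public pair (N, E) can check a signature."""
    return pow(signature, E, N) == digest(statement)


sig = sign("I never said X")
print(verify("I never said X", sig))   # signature matches this exact text
print(verify("I said X", sig))         # altered text no longer matches
```

Note that this only proves a statement came from whoever holds the private key. It does nothing about the key being stolen or about someone simply declining to sign what they actually said.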
Either:
The encryption scheme you're using is flawed by design due to their moles influencing its design, allowing them to break it rapidly, or they know of practical flaws that they did not put in there but that they have also chosen to hide from the public.
They surreptitiously steal your private key.
They have quantum computing capabilities advanced enough to run practical attacks on the encryption scheme you're using.
Very few encryption schemes are mathematically proven to be secure, and they typically have impractical requirements.
Re:Imagine when the dishonest and corrupt CIA (Score:4, Insightful)
I suspect more the reverse; it will be a convenient way to deny anything inconvenient.
1. Leader: 'X'
2. Leader: 'I never said X'
3. Opposition: 'But hundreds of people heard you say X'
4. Leader: 'Either they are my enemies, in which case they are liars, or they are my supporters, and know in their heart I didn't say X'
5. Opposition: 'We even have a video of you saying X'
6. Leader: 'And you just made that up, with your computers and things! Enemies! Off with your heads!'
There seems to be a global current these days, away from the principles of Enlightenment and Absolutism, back toward Authoritarianism and the denial of objectivity. When facts become subjective, all viewpoints are equally valid and 'truth' can be determined by vote or decree. Quite Nineteen Eighty-Four (although it predated Orwell by thousands of years).
I'm pretty sure we've already seen 1 through 5 with Trump. At this point I wouldn't be too surprised if 6 happened.
How far are we really from denial of objectivity? How many Americans are religious? How many of those believe that their religion is the one true truth?
In a post-truth world, the only way to win is to have better narratives. Tell better stories, don't worry about facts.
Hello Computer (Score:2)
I would love for a "personal digital assistant" to have Majel Barrett's voice or John Forsythe's voice. Hell, if nothing else we could continue to produce TV programs or movies where their voices are important.
I guess it's better than Festival (Score:1)
I guess it's better than Festival but it's proprietary technology while Festival is free.
Frankly I find this scary (Score:2, Insightful)
I wouldn't class myself as a technophobe but this leaves us all open to the creation of a "confession" for something we have not done. Scary shit in my opinion. And no, I don't trust some law enforcement agencies, or in fact some government agencies, not to do just that. (I'll put on my tinfoil hat)
Ok it's cool (Score:1)
Is it just me that still hears Microsoft Sam under all of this? While the likeness is there, it's still pretty obvious it's generated.
Great for Alexa (Score:1)
This will be great! Now I'll be able to order stuff with anyone's Alexa!
I love their approach (Score:3)
Haha.
Wait, what's the joke?
about a month to a clone (Score:2)
I give it about a month before there will be a decent open source clone. Progress in AI is crazy fast.
Can it do ... (Score:4, Interesting)
Are you guys actually listening to these samples? (Score:3, Informative)
While this technology does a decent job capturing some of the voice characteristics, it still sounds like a damn generated voice. I'm no sound expert, but it's the reverb or something like that in the generated voice that makes it sound just like every other generated voice. Hell, if you didn't tell me that was Obama I might not even have put 2 and 2 together; it sounds like a drunk (lacking enunciation) Obama, I suppose. The Hillary one is barely even recognizable as her. Sorry, but I can't hear past the "robot" voice attenuation, which is what plagues all generated voices.
I'm with you man. These sure do have a long way to go! Call me when there is actually something worth listening to.
Re: (Score:3)
Re: (Score:3, Insightful)
But it's NOT "really close". It's not even REMOTELY close. How the hell did this comment get modded "Insightful"?
but that they can get really close
I'm not so sure about that. Those samples, if they're the best we can manage, seem to indicate that we're a long way off from 'really close'.
it's going to happen relatively soon
In the geologic sense, I suppose.
so we should stop relying on audio recording as authentic
That's a bit premature. Synthesized voice isn't even tolerable yet; listening to it is almost painful. I don't think we'll need to worry about computer-generated impersonations ruining our lives for a long, long time.
Voice Analyzer? (Score:5, Funny)
Hello, I am the system administrator. My voice is my password, verify me.
scammers rejoice (Score:4, Informative)
http://fortune.com/2017/03/28/... [fortune.com]
Kirk to Enterprise (Score:2)
Queen to knights level 3
Computer, verified
AI and cloud computing is a very dangerous combo (Score:3)
Continue with the operation (Score:1)
If it can do singing voices also.... (Score:2)
" Hey Janelle, what's wrong with Wolfie?" (Score:1)
Not that good (Score:2)
This is a fun thing, but the voices still sound very very artificial.