Digital Mouths, Synthetic Faces at MIT and Lucasfilm 150
jfengel writes "Two separate articles about generating faces automatically. From the Boston Globe, there is a story about MIT scientists putting words into somebody's mouth by splicing together footage. In the samples, I couldn't tell the difference between the synthetic footage and the same person really saying the same thing. (Though it's a little hard to tell at only 81kbps video.) And Wired has a lengthy article about generating purely synthetic faces at Lucasfilm. It discusses some of the difficulties in getting it right."
In related news... (Score:3, Funny)
Re:In related news... (Score:3, Funny)
"L-l-l-i-f-f-fe is l-l-like a box of ch-ch-ch-ocolates. You n-n-never kn-kn-know what you're gonna get-get-get."
FF? (Score:1)
Re:FF? (Score:2)
Re:FF? (Score:3, Informative)
Huh?! I work as a Sr. VFX guy, and CGI (computer-generated imagery) for facial animation is one of the most complex things to do!
Basically, there are so many muscles in the face and so many nuances that it is very difficult to emulate a realistic face. Chris Landreth [imdb.com] is a director at Alias|wavefront with whom I had the "pleasure" of working. His focus has mainly been on facial animation. And even with his talent, facial animation still doesn't look 100% realistic.
Check out the book Computer Facial Animation to get a glimpse at the mathematics, anatomy, and other technical hurdles being overcome in this arena.
Re:FF? (Score:2)
Big deal (Score:1)
Photoshop? (Score:2)
scary, scary, scary (Score:1, Troll)
Re:scary, scary, scary (Score:2)
One futuristic countermeasure I can imagine would be for concerned citizens (e.g., politicians, dissidents) to carry some type of device that cryptographically signs some aspect of their speech along with a trusted indicator of time. The device would have to transmit a signal that gets embedded in any recorded media, so that verifying the digital signature over the audio and time hash would show whether the original recording was fabricated or tampered with. It doesn't really counter this particular technology (which only makes it look like you're mouthing the audio and doesn't touch the audio itself), but it would prevent others from splicing or generating fake audio to accompany these phony video clips...
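A minimal sketch of the signing idea, under heavy assumptions: each chunk of audio is hashed together with a timestamp and tagged with a key held by the speaker's device. The key, the chunking, and the function names are all hypothetical, and HMAC is a symmetric stand-in swapped in to keep this stdlib-only. A real scheme would need a public-key signature so that third parties could verify a recording without holding the secret.

```python
import hashlib
import hmac
import time

# Hypothetical per-speaker device key. HMAC is a symmetric stand-in for a
# real public-key signature; anyone who can verify here could also forge.
DEVICE_KEY = b"hypothetical-device-secret"


def sign_chunk(audio: bytes, timestamp: int, key: bytes = DEVICE_KEY) -> str:
    """Tag one chunk of audio together with the time it was spoken."""
    digest = hashlib.sha256(audio).digest()
    message = digest + timestamp.to_bytes(8, "big")  # bind audio hash to time
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_chunk(audio: bytes, timestamp: int, tag: str,
                 key: bytes = DEVICE_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_chunk(audio, timestamp, key)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    now = int(time.time())
    speech = b"...raw audio samples..."
    tag = sign_chunk(speech, now)
    print(verify_chunk(speech, now, tag))         # True
    print(verify_chunk(speech + b"x", now, tag))  # False: audio altered
    print(verify_chunk(speech, now + 60, tag))    # False: time changed
```

Binding the timestamp into the tagged message is what stops someone from replaying a genuine clip as if it were said at a different time.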
Of course, there's only like, 50 zillion reasons this would be difficult/impossible to implement. But hey, I'm just the idea man...
Of course, if they could extend this work beyond the lips and face, imagine what the porn industry could do...
Re:scary, scary, scary (Score:2)
Re:scary, scary, scary (Score:2)
Re:scary, scary, scary (Score:1)
First client (Score:1, Funny)
I've been waiting for this. (Score:2)
The biggest hurdle I can see isn't technological, it'll be legal. Who really owns the rights to use the films made by famous people? It might be interesting to see just which ??AA lays claim to it first.
Re:I've been waiting for this. (Score:2, Interesting)
I definitely have NOT been waiting for this. Have we lost so much originality that we now must use dead actors to do our acting? Can't we find enough new actors and stars that we don't need to keep cashing in on the star power of old? Wouldn't a booming industry generating new movies with old stars say something about how our society values image over content, and about how the illusory is slowly replacing the real until we no longer understand the difference and don't know why we should even care?
If I ever become famous I am going to try damn hard to make sure I don't end up selling baby diapers from the great beyond.
Re:I've been waiting for this. (Score:2)
Re:I've been waiting for this. (Score:1)
People here are just as likely to say "Hey, you going to see that new (fill in the blank with popular actor) movie?" as to actually say the movie's name or premise.
I think it's the same attitude of familiarity over quality in the general public that's kept Microsoft on top for so long.
Re:I've been waiting for this. (Score:2)
I don't want stars I want talent. The cult of persona is detrimental to our society. It breeds inequity and helps to drive oppression. I'm fed up of the most newsworthy aspect of a film premiere being what dress somebody was wearing.
shame you lost a point, because I think you've got one
Re:I've been waiting for this. (Score:1)
MIT Was beat to the punch (Score:2)
Damn Canadians and their flapping heads... and Saddam Hussein, too!
Re: LOTR with your choice of actors.... (Score:5, Interesting)
Let me begin by once again repeating the truism: no video whatsoever can match the scenes as they appear to your imagination during a simple, unaided reading of the three volumes of Tolkien's original text.
With that out of the way, I will say that my own favorite among the video versions is the recent blockbuster edition, followed by the "Midlands" OSc 2072 dist (tuned 2,-1,4,0); and after that, the 2001-2003 movies using the Gibson/Taylor overlay. This review concentrates on videos; I will leave VRs for another day.
There is no need, at this remove, to cite the failings of the Bakshi anime (1978) or Jackson's groundbreaking 2001-2003 live action movie.... However, when WWM re-released the "long" version on tab with a selection of overlays, including Mercer/Tran/Lopez and Gibson/Taylor, the movie was transformed from a mere classic to a paradigm of style. Its effect on a generation resembled the effect of the original books on the "Sixties Era" (roughly 1964-1972). The wildly popular M/T/L overlay, its unearthly beauty toning down the somewhat brutal original video, went straight to the heart of the virals.
At the same time, the first underground OSc version, "OS-LOTR", was in process. Remember that this was before the Hurst case and copyright law was still in the postmillennial phase. Nevertheless, thousands of people participated. By any standard, the first version was pretty primitive. The base disappeared during Hurst. Only 18 snaps survive;
The first legal OSc version ("OurRing") is also available at universities, but is not worth the casual viewer's time. The maintainers provided no guidance. Story elements of an unsavory nature, having nothing to do with the original books, found their way into the base. Tuning was in its infancy: OurRing provides only five settings in each of three dimensions. The project became overlarge, and never gained popularity outside a hobbyist community. It is of historical interest only, as is the short-lived "Bakshi", based on the anime, begun and closed within a year after OurRing.
"Midlands", on the other hand, became a classic within weeks of startup. It derives most of its visual imagery and pacing from the centennial remake, but retains none of the bizarrer elements. A comparison of snaps is extremely revealing. The earliest still archived (two days in) is almost an exact copy of LOTR-100. In one week more, participation skyrocketed by 6000 percent, and the nine-day snap contains none at all of the odd politico-academic coloration. Note the gradients in this
Midlands is far more tunable than OurRing. The original tuner, which is part of the OSc v. 5.4 kernel, allowed for 15 dimensions. Addicts and purists apply the 500-dimension Gordon tuner. I have viewed several allegedly "perfectly" Gordon-tuned versions and could see no difference at all. These decimal-place variations invisible to anyone else fuel quite vitriolic disputes in the hobbyist community.
"Zealand" and "Hildebrandt", Midlands' two nearest competitors, have a much smaller following. Zealand is of course based on the 2003 video. Hildebrandt is experimental; it combines OSc and overlay technologies. There is no dist--as the maintainer states in true twentieth-century fashion, it is intended to be a "work in progress", to be "as dynamic as the events it portrays". This can lead to surprises if you view over a period of days instead of capturing the whole thing at once. Its consos also tend to be outside the standard demo.
Last year's remake is, in my opinion, the best of all. Yes, it condenses the story, but this is not a bad thing, as anyone will agree who has played one of the realtime VRs. Stern's directorial imagination could not possibly be closer to Tolkien's original vision. There is, of course, no truth to the rumor that he is a clone of Tolkien made for the purpose.
Re: LOTR with your choice of actors.... (Score:1)
Re: OSc LOTR explained (Score:2)
Re: LOTR with your choice of actors.... (Score:2, Informative)
Re: LOTR with your choice of actors.... (Score:2)
For Martyn S., here's the key--
- Overlays: Computer-generated actors, or sets of actors, replacing the originals.
- Tuners: Some kind of technology that allows you to set the amount of romance, scenery, violence, history, magic, humor, or other features (up to 500 with the Gordon tuner software) to your personal preference. Sort of like adjusting brightness/contrast/colors in an image file, on a conceptual level.
- OSc is "open source creativity." It means that a lot of people modify the "base" video, under control of maintainers. These people are called consensualists or consos.
- Snaps = snapshots of what the video looks like at one point in time, because with OSc it's changing all the time.
- Virals = nickname for a generation, like "flappers" or "hippies" are to us.
In my experience... (Score:2, Interesting)
Re:In my experience... (Score:2)
great news! (Score:3, Funny)
Re:great news! (Score:1, Funny)
Re:great news! (Score:2)
She's an idoru sans the redeeming qualities.
Re:great news! (Score:1)
Re:great news! (Score:3, Funny)
Re:great news! (Score:2)
Use and Abuse (Score:2)
This about sums it up for me....
Although imagining Ted Koppel speaking in Spanish is a riot.
I remember being somewhere in Europe, listening to the BBC for ten minutes on a shortwave radio, desperately trying to understand what the guy was saying through all of the static. It then occurred to me that the announcer was speaking Spanish in a really thick and proper British accent. Between the accent, the static, and everything else, it threw me off completely.
So I wonder if Koppel would even be understandable.
Damn the ethics- full speed ahead! (Score:5, Interesting)
Re:Damn the ethics- full speed ahead! (Score:3, Funny)
they may as well be spared having to turn up in person to read today's lies out.
Re:Damn the ethics- full speed ahead! (Score:2)
Re:Damn the ethics- full speed ahead! (Score:1)
Expert witnesses will be called to support and challenge video evidence.
The defence will attempt to raise reasonable doubt in the mind of the judge or jury.
So in essence, nothing will change.
Re:Damn the ethics- full speed ahead! (Score:2)
The problem isn't creating a realistic human animation; the problem is duplicating the realistic movements of an existing person. You couldn't just replace Dan Rather, for instance, since even someone who watches him only occasionally would notice the little things that can't quite be animated yet. Even splicing together real video of the person is noticeably different.
-Adam
George W. Bush (Score:3, Funny)
Fakin' the vids (Score:1)
What's wrong with this guy? (Score:1)
(;->)
;)
Re:What's wrong with this guy? (Score:1)
subsurface scattering and the bssrdf (Score:4, Informative)
Henrik Wann Jensen is developing some of the most usable algorithms for skin and other translucent materials. He gave a talk last month at Cal as a prospective faculty member. It was fairly impressive.
his home page [stanford.edu]
rendering skin [stanford.edu]
rendering smoke [stanford.edu]
this has been around for a long time (Score:2, Funny)
"Hello Smithers, You're Quite Good At Turning Me On."
Signing (Score:2, Interesting)
Not sure whether the President's speech is real or fake? Just see if he signed the authorised transmissions with his PGP key.
Rendering of surface is also critical (Score:3, Informative)
Does this mean... (Score:3, Interesting)
Re:Does this mean... (Score:1)
Re:Does this mean... (Score:1)
Not that hard to tell (Score:4, Interesting)
Look for the enunciation of certain letters, such as P and M, and you should be able to tell the difference. The generated image gives a sense of moving the mouth but not enunciating the words clearly, almost as if she is gliding over the words. In the real movie, however, you can see the woman completely changing her mouth formation to form the sounds required to pronounce the words.
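The P/M test can be written down as a crude heuristic. This sketch assumes you already have one phoneme label per video frame (say, from a forced aligner) and a lip-aperture measurement normalized to 0..1; the function name and the 0.1 threshold are hypothetical choices. B is included because, like P and M, it is a bilabial sound that requires the lips to close fully.

```python
from typing import List, Sequence

# P, B and M are bilabial sounds: the lips must close completely.
BILABIALS = {"p", "b", "m"}


def unclosed_bilabials(
    phonemes: Sequence[str],
    aperture: Sequence[float],
    closed_threshold: float = 0.1,
) -> List[int]:
    """Return frame indices where a bilabial is spoken but the lips stay open.

    phonemes: one (hypothetical) phoneme label per frame
    aperture: lip opening per frame, 0.0 = closed, 1.0 = fully open
    """
    if len(phonemes) != len(aperture):
        raise ValueError("need exactly one phoneme label per frame")
    return [
        i
        for i, (ph, gap) in enumerate(zip(phonemes, aperture))
        if ph.lower() in BILABIALS and gap > closed_threshold
    ]
```

For example, `unclosed_bilabials(["m", "a", "p"], [0.05, 0.8, 0.4])` returns `[2]`: the "m" closes properly, but the "p" is glided over. A practical detector would also need some tolerance for alignment jitter, since a lip closure may only last a frame or two.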
Re:Not that hard to tell (Score:2)
For example, if instead of the same person saying the same thing you had two different people saying two different things, it would be very hard to tell which one was faked. This would probably get even harder if one of them had an accent.
Re:Not that hard to tell (Score:1)
Re:Not that hard to tell (Score:1)
Not too hard to spot if you're looking for it, even without comparing the real and fake shots. However, give the technology another, ooo, 6 months, and they'll get it to be unnoticeable.
Uses in classic sci fi literature & entertainm (Score:5, Interesting)
Another, more benign use of the tech could be in entertainment. There was that episode of Star Trek: Deep Space Nine where they integrated the actors with footage from the classic episode, The Trouble with Tribbles. Great fun, but they were limited to using existing footage from the original series for interacting with Kirk, Spock et al. Imagine being able to track Shatner's '60s face onto an actor and use this tech to lip-sync 21st-century Shatner's dialog. Best. Time Travel. Episode. Ever.
And I don't even like Trek that much :-)
Re:Uses in classic sci fi literature & enterta (Score:2)
It was kind of a freaky effect.
Re:Uses in classic sci fi literature & enterta (Score:2)
Seems almost prescient considering what happened in Florida in 2000 :-)
Sheesh, when will you Democrats stop whining? It's like losing on penalties. If you can't score one more goal in over two hours of football, then you really can't complain about losing on penalties (even if it was a duff decision). BIG :-)
Re:Uses in classic sci fi literature & enterta (Score:2)
You're assuming I'm an American - hell, I'm not even in the northern hemisphere... :-)
And considering the state of Australia's government, I really shouldn't be making fun of yours :-)
Re:Uses in classic sci fi literature & enterta (Score:1)
I don't really follow Trek that much, but I *loved* that episode (having for some reason also seen the original Trouble with Tribbles).
I think that only being able to use the original footage was most of the fun: they had to think of clever ways to integrate the dialog, movements, etc. Just think, if they'd had free rein, it wouldn't have been nearly as good!
Re:Uses in classic sci fi literature & enterta (Score:2)
I thought the synthetic woman's delivery was very Gore-like, i.e. too wooden and perfect to be human.
just think... (Score:1)
The Vatican will be relieved... (Score:1)
seeing double (Score:1)
I can see it now ... (Score:2, Funny)
Remember this? [disk-o.com]
is there any new technology (Score:1)
I'm sure personalized videos are just around the corner.
-Ted
that explains Dick Clark! (Score:2)
We want to see Hugo! (Score:1)
"Of all the things I witness during my reporting, the one that most shakes my faith in the Cusan impossibility of fabricating synthetic souls ex nihilo is Hugo, an 18-second short created by the guys at ILM a few years back.
"Hugo is an entirely synthetic creation - a phantasm of light and algorithm. A wrinkled figure with Spockian ears, heightened cheekbones, and a sunken chin, he gazes off to the side of the camera, stammering, "Me? What do you mean I'm not real? Oh, I see. This is a joke, right? You must be talking about the other one." He then gulps nervously and gives a forced smile."
I HAVE to see that. We want Hugo, ILM!
Re:We want to see Hugo! (Score:1)
video evidence inadmissible in court? (Score:2, Interesting)
After all, with sufficient CPU power, anybody could make anybody talk about anything!
This will also mean that the court system will then ask for eyewitnesses since videos will not be admissible.
I'm not sure whether this is good or bad.
Re:video evidence inadmissible in court? (Score:2)
This probably should have been done years ago. I suspect that one can get comparable results manually NOW using frame-by-frame editing, if someone with the requisite skill puts enough time into it. I believe the technology for the audio side, making people say things that would greatly surprise them, was discussed here a while ago.
Anyone who has access to the MIT setup who would like to speed this process up is invited to make a commercial of the Supremes endorsing goatse.cx as a wholesome place for children to go and get it onto the Net.
Re:video evidence inadmissible in court? (Score:1)
Who did you say would swear that the video was not manipulated? The Eyewitness? I am not too familiar with legal terms :)
Re:video evidence inadmissible in court? (Score:2)
Conan O'Brien (Score:1)
Re:Conan O'Brien (Score:2, Insightful)
You know, like the man who's not afraid of 3 inch bees.
Could be used in psychological warfare (Score:2)
Imagine Osama broadcasting on Afghan television, telling his troops to surrender to the nearest US platoon. I'm probably overestimating the stupidity of your average Afghan al-Qaeda member, but chances are you might get a good number of them to actually buy it and surrender.
Going even further, we could fake Osama's capture and have him broadcast to the country that America is a nice place and to quit being player haters. Yeah, I know this all sounds far-fetched, but I'm sure the military is already looking into this.
Re:Could be used in psychological warfare (Score:2)
Actually, when the infamous video of Osama taking credit for the 9/11 attacks was publicized, many in the Islamic world insisted that the US had faked the video to frame their man Osama, as a retroactive excuse to attack Afghanistan. (Kind of like many Americans, especially Afro-Americans, who still think OJ is innocent. If there's a history of your people being scapegoated by The Man, you'll be really reluctant to trust The Man even when he's right.) I can only see this mistrust getting worse if it becomes possible to really effectively fake a video like this.
Court evidence (Score:1)
Not to worry (Score:3, Insightful)
1. a few minutes of footage of you, looking directly into a camera, saying things that cover the full range of mouth movements.
2. an audio recording of you actually saying what it is that they want you to say. It's possible to cut and splice separate recordings together, but 99% of the time, differences in the sound space would make it obvious that the recording was spliced together.
And then after that, all they'd have is a video of you saying the thing and staring like a zombie into the camera.
It's cool in theory, but I think Hollywood has already achieved much better results.
Mmm, Gummi Venus De Milo...
Re:Not to worry (Score:1)
2. an audio recording of you actually saying what it is that they want you to say.
Synthesis of voice from some minimal sample (lexicon of syllables? Small enough to be compiled from public figures I'm sure) will be a reality.
I'm sure the RIAA are modifying their contracts as we speak so as to lay claim to all their artists' vocal patterns, original or not.
You thought Best Of albums were bad? Just wait till we have Frank Sinatra singing the hits of Britney Spears.
Re:Not to worry (Score:2)
Re:Not to worry (Score:2)
Re:Not to worry (Score:2)
Re:Not to worry (Score:2)
Scary (Score:1)
Then, we couldn't believe everything we read...
Now, we can't believe everything we see...
I can't help but wonder what potential uses this could have. "Tonight at nine...Bill Gates admits Linux is superior to what he now refers to as 'Windoze'"
I saw the difference :-) (Score:2)
What do the synthetic pictures have in common? Well, in both cases the woman moves her lips a bit less (the second) or makes slightly fewer facial expressions (the first one).
With this movie at low-quality postage-stamp size, I have my doubts about a full-size TV newsreader. But I guess the technology is still in the prototype stages, and in a few years we'll likely have synthetic newsreaders indin.. indisti... indistinguishable from the real thing. Still, we're probably far away from the same synthetic person doing anything more than talking.
does anyone agree with the opinions of the wired a (Score:2, Insightful)
When I first read this I thought he was joking. As I read further, I realized he was dead serious. Does anyone else find this highly ridiculous? I'm not suggesting that the concept of people having souls is ridiculous; I just think the idea of the presence or absence of one giving away a computer rendering is absurd.
For anyone who feels the same way as the wired author, I propose the following hypothetical question: If some rendering was constructed (that is, produced algorithmically with the help of an artist) that was a truly perfect copy of a view of an actual person (i.e., every photon given off by either was matched), would a viewer be able to visually distinguish the two?
If someone answers "Yes", then this becomes a matter of belief in supernatural powers and will not benefit from further discussion.
If someone answers "No, but any rendering that could actually be created would be distinguishable from the human", I would give the following argument.
First of all, I don't think the rendering of the actual surfaces involved is a point of contention. If believable "bellies or thighs" can be done, then we can adequately render the surfaces of the face as well. The issue is positioning those surfaces to create a convincingly human expression. What if the artists were to take photographs of the actual person and use points of reference on the person's face as control points to position the artificial model? (Of course, they already do this.) As more control points are used the model will become increasingly like the original. The wired author essentially addresses this very point with his analogy of approximating a circle using many-sided polygons:
This concept falls apart when you consider the content of the final phrase (in parentheses (heh, self-describing)). While a face can be considered continuous, human vision is just as discrete as computer graphics. We have a finite number of rods and cones in our retinas. The number of possible responses of those rods and cones to different intensities of light may be harder to quantify, but it is certainly true that, given two light sources of increasingly similar brightness, there comes a point at which they are humanly indistinguishable. A rendering does not have to be actually perfect to be perfect as far as human vision is concerned.
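To put a number on the polygon analogy: a regular n-sided polygon inscribed in a circle of radius r strays furthest from the circle at the middle of each edge, by r(1 - cos(π/n)). This sketch finds the smallest n that keeps that gap under a chosen threshold. Treating half a pixel on a 500-pixel-radius circle as "too small to see" is an illustrative assumption, not a measured property of the eye, and the function names are hypothetical.

```python
import math


def max_gap(radius: float, sides: int) -> float:
    """Largest distance between a circle and its inscribed regular polygon.

    The worst point is the midpoint of each edge, which sits
    radius * cos(pi / sides) from the center instead of radius.
    """
    return radius * (1.0 - math.cos(math.pi / sides))


def sides_needed(radius: float, threshold: float) -> int:
    """Smallest polygon whose edges stay within `threshold` of the circle."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    n = 3
    while max_gap(radius, n) > threshold:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical numbers: 500-pixel-radius circle, half-pixel tolerance.
    print(sides_needed(500.0, 0.5))
```

With those numbers it takes 71 sides. Past that point the polygon and the circle differ by less than the chosen tolerance, which is the argument in a nutshell: "perfect enough for the eye" is a finite target, not an infinite one.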
Anyway, my point is that the problem of creating believable computer representations of humans is a matter of engineering. It certainly is a very difficult problem, but I don't think you can reasonably claim it is insurmountable due to a computer's lack of a soul unless your argument is based on something like telepathy.
The Soul Behind the Face (Score:2, Insightful)
All the pieces are in place: their economy is terrible, they take cartoons seriously, and they envy Americans.
A holy grail of Japanese animation is to look and sound exactly like an American live action movie. They could save their economy by replacing Hollywood actors with Tokyo animators. They could make movies their next great export, after cars and electronics. I think Americans won't lead the synthespian wave: We love our actors too much and we have little to gain. The Japanese don't love American actors (economically) and they have everything to gain.
Final Fantasy's failure to profit has scared them, but they're already improving. They're learning how to write and act like Americans from Americans. That's what Square has done with Kingdom Hearts, translated by Disney and starring Haley Joel Osment. And the Metal Gear games, made in Japan and voice acted in USA, also sell well in USA.
So I think the Japanese will do it. They need to.
Re:The Soul Behind the Face (Score:1)
I look forward to the possibilities (Score:2)
When we consider Final Fantasy: The Movie, and contrast it to what should be viable within just 5 years from now, it boggles the mind.
I, for one, would love to see a digital-quality old western film - but with both the Duke and Eastwood, not just one. Oh, and while we're at it, why not have Arnold Swartsenager (spelled wrong, I'm sure) be a henchman. And hell, throw "Han Solo" (Harrison Ford) in there as a local traveling trader, but in some western chaps.
That'd be a really fun movie to watch.
Extension of what can already be done with audio (Score:1)
Have a listen to this crude but effective splice up [obsess.com] of George W. done by Chris Morris.
It sacrifices any attempt at authenticity in favour of humour, but shows the idea well - getting someone to appear to be saying the total opposite of what they meant to say. With video added imagine how much more effective this could be.
RealOne? (Score:1)
Looks like you need RealOne to play these clips...
Why can't people encode video in something that doesn't require system-hijacking software? Are there any other versions around?
Digital Animations Group (Score:2)
The Digital Animations Group (http://www.digital-animations.com/) has been doing computer-generated characters very well for a couple of years. They are responsible for Ananova, the Talking Head, and their latest creation, the singing and dancing virtual pop star Tmmy (http://www.tmmy.co.uk), which BTW I submitted to Slashdot, but it was rejected.
Whassup!! (Score:1)
Sweet. (Score:1)
Man, that would lead to some awesome fan films.
.....Marvin Mouse.....
(Math, CS, Physics, Psychology Undergrad)
5 Day Outlook Troll Forecast (Score:2, Funny)
Mr. Bungle rule! (Score:1)
Stealing a face... (Score:1, Insightful)
I don't think it would be that hard to gather data on someone's face, especially if your program was merely plotting facial movements as data - as one side of the face is more or less the same as the other, you can interpolate and mirror one side of the face if that's all you have. So you video tape from a concealed location, and play back that tape, with enhancements, etc, for your "thief" program. Plus, if you really needed a range of facial expressions, you simply set up the person you are "stealing" from:
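The mirroring trick is simple to sketch if the face is stored as 2D landmark points. This assumes a frontal, upright face whose midline is the vertical line x = midline_x; the function names and sample coordinates are hypothetical, and a turned head or a genuinely asymmetric face would break the assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mirror_landmarks(points: List[Point], midline_x: float) -> List[Point]:
    """Reflect (x, y) landmarks across the vertical line x = midline_x."""
    return [(2.0 * midline_x - x, y) for x, y in points]


def complete_face(half: List[Point], midline_x: float) -> List[Point]:
    """Fill in the unseen half of a face from the half that was filmed."""
    return list(half) + mirror_landmarks(half, midline_x)
```

For example, `mirror_landmarks([(90.0, 120.0), (95.0, 160.0)], 100.0)` returns `[(110.0, 120.0), (105.0, 160.0)]`: a point 10 px left of the midline lands 10 px to its right, at the same height.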
A very gorgeous woman walks next to a guy (the mark) and suddenly screeches. He cringes at first, then his face turns to a smile as the woman begins to rapidly utter things like "Oh my God! It's YOU! Do you remember me? It's been so long!" He utters a few words of "maybe" and "I'm not sure" until some big brute comes out, sees "his woman" hanging all over the guy, and roars an I'm-gonna-kill-you roar, at which point the mark cowers in fear for his life, and the brute and the girl just go away...
All of which is recorded on the sly. The actors involved don't even need to know why they are doing what they are doing, as long as they do it well. You've got your data, and you sneak back to your nefarious face-stealing lab to sample this guy's facial expressions and create an image of him doing whatever... And that's only if he's NOT famous. Most famous people these days are all OVER video, thus more sampling material for you.
Clutch Cargo should sue (Score:1)