Digital Mouths, Synthetic Faces at MIT and Lucasfilm

jfengel writes "Two separate articles about generating faces automatically. From the Boston Globe, there is a story about MIT scientists putting words into somebody's mouth by splicing together footage. In the samples, I couldn't tell the difference between the synthetic footage and the same person really saying the same thing. (Though it's a little hard to tell with video at only 81 kbps.) And Wired has a lengthy article about generating purely synthetic faces at Lucasfilm. It discusses some of the difficulties in getting it right."
This discussion has been archived. No new comments can be posted.


  • by Fat Casper ( 260409 ) on Wednesday May 15, 2002 @10:15PM (#3527400) Homepage
    Forrest Gump and Max Headroom will be hosting a morning show starting next month.

    • Forrest Gump and Max Headroom will be hosting a morning show starting next month.

      "L-l-l-i-f-f-fe is l-l-like a box of ch-ch-ch-ocolates. You n-n-never kn-kn-know what you're gonna get-get-get."
  • Isn't this similar to what was done in Final Fantasy - The Movie?
    • The difference is that they take a video of a human rather than build the image up from a wireframe. Basically, they can take a video of, say, President Bush and have him say stuff that he didn't say. In FF, it's obviously a cartoon image talking. It's easy to build a cartoon of a human but it is difficult to animate a real person that you can compare videos with.
      • Re:FF? (Score:3, Informative)

        by vitalidea ( 571366 )
        It's easy to build a cartoon of a human but it is difficult to animate a real person that you can compare videos with.

        Huh?! I work as a Sr. VFX guy, and CGI (computer-generated imagery) for facial animation is one of the most complex things to do!

        Basically, there are so many muscles in the face and so many nuances that it is very difficult to emulate a realistic face. Chris Landreth [imdb.com] is a director at Alias|wavefront with whom I had the "pleasure" of working. His entire focus has mainly been facial animation. And even with his talent, facial animation still doesn't look 100% realistic.

        Check out the book: Computer Facial Animation to get a glimpse at the mathematics, anatomy, and other technical hurdles being overcome in this arena.
        • That's my point. It is easy to make a cartoon look human-like (such as in FF). But you still know it's an animation. That's what I meant when I said that it is difficult to animate a real person (i.e., a non-cartoon character). However, what they did was different. They took the original video and modified it.

      • Conan O'Brien has been doing this for years.
  • Sounds difficult. I guess it's a bit like Photoshopping video, rather than a still image. Kudos!
  • This is some really scary shit. Just think of the possibilities. Like my girlfriend asking for a threeway... hard to resist the temptations possible with this sort of technology, isn't it? Maybe it should be banned, limited, etc. But in order to do that, people would have to know about it and care... oh, never mind. We know we're the only ones who care about these really scary technologies. That's why the internet went bad; only us geeks know the dangers in this sort of thing, and who listens to us?
    • banning probably wouldn't work. When ArtificialLipManipulation is outlawed, only outlaws will have ArtificialLipManipulation.

      One futuristic countermeasure I can imagine would be for concerned citizens (e.g., politicians, dissidents) to have some type of device that cryptographically signs some aspect of their speech along with a trusted indicator of time. This thing would have to transmit a signal that would be embedded in any recorded media. Thus, verification of the digital signature of the audio and time hash would indicate whether the original recording was fabricated or tampered with. Doesn't really get around this technology (i.e., they only make it look like you're mouthing the audio, they don't deal with the audio), but it would prevent others from splicing or generating fake audio to accompany these phony video clips...

      Of course, there's only like, 50 zillion reasons this would be difficult/impossible to implement. But hey, I'm just the idea man...

      Of course, if they could extend this work beyond the lips and face, imagine what the porn industry could do...
    • Should photocopiers be banned because they make it easier to forge written documents? No, and neither should this. People are just going to have to get used to the fact that forging video is possible.
  • by CeZa ( 562197 )
    First client... SNL
  • I've been waiting for the ability to put together new movies by stars long since dead, possibly stars who weren't even contemporaries. I'm sure it will soon be possible, and this looks like they're heading in the right direction.

    The biggest hurdle I can see isn't technological, it'll be legal. Who really owns the rights to use the films made by famous people? It might be interesting to see just which ??AA lays claim to it first.

    • I definitely have NOT been waiting for this. Have we lost all originality, that we now must use dead actors to do our acting? Can't we find enough new actors and stars that we don't need to keep cashing in on the star power of old? Wouldn't a booming industry generating new movies with old stars say something about how our society values image over content? The illusory is slowly replacing the real until we no longer understand the difference and don't know why we should even care.

      If I ever become famous I am going to try damn hard to make sure I don't end up selling baby diapers from the great beyond.
      • I don't care who's in it so long as it's good.
        • I really wish this were the opinion of the majority of the movie-watching public. It came as rather a shock to me when I moved out of a smallish college town into generic-town USA and found that most everyone here ranks who is in the movie as important as, or even more important than, what the movie is about.

          People here are just as likely to say "Hey, you going to see that new (fill in the blank with popular actor) movie?" as to actually say the movie's name or premise.

          I think it's the same attitude of familiarity over quality in the general public that's kept Microsoft on top for so long.
    • Hah... these MIT gurus think they have originality, huh? Well, I'll have you know that the guys behind South Park mastered the skill of matching voices to moving mouths [akamai.net] long ago.

      Damn Canadians and their flapping heads... and Saddam Hussein, too!
    • by texchanchan ( 471739 ) <ccrowley@@@gmail...com> on Thursday May 16, 2002 @12:01AM (#3527761)
      the year is 2095. the reviewer speaks:

      Let me begin by once again repeating the truism: no video whatsoever can match the scenes as they appear to your imagination during a simple, unaided reading of the three volumes of Tolkien's original text.

      With that out of the way, I will say that my own favorite among the video versions is the recent blockbuster edition, followed by the "Midlands" OSc 2072 dist (tuned 2,-1,4,0); and after that, the 2001-2003 movies using the Gibson/Taylor overlay. This review concentrates on videos; I will leave VRs for another day.

      There is no need, at this remove, to cite the failings of the Bakshi anime (1978) or Jackson's groundbreaking 2001-2003 live action movie.... However, when WWM re-released the "long" version on tab with a selection of overlays, including Mercer/Tran/Lopez and Gibson/Taylor, the movie was transformed from a mere classic to a paradigm of style. Its effect on a generation resembled the effect of the original books on the "Sixties Era" (roughly 1964-1972). The wildly popular M/T/L overlay, its unearthly beauty toning down the somewhat brutal original video, went straight to the heart of the virals.

      At the same time, the first underground OSc version, "OS-LOTR", was in process. Remember that this was before the Hurst case and copyright law was still in the postmillennial phase. Nevertheless, thousands of people participated. By any standard, the first version was pretty primitive. The base disappeared during Hurst. Only 18 snaps survive; ...[and they] show a wide range of competence. Some scenes, such as //this//, are nothing short of brilliant. However, I can't agree with those who believe that a large quantity of sublime art was lost. OSc was in its infancy, and the original consensualists tended to be technical personnel with vivid but unsophisticated imaginations. I have seen all 18 remaining snaps of OS-LOTR, and am convinced that nothing of value was lost to the Tolkienist or to the viewing public.

      The first legal OSc version ("OurRing") is also available at universities, but is not worth the casual viewer's time. The maintainers provided no guidance. Story elements of an unsavory nature, having nothing to do with the original books, found their way into the base. Tuning was in its infancy: OurRing provides only five settings in each of three dimensions. The project became overlarge, and never gained popularity outside a hobbyist community. It is of historical interest only, as is the short-lived "Bakshi", based on the anime, begun and closed within a year after OurRing.

      "Midlands", on the other hand, became a classic within weeks of startup. It derives most of its visual imagery and pacing from the centennial remake, but retains none of the bizarrer elements. A comparison of snaps is extremely revealing. The earliest still archived (two days in) is almost an exact copy of LOTR-100. In one week more, participation skyrocketed by 6000 percent, and the nine-day snap contains none at all of the odd politico-academic coloration. Note the gradients in this //graph// of the isologs: precipitous in the higher dimensions, almost flat in D1 through D5. Midlands is universally available and is the vehicle through which most young people first meet Tolkien. It is still maintained, although the classic version stabilized in 2072.

      Midlands is far more tunable than OurRing. The original tuner, which is part of the OSc v. 5.4 kernel, allowed for 15 dimensions. Addicts and purists apply the 500-dimension Gordon tuner. I have viewed several allegedly "perfectly" Gordon-tuned versions and could see no difference at all. These decimal-place variations invisible to anyone else fuel quite vitriolic disputes in the hobbyist community.

      "Zealand" and "Hildebrandt", Midlands' two nearest competitors, have a much smaller following. Zealand is of course based on the 2003 video. Hildebrandt is experimental; it combines OSc and overlay technologies. There is no dist--as the maintainer states in true twentieth-century fashion, it is intended to be a "work in progress", to be "as dynamic as the events it portrays". This can lead to surprises if you view over a period of days instead of capturing the whole thing at once. Its consos also tend to be outside the standard demo.

      Last year's remake is, in my opinion, the best of all. Yes, it condenses the story, but this is not a bad thing, as anyone will agree who has played one of the realtime VRs. Stern's directorial imagination could not possibly be closer to Tolkien's original vision. There is, of course, no truth to the rumor that he is a clone of Tolkien made for the purpose. ....
      • Whoa, I didn't understand a word of that. Can you explain it to me? It sounds interesting.
      • Wow, did you write that? That was an impressive bit of sci-fi prose. I found it fascinating and believable. I think the 'review' seemed very real to me largely because of the unfamiliar jargon and details interspersed in it. Reminds me of the entire slang language Anthony Burgess made up for A Clockwork Orange.
        • Thanks! Yeah, I wrote it after I saw the movie a few months ago and wished I could make a few slight changes.

          For Martyn S., here's the key--
          - Overlays: Computer-generated actors, or sets of actors, replacing the originals.
          - Tuners: Some kind of technology that allows you to set the amount of romance, scenery, violence, history, magic, humor, or other features (up to 500 with the Gordon tuner software) to your personal preference. Sort of like adjusting brightness/contrast/colors in an image file, on a conceptual level.
          - OSc is "open source creativity." It means that a lot of people modify the "base" video, under control of maintainers. These people are called consensualists or consos.
          - Snaps = snapshots of what the video looks like at one point in time, because with OSc it's changing all the time.
          - Virals = nickname for a generation, like "flappers" or "hippies" is to us.
  • Using Poser, I found natural body movements fairly hard to create. The main difficulties I can see in getting facial expressions correct are simple: they have to be 'real'. Because everyone's face is different, the most accurate way to do faces is to 'sample' a real face. Purely computer-generated faces are not hard; the hard part is the TRANSITIONS between the expressions. Those are extremely hard. Just ask the Disney artists who did Snow White: moving from storyboard to storyboard is the hardest part. Computers have done a lot to help the transition problem, but sampling a real face is still the best way to get things accurate.
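The transition problem described above is commonly attacked with morph targets (blend shapes): store the mesh once per key expression and interpolate vertex positions between them. A minimal sketch, with made-up three-vertex "meshes" standing in for real face data:

```python
# Minimal blend-shape sketch: interpolate a face mesh between two
# expressions. The vertex data here is invented for illustration.

def lerp_mesh(neutral, target, t):
    """Blend each vertex linearly; t=0 gives neutral, t=1 gives target."""
    return [
        tuple(n + t * (s - n) for n, s in zip(v0, v1))
        for v0, v1 in zip(neutral, target)
    ]

neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 1.0, 0.0)]
smile   = [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0), (0.5, 1.0, 0.1)]

# A few in-between frames for the transition.
frames = [lerp_mesh(neutral, smile, t / 4) for t in range(5)]
```

Real rigs blend dozens of such targets at once, weighted per muscle group, which is one reason the in-betweens are so hard to get right by hand.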
  • great news! (Score:3, Funny)

    by larry bagina ( 561269 ) on Wednesday May 15, 2002 @10:23PM (#3527432) Journal
    this is great. Maybe the lip-syncing in Britney Spears' videos won't be so obvious.
  • The researchers have already begun testing the technology on video of Ted Koppel, anchor of ABC's ''Nightline,'' with the aim of dubbing a show in Spanish, according to Tony F. Ezzat, the graduate student who heads the MIT team. Yet as this and similar technology makes its way out of academic laboratories, even the scientists involved see ways it could be misused: to discredit political dissidents on television, to embarrass people with fabricated video posted on the Web, or to illegally use trusted figures to endorse products.

    That about sums it up for me....

    Although imagining Ted Koppel speaking in Spanish is a riot.

    I remember being somewhere in Europe, listening to the BBC for ten minutes on a shortwave radio, desperately trying to understand what the guy was saying through all of the static. It then occurred to me that the announcer was speaking Spanish in a really thick and proper British accent. The accent was so strong it threw me off, between the static and everything.

    So I wonder if Koppel would even be understandable.

  • by Fat Casper ( 260409 ) on Wednesday May 15, 2002 @10:24PM (#3527442) Homepage
    The last thing we need is for the ethical arguments to shut down any of this public research. The uses of it are ethically scary, but I'd feel a lot better with MIT pushing forward with the research than any company doing it. The school will keep people updated on where they've gotten with it, and the world will be better able to judge how much to believe video. It'll be really interesting to see what constitutes proof in 20 years. If the research is done in the open, we might even still be able to believe in it.

    • It's easy to spot when a politician is lying - his lips are moving. boom boom!

      they may as well be spared having to turn up in person to read today's lies out.

      Constant improvements in resolution and sound fidelity will probably outstrip computer-generated visual and aural improvements for the next decade or two, such that the average person on average equipment would easily be able to tell the difference between an actor and an animation. There are tons more issues than just getting the face right - though it'll be easier in staid newscast shots where the newscaster is looking into the camera. But when was the last time you saw a full newscast with only close-up face shots?

      The problem isn't creating a realistic human animation - the problem is duplicating the realistic movements of an existing person. You couldn't just replace Dan Rather, for instance, since anyone who watches him even infrequently would notice those little things that can't quite be animated yet. Even splicing together real video of the person is noticeably different.

      -Adam
  • by Devil's BSD ( 562630 ) on Wednesday May 15, 2002 @10:25PM (#3527443) Homepage
    Read my lips: Strategery means no new taxes. P-o-t-a-t-o-e.
  • SEE! R. Kelly was right. They really did fake those flicks of me and the shorties.
  • by kawaldeep ( 204184 ) on Wednesday May 15, 2002 @10:29PM (#3527458)

    Henrik Wann Jensen is developing some of the most usable algorithms for skin and other translucent materials. He gave a talk last month at Cal as a prospective faculty member. It was fairly impressive.

    his home page [stanford.edu]

    rendering skin [stanford.edu]

    rendering smoke [stanford.edu]

  • Haven't you all seen the Malibu Stacy Simpsons episode?...

    "Hello Smithers, You're Quite Good At Turning Me On."
  • Signing (Score:2, Interesting)

    Perhaps this will lead to greater adoption of digital signing?

    Not sure whether the President's speech is real or fake? Just see if he signed the authorised transmissions with his PGP key.
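The verification step being proposed is straightforward to sketch. The example below uses the standard library's HMAC with a shared secret as a stand-in for a real PGP public-key signature, so it stays self-contained; the key and messages are, of course, invented:

```python
import hashlib
import hmac

# Sketch of broadcast authentication. A real deployment would use
# public-key signatures (e.g., PGP), so anyone could verify with the
# public key; HMAC with a shared secret is a stdlib-only stand-in.
SECRET = b"white-house-demo-key"  # hypothetical signing key

def sign(message: bytes) -> str:
    """Produce an authentication tag for a transmission."""
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str) -> bool:
    """Check a received transmission against its published tag."""
    return hmac.compare_digest(sign(message), tag)

speech = b"My fellow Americans..."
tag = sign(speech)

assert verify(speech, tag)                                  # genuine
assert not verify(speech + b" (doctored footage)", tag)     # tampered
```

The hard part isn't the crypto; it's key distribution and getting viewers to actually check signatures before believing the broadcast.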
  • by Zergwyn ( 514693 ) on Wednesday May 15, 2002 @10:36PM (#3527492)
    I work with 3D design, and can certainly attest to the difficulty of mimicking people. The huge number of muscles and tiny details of morphology that make up a human face are a tremendously important part of achieving realism. However, ultimately a surface is needed, as it is, in the end, the light reflected back to our eyes that we see. How real the surface looks is a required part of the equation, and some of the new advancements being made in rendering are quite exciting to me. For instance, many older raytracers only handle how light directly reflects off the surface of a texture. But in reality, things like human skin are not opaque; they are slightly translucent. The light passes into the skin, reflects off things like blood vessels, and exits again. Light also behaves in other interesting ways in certain situations. And some effects are simply dependent on computational power. Radiosity, for instance, can make scenes look much more realistic, but is too cycle-hungry to be used all the time in full-screen video. Being able to set these sorts of properties without having to program complex custom render modules for each movie will go a long way towards making artificial people more common.
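As a concrete illustration of why opaque shading looks wrong for skin: one cheap trick used to fake that translucent softness in real-time work is "wrap" diffuse lighting, which lets shading bleed past the point where a plain Lambertian term cuts to black. A sketch (the wrap factor is a tunable assumption, not a physical constant):

```python
# Plain Lambertian diffuse vs. "wrap" lighting, a cheap approximation
# sometimes used to fake the soft look of translucent skin. n_dot_l
# is the cosine between the surface normal and the light direction.

def lambert(n_dot_l: float) -> float:
    """Standard diffuse term: black wherever the light grazes past."""
    return max(0.0, n_dot_l)

def wrap_diffuse(n_dot_l: float, wrap: float = 0.5) -> float:
    """Shift and rescale the cosine so shading 'wraps' past the
    terminator instead of cutting to black at n_dot_l = 0."""
    return max(0.0, (n_dot_l + wrap) / (1.0 + wrap))

# At the terminator (n_dot_l = 0) Lambert is black; wrap still glows,
# which reads as light scattering through the skin.
print(lambert(0.0), wrap_diffuse(0.0))  # 0.0 vs ~0.333
```

This is only a visual hack; the physically based approach (the subsurface-scattering work mentioned elsewhere in this thread) actually models light entering and exiting the material.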
  • Does this mean... (Score:3, Interesting)

    by jesser ( 77961 ) on Wednesday May 15, 2002 @10:41PM (#3527509) Homepage Journal
    we'll soon see a video of Dan Rather singing Rocked by Rape [min.net]?
    • That'd be hilarious, but with how much CBS freaked [npr.org] (warning: realmedia) about the song I doubt ECC would risk it.
    • I found a better, more direct link about the issue. Should have thought to look for their webpage before. Anyway, they have the letter [evolution-control.com] from CBS posted on their page and their reply [evolution-control.com] .(for those curious about what I meant in my other post)
  • by MikeLambert ( 309053 ) on Wednesday May 15, 2002 @10:41PM (#3527511)
    I don't think I'm special in this respect, but I didn't find the example clips that were given too hard to tell apart.

    Look for the enunciation of certain letters such as P and M, and you should be able to tell the difference. The generated image gives a sense of moving the mouth but not enunciating the words clearly, almost as if she is gliding over the words. With the real movie, however, you can see the woman completely changing her mouth formation to form the sounds required to pronounce the words.
    • I didn't look for specific letters, but I found that overall the real woman enunciated her words more, like you said. However, if one wasn't looking for it and couldn't compare the two clips I don't think it'd be possible to pick out the fake.

      For example, with two different people saying two different things, instead of the same person saying the same thing, it would be very hard to tell which person was faking. This would probably get even harder if one had an accent.
    • I had a hard time telling. Clip 2:A was the only one I could tell was fake, but by her eyebrows and blinking, and not her mouth. If I didn't know any of the clips were fake going into it, though, I never would have guessed.
    • I didn't look at words at all. There were very, very slight video "jumps" in the fakes: in the first one, it was about half way through, in the second it was right at the start. I don't think this was an artifact of RealPlayer, since the "real" shots were smooth.

      Not too hard to spot if you're looking for it, even without comparing the real and fake shots. However, give the technology another, ooo, 6 months and they'll get it to be unnoticeable.
  • by edo-01 ( 241933 ) on Wednesday May 15, 2002 @10:44PM (#3527525)
    This reminds me of the novel The Stainless Steel Rat for President by Harry Harrison. In it Slippery Jim DiGriz is rigging an election, and at one point cuts into the local news broadcast and replaces the newsreader with a digital version that reads the results he wants. It was written 10, 20 years ago? Seems almost prescient considering what happened in Florida in 2000 :-)

    Another, more benign use of the tech could be in entertainment. There was that episode of Star Trek: Deep Space Nine where they integrated the actors with footage from the classic ep, The Trouble With Tribbles. Great fun, but they were limited to using footage that existed from the original series for interacting with Kirk, Spock et al. Imagine being able to track Shatner's '60s face onto an actor and use this tech to lipsync 21st-century Shatner's dialog. Best. Time Travel. Episode. Ever.

    And I don't even like Trek that much :-)

    • Speaking of Trek, there was a fairly recent episode of Enterprise where some odd aliens used spliced images of the captain to send a threatening message, because, I guess, they could not speak human themselves. (I don't even think they had mouths.)

      It was kind of a freaky effect.
    • Seem almost prescient considering what happened in Florida in 2000 :-)

      Sheesh, when will you Democrats stop whining? It's like losing on penalties. If you can't score one more goal in over two hours of football, then you really can't complain about losing on penalties (even if it was a duff decision). BIG :-)


    • I don't really follow Trek that much, but I *loved* that episode (having for some reason also seen the original Trouble with Tribbles).

      I think that only being able to use the original footage was most of the fun: they had to think of clever ways to integrate the dialog, the movements, etc... Just think, if they'd had free rein, it wouldn't have been nearly as good!
    • Great - so what we're saying now is that Bush's malapropisms and obvious discomfort on camera are sure signs that he's actually human? Vote for me, I sweat and stammer on camera!

      I thought the synthetic woman's delivery was very Gorelike, i.e. too wooden and perfect to be human.
  • Just think how people, governments, and corporations could take a video of someone like GWB or Osama and "change" the whole meaning of the video... Yeah, troll me, I don't care.
  • ...to learn of this breakthrough. Since the Child Pornography Prevention Act [slashdot.org] has been ruled unconstitutional and over-reaching, virtual oral sex given by computer-generated altar boys is still legal. I know that the Catholic Church has been throwing a lot of money at R&D on this so that priests can find release. Hurrah!
  • A new wave of those Elian Gonzalez doing WASSSSUPPPP videos, oh joy!
    Remember this? [disk-o.com]
  • ...the pr0n industry can't take advantage of?

    I'm sure personalized videos are just around the corner.

    -Ted

  • you just *know* the real DC is in a cryotank somewhere in Burbank...

  • From the Wired Article:

    "Of all the things I witness during my reporting, the one that most shakes my faith in the Cusan impossibility of fabricating synthetic souls ex nihilo is Hugo, an 18-second short created by the guys at ILM a few years back.

    "Hugo is an entirely synthetic creation - a phantasm of light and algorithm. A wrinkled figure with Spockian ears, heightened cheekbones, and a sunken chin, he gazes off to the side of the camera, stammering, "Me? What do you mean I'm not real? Oh, I see. This is a joke, right? You must be talking about the other one." He then gulps nervously and gives a forced smile."

    I HAVE to see that. We want Hugo ILM!
    • Well, Hugo was an internal test. Then again, they were showing it at their booth during the last SIGGRAPH, and they would probably have it again this year. SIGGRAPH is a good way to catch some of their internal tests and bloopers.
  • I guess this kind of advance in technology will make video evidence inadmissible in the courts.
    After all, with sufficient CPU power, anybody could make anybody talk about anything!

    This will also mean that the court system will then ask for eyewitnesses, since videos will not be admissible.
    I'm not sure whether this is good or bad.

      I guess this kind of advance in technology will make video evidence inadmissible in the courts. After all, with sufficient CPU power, anybody could make anybody talk about anything!
      I think what's going to happen is that video will only be admitted with a chain of custody, as is done with certain other kinds of evidence, culminating in some LEO on the witness stand, under oath, swearing that he knows of his own knowledge the conditions the video was taken under and that the video has not been subject to any form of digital manipulation.

      This probably should have been done years ago. I suspect that one can get comparable results manually NOW using frame-by-frame editing, if enough time is put in by someone of the requisite skill. I believe the technology to handle the audio side (making people say things that would greatly surprise them) was discussed here a while ago.

      Anyone who has access to the MIT setup who would like to speed this process up is invited to make a commercial of the Supremes endorsing goatse.cx as a wholesome place for children to go and get it onto the Net.
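The integrity half of such a chain of custody is easy to sketch: record a cryptographic digest of the footage when it is captured, and re-check it when the tape is presented. This only proves the file wasn't altered after logging, not who made it; the file contents below are placeholders.

```python
import hashlib

# Record a digest when footage is captured; any later re-encoding or
# frame splicing changes the digest, so a mismatch means "tampered,
# or not the original file". (Integrity only, not authorship.)

def fingerprint(video_bytes: bytes) -> str:
    return hashlib.sha256(video_bytes).hexdigest()

original = b"\x00\x01fake-mpeg-frames\x02"   # stand-in for file contents
logged = fingerprint(original)               # stored in the evidence log

assert fingerprint(original) == logged            # intact copy checks out
assert fingerprint(original + b"x") != logged     # spliced copy does not
```

The rest of the chain of custody (who held the tape, when, and under what conditions) still has to come from sworn testimony, as the parent describes.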

  • Hopefully they'll use this to make convincing (even more convincing!) video of Bill Clinton, George W. Bush and others saying funny things. Right now they have a still photo of him with a hole cut out by his mouth and someone else's mouth there. It's funny already, but this would be really interesting.
  • I can really really see this being used in war. Yes I know, hand it to us humans to take something like this and make it a weapon for war, but anyways...

    Imagine Osama broadcasting on Afghan television telling his troops to surrender to the nearest US platoon. I'm probably overestimating the stupidity of your average Afghan al-Qaida member, but chances are you might get a good number of them to actually buy it and surrender.

    Going even further, we could fake Osama's capture and have him broadcast to the country that America is a nice place and to quit being player haters. Yeah, I know this all sounds far-fetched, but I'm sure the military is already looking into this.
    • Going even further, we could fake Osama's capture, have him broadcast to the country that America is a nice place and to quit being player haters.


      Actually, when the infamous video of Osama taking credit for the 9/11 attacks was publicized, many in the Islamic world insisted that the US had faked the video to frame their man Osama, as a retroactive excuse to attack Afghanistan. (Kind of like many Americans, especially Afro-Americans, who still think OJ is innocent. If there's a history of your people being scapegoated by The Man, you'll be really reluctant to trust The Man even when he's right.) I can only see this mistrust getting worse if it becomes possible to really effectively fake a video like this.

  • I wonder when will the courts stop accepting videotaped material as evidence?
  • Not to worry (Score:3, Insightful)

    by jcsehak ( 559709 ) on Thursday May 16, 2002 @01:03AM (#3527925) Homepage
    I mean, this is pretty cool and all, but there's no reason to start worrying if someone's gonna put words in your mouth anytime soon. First they'd need:

    1. a few minutes of footage of you saying stuff that has the full range of mouth movements directly into a camera.

    2. an audio recording of you actually saying what it is that they want you to say. It's possible to cut and splice separate recordings together, but 99% of the time, differences in the sound space would make it obvious that the recording was spliced together.

    And then after that, all they'd have is a video of you saying the thing and staring like a zombie into the camera.

    It's cool in theory, but I think Hollywood has achieved much better results.

    Mmm, Gummi Venus De Milo...
    • 2. an audio recording of you actually saying what it is that they want you to say.

      Synthesis of voice from some minimal sample (a lexicon of syllables? Small enough to be compiled from public figures, I'm sure) will be a reality.

      I'm sure the RIAA are modifying their contracts as we speak to lay claim to all their artists' vocal patterns, original or not.

      You thought Best Of albums were bad? Just wait till we have Frank Sinatra singing the hits of Britney Spears.
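The "lexicon of syllables" idea above is essentially concatenative synthesis: look up a recorded snippet per unit and join them. A toy sketch with placeholder strings standing in for audio samples (smoothing the joins between snippets, which real systems must do, is the genuinely hard part):

```python
# Toy concatenative synthesis: a lexicon maps syllables to recorded
# snippets (placeholder strings here), and a new utterance is built
# by concatenating them in order.
lexicon = {
    "hel": "[hel]",
    "lo": "[lo]",
    "world": "[world]",
}

def synthesize(syllables):
    """Concatenate recorded units; fail loudly on gaps in the lexicon."""
    missing = [s for s in syllables if s not in lexicon]
    if missing:
        raise KeyError(f"no recording for: {missing}")
    return "".join(lexicon[s] for s in syllables)

print(synthesize(["hel", "lo", "world"]))  # [hel][lo][world]
```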

    • Phew... and here I was nearly shitting bricks that some unknown suit is gonna put words in my mouth. Well... it's a good thing technology doesn't progress because then this might be scary in a few years. Thanks for putting my mind at ease.
  • First, we couldn't believe every thing we heard...

    Then, we couldn't believe everything we read...

    Now, we can't believe everything we see...

    I can't help but wonder what potential uses this could have. "Tonight at nine...Bill Gates admits Linux is superior to what he now refers to as 'Windoze'"
  • I actually got it right (before looking at the answers, even). :)

    What do the synthetic pictures have in common? Well, in both cases the woman moves her lips a bit less (the second) or makes slightly fewer facial expressions (the first one).

    With this movie at low-quality postage-stamp size, I have my doubts regarding a full-size TV newsreader. But I guess the technology is still in the prototype stage, and in a few years we'll likely have synthetic newsreaders indin.. indisti... indistinguishable from the real thing. But still probably far away from the same synthetic person actually performing some action beyond talking.
  • Naturally, I figured, as I launched out on this story, these folks can't do faces because, as everybody knows, faces (as opposed to, say, bellies or thighs) are the Seat of the Soul, and souls simply aren't quantifiable, they don't resolve themselves into so many bits - no matter how many.

    When I first read this I thought he was joking. As I read further, I realized he was dead serious. Does anyone else find this highly ridiculous? I'm not suggesting that the concept of people having souls is ridiculous; I just think the idea of the presence or absence of one giving away a computer rendering is absurd.

    For anyone who feels the same way as the Wired author, I propose the following hypothetical question: If some rendering was constructed (that is, produced algorithmically with the help of an artist) that was a truly perfect copy of a view of an actual person (i.e., every photon given off by either was matched), would a viewer be able to visually distinguish the two?

    If someone answers "Yes", then this becomes a matter of belief in supernatural powers and will not benefit from further discussion.

    If someone answers "No, but any rendering that could actually be created would be distinguishable from the human", I would give the following argument.

    First of all, I don't think the rendering of the actual surfaces involved is a point of contention. If believable "bellies or thighs" can be done, then we can adequately render the surfaces of the face as well. The issue is positioning those surfaces to create a convincingly human expression. What if the artists were to take photographs of the actual person and use points of reference on the person's face as control points to position the artificial model? (Of course, they already do this.) As more control points are used, the model will become increasingly like the original. The Wired author essentially addresses this very point with his analogy of approximating a circle using many-sided polygons:

    Keep adding sides - a hundred, a thousand, a million - and true, Cusa conceded, it seems like you'd be getting closer and closer to the encompassing circle. But in fact, he pointed out, you'd be getting farther and farther away, because a million-sided polygon has precisely that: a million angles, a million sides. Whereas a circle has no angles and no sides. It seemed to me that the face stalkers have set themselves a similarly impossible challenge, because a billion-bit face, no matter how seemingly close, was destined to fall infinitely short of the simple, seamless whole that is any actual face (and any actually human way of perceiving that face).

    This concept falls apart when you consider the content of the final phrase (in parentheses (heh, self-describing)). While a face can be considered continuous, human vision is just as discrete as computer graphics. We have a finite number of rods and cones in our retinas. The number of possible responses of those rods and cones to different intensities of light may be harder to quantify, but it is certainly true that given two light sources of increasingly similar brightness there exists a point at which they will be humanly indistinguishable. A rendering does not have to be actually perfect to be perfect as far as human vision is concerned.

    Anyway, my point is that the problem of creating believable computer representations of humans is a matter of engineering. It certainly is a very difficult problem, but I don't think you can reasonably claim it is insurmountable due to a computer's lack of a soul unless your argument is based on something like telepathy.
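    To put some numbers on that (my own back-of-the-envelope sketch, not from either article): the worst-case gap between a unit circle and an inscribed regular N-gon occurs at a chord midpoint, which sits at radius cos(pi/N), so the gap is 1 - cos(pi/N) and shrinks roughly like (pi/N)^2 / 2. Whatever finite perceptual threshold you grant human vision, enough sides gets you under it:

    ```python
    import math

    def max_gap(n_sides: int) -> float:
        # Largest radial distance between the unit circle and an
        # inscribed regular polygon with n_sides sides.
        return 1.0 - math.cos(math.pi / n_sides)

    for n in (100, 1_000, 1_000_000):
        print(f"{n:>9} sides: max gap = {max_gap(n):.3e}")

    # Pick any finite threshold of distinguishability you like;
    # some sufficiently large N beats it.
    assert max_gap(1_000_000) < 1e-9
    ```

    Cusa is right that the polygon never *is* the circle, but once the error falls below the eye's resolution the distinction is metaphysical, not visual, which is exactly the point about renderings of faces.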
  • Do you wonder if there's a soul behind those synthetic faces? There sure is; it's Japanese. The Japanese are the most likely to perfect synthespians. They already got off to a rocky start with Final Fantasy.

    All the pieces are in place: their economy is terrible, they take cartoons seriously, and they envy Americans.

    A holy grail of Japanese animation is to look and sound exactly like an American live action movie. They could save their economy by replacing Hollywood actors with Tokyo animators. They could make movies their next great export, after cars and electronics. I think Americans won't lead the synthespian wave: We love our actors too much and we have little to gain. The Japanese don't love American actors (economically) and they have everything to gain.

    Final Fantasy's failure to profit has scared them, but they're already improving. They're learning how to write and act like Americans from Americans. That's what Square has done with Kingdom Hearts, translated by Disney and starring Haley Joel Osment. And the Metal Gear games, made in Japan and voice acted in USA, also sell well in USA.

    So I think the Japanese will do it. They need to.
  • Just think of the film possibilities in the future!

    When we consider Final Fantasy: The Movie, and contrast it to what should be viable within just 5 years from now, it boggles the mind.

    I, for one, would love to see a digital-quality old western film - but with both the Duke and Eastwood, not just one. Oh, and while we're at it, why not have Arnold Schwarzenegger be a henchman. And hell, throw "Han Solo" (Harrison Ford) in there as a local traveling trader, but in some western chaps. :) We could have Brad Pitt as the main bad guy (we all know he's crazy), and Sean Connery as the local sheriff... oh, and then pick any half dozen supermodels/really effing hot chicks for the town whores/barmaids.

    That'd be a really fun movie to watch. :)
  • This seems like a natural evolution of what can be done with audio.
    Have a listen to this crude but effective splice up [obsess.com] of George W. done by Chris Morris.
    It sacrifices any attempt at authenticity in favour of humour, but shows the idea well - getting someone to appear to be saying the total opposite of what they meant to say. With video added, imagine how much more effective this could be.
  • Looks like you need RealOne to play these clips...

    Why can't people encode video in something that doesn't require system-hijacking software? Are there any other versions around?


  • The Digital Animations Group (http://www.digital-animations.com/) has been doing computer-generated characters very well for a couple of years. They are responsible for Ananova, the talking head, and for their latest creation, the singing and dancing virtual pop star Tmmy (http://www.tmmy.co.uk), which BTW I submitted to Slashdot but it was refused.

  • Didn't they do this already a few years ago with Elian Gonzales, Fidel Castro and Janet Reno?
  • Personally, I cannot wait until the day when someone takes all of these artificial acting programs, combines them with a quality artificial-voice program, and then I can make full-length movies on my computer with any actor I want and my own plot lines.

    Man, that would lead to some awesome fan films. :-) As well as some... ahem... interesting ones. ;-)

    .....Marvin Mouse.....
    (Math, CS, Physics, Psychology Undergrad)
