Graphics Software

Video with Depth

Lifewolf writes: "A new technology from 3DV Systems uses pulsed infrared illumination to capture depth information for every pixel of a video stream. This allows for neat tricks like realtime keying without need for color backgrounds. JVC is already selling a product based on this, the ZCAM."

  • This opens up some great possibilities for
    digitizing 3D models. Anybody heard of this
    technology already being used for that?
    • There are already some optically based 3d scanners on the market. The first ones used a scanning laser beam to trace out a line that described an object's surface texture. More recent versions use a purely optical method (I think).

      This system could probably be used for modeling by placing a physical model on a turntable and recording its changing z-depth over time. I wonder how accurate it is at close range. This could be really useful for architects who want to develop a 3D site plan. Simply snap a few shots at the building site, construct a DXF file based on the depth information, and import it into your CAD software.

      The camera is probably intended for use with compositing applications like Shake, which can process z-depth information, as well as RGB, and alpha. Great for seamlessly integrating live action with computer generated 3D, particularly realtime 3D

      This also poses the question: what other types of useful information can a digital camera acquire, if we are not limited to the visual spectrum? Would it be possible to extract diffuse color, reflected color, transparency, or other "ray depth" information from real life subjects?
      • The camera is probably intended for use with compositing applications like Shake, which can process z-depth information, as well as RGB, and alpha. Great for seamlessly integrating live action with computer generated 3D, particularly realtime 3D

        This is kind of offtopic, but interesting nonetheless. Apple recently bought Nothing Real [], the company that makes Shake and Tremor.

        Can you say Final Cut Pro 4?

  • What's so difficult? (Score:2, Interesting)

    by evilviper ( 135110 )
    I've never really seen what makes 3D video (or 4D to get particular) so difficult to record.

    Humans have 2 eyes in the front of their heads, inches apart. All that is needed in a camera is for two synchronized tapes to run simultaneously, with the lenses just a few inches apart.

    Play back the left half on the left eye, the right half on the right eye, and our own built-in systems have no problem building those two images into a single 3D image.

    I think the difficulty is not in the recording of 3D information, but of building a display to play it back to multiple people.
    • You're mixing things up.

      This is not an attempt at 3D-video. This is video with depth information.

      Its primary application is to select parts of the image that you want to replace ('keying'), nothing else.
    • Part of it may be that what you record stereoscopically you also have to play back with a stereoscopic system (sorry for the spelling, I am not English). Many systems have been tried on various media (blue/red glasses on TV, alternating frames on a computer, etc.). But every one of those systems has pros and cons (cost, quality, ease of use, etc.). So in effect the problem looks easy but isn't (like the problem of path minimization, or even the "knot" problem). Furthermore, I am not a biologist, but as far as I discussed with one, there is another problem: the eyes do not work alone in humans. The brain superimposes a correction on what we see. Objects it recognizes are not seen as "flat" even when viewed with only one eye; it automatically adds depth. Or something like that. Feel free to correct me, as I am speaking outside my domain of expertise (quantum physics :))
      • Re:Twofold problem (Score:3, Interesting)

        by evilviper ( 135110 )
        Like I said, there is no problem recording the image in 3D. The problem seems to be playing it back to an audience easily.

        The brain superimposes a correction on what we see. Objects it recognizes are not seen as "flat" even when viewed with only one eye; it automatically adds depth.

        True, but what most people don't realize is that we see just as much depth in a TV screen, as we would in real life if we covered one eye.

        Speaking of complex problems... There are certain devices that, when placed over your eyes, will essentially trick your eyes into seeing the depth on a flat screen, so there is quite a lot of information saved on a 2D image. The strange thing is that computer generated images are still seen as flat, while the rest has depth. What is different in the two is a mystery, but it just goes to show that our minds are privy to much more information than we are consciously aware of. (Have you ever seen a movie which used special effects and it just didn't seem right, even though you couldn't point out any real problem?)
        • True, but what most people don't realize is that we see just as much depth in a TV screen, as we would in real life if we covered one eye.

          Remember, a strong cue for 3D perception does not require two eyes: Moving your head just slightly gives you stereo vision over time. Sometimes you can't get the same thing from a steadicam shot.
          • Yep. I'm monocular (due to surgery to correct crossed eyes), though I retain use of both of my eyes. (actually, I can even control which is my dominant/active eye, which allows me to perform rudimentary stereo checks, if only to amuse myself.)
            I do gain a lot of information from motion.
            At the same time, starfield simulations and the like (if done properly, refresh rate, etc) can really draw me in.
        • We really don't completely understand what you're talking about. It is true that for most people, covering one eye costs them depth perception. But it doesn't have to be that way. A friend of mine is legally blind in one eye. He shouldn't have any depth perception; most people with his particular condition don't. Many years back he switched to a new optometrist. When he went in for preliminary testing with this guy, he got his depth perception tested; they had never done this at his first optometrist's office because they assumed he didn't have any. The boy has perfect depth perception; he's one of the best tennis players in the state. No one knows why and no one can offer any explanation other than his ONE working eye can do depth perception _by itself_. So it's a bit more complicated than anyone really knows. As a neurologist all I can tell you is that it's just another one of the many mysteries the brain presents us.
          • No one knows why and no one can offer any explanation other than his ONE working eye can do depth perception _by itself_.

            I find this quite interesting, but not hard to believe. His brain is probably using either change-of-perception (his own head moving around) or focus lengths of objects - or, more likely, combining the two.

            Back on topic - one way for a digital camera to get a Z coordinate is focal length. Cameras have had auto-focus features for years - why not run all the way through the focal range, and with a decent embedded DSP you should be able to pull together at least an estimated Z buffer for the whole frame. I figure the whole focal range should take less than a second, but what do I know?
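For what it's worth, here is a minimal sketch of that focus-sweep idea in Python. It assumes you already have one grayscale frame per focus setting (as nested lists) plus the focus distance for each; `sharpness` and `depth_from_focus` are made-up names for illustration, not anything a camera vendor ships. Each pixel simply gets the focus distance of the frame in which it looked sharpest.

```python
def sharpness(img, y, x):
    """Squared 4-neighbour Laplacian at (y, x); borders are clamped."""
    h, w = len(img), len(img[0])
    up = img[max(y - 1, 0)][x]
    down = img[min(y + 1, h - 1)][x]
    left = img[y][max(x - 1, 0)]
    right = img[y][min(x + 1, w - 1)]
    return (up + down + left + right - 4 * img[y][x]) ** 2


def depth_from_focus(stack, focus_distances):
    """Estimate a Z buffer from a focal sweep.

    stack: list of frames (each a list of rows of brightness values),
           one frame per focus setting.
    focus_distances: distance at which each frame was focused.
    Returns a list of rows holding the estimated distance per pixel.
    """
    h, w = len(stack[0]), len(stack[0][0])
    depth = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            scores = [sharpness(frame, y, x) for frame in stack]
            best = scores.index(max(scores))  # frame where this pixel is sharpest
            depth[y][x] = focus_distances[best]
    return depth
```

Flat, textureless regions have no sharpness peak, so anything real would need to mark those pixels as unknown rather than trust the result.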

            • Nikon's new cameras already do this. Nikon's D series lenses coupled with a 3D flash unit and D Series ready camera body use the distance to the focused subject to determine best flash output.
        • Re:Twofold problem (Score:2, Interesting)

          by arakis ( 315989 )
          May I correct some common misconceptions about 3-dimensional optics vs. stereoscopic. 3-Dimensional light is based on a wave of photons traveling through a volume of space. Outside of holography this wavefront of light is only achievable in the real world. Stereoscopic images consist of separate left and right images that when combined give the *illusion* of depth due to various parts of your brain that gauge distance, but not depth since they are based on a 2-dimensional sampling.

          It may seem that I am splitting hairs here, but I get very frustrated when people think that having one eye covered eliminates all depth perception. That is a categorically wrong assertion since the retina in each eye occupies a three-dimensional space. People who have lost an eye encounter problems with depth perception, but do not lack the *ability* to perceive depth.

          If you pay close attention to any stereoscopic image, whether it is a "magic eye" or a viewmaster you will notice that things are collected into two-dimensional sheets that appear to have depth relative to each other. A similar situation in real life would be if everything was either a backdrop or a cardboard cutout.

          By contrast the image displayed in a hologram presents an integral depth of the surface that is perceptible by a single human eye. It looks *real* because it is exactly the same 3-dimensional wavefront that existed when light was bouncing off the object to record the hologram.

          It is all a little confusing, but a little thought and casual observation will reveal these things to you. In my case I spent three months interning in a holography studio in NYC, so I got to hear many interesting discussions on this and various other strange concepts of reality.

          So please people, parallax does not mean the same thing as depth. If anything, please take that away from this thread.
    • They are not concerned with capturing depth data for broadcasting into 3d - they are building a system to automatically differentiate the subjects from the background to allow bluescreening and video effects to be applied in real time.

      A double camera system with a computer vision system would have difficulty picking out edges of subjects, and the resulting 'bluescreening' would be bodgey, at best. This is a relatively cheap, and simple solution.
    • I think they are going to use holography ultimately; it's just slow in coming to the mainstream.
      • Could you possibly be more vague?

        There are many ways to generate holographic images. The question is in the details. Will we see the same thing from any angle? Will a series of mirrors be used or just several lasers? How big will the picture really be?

        It's just as possible that in the future we'll all just strap on something similar to the I-Glasses [] and individualize the experience.
        • I have an idea, but I'm not sure if it applies to this article or not.

          Why are there no holographic cameras? How about a personal photographic system that could take a 2D picture, along with depth information. Couldn't the lab then use that information to extract some semi-3D models as a basis for a hologram? (You know; one of those thin colour-banded holograms they put on CDs and credit cards..?) Or is the cost of making those holograms prohibitively high..?
          • Why are there no holographic cameras?

            Using normal holographic film you need monochromatic light to expose them (ie a laser) and exposure time is measured in seconds. Not very practical for a "camera".
            • I realize this, which is why I listed a method for taking the picture a more "normal" way, and then converting it to a hologram in the lab.

              Btw, what's the deal with the monochromatic light? I realize you have to use one colour or else you get a blurred resultant image, but is there any way to sort of do a component hologram and then put the parts back together? Sort of like how video has RGB..?
    • by Pemdas ( 33265 ) on Saturday February 09, 2002 @11:57AM (#2979228) Journal
      The concepts behind it aren't too difficult, a google search for epipolar geometry [] is a good place to start.

      The biggest problems are computational; it's hard to do a good job of stereo reconstruction at high frame rates in real time. It's by no means impossible, and there are commercial products out there that do it, like this one [].
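As a rough illustration of why the geometry is the easy part: for two parallel, calibrated (rectified) cameras, the depth of a matched point follows from its horizontal shift between the two images (the disparity) as Z = f * B / d. The sketch below assumes exactly that setup; the function name and the example numbers are illustrative only. The expensive part is finding the matching pixel in the other image for every pixel of every frame, which this does not attempt.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point seen by two rectified, parallel cameras.

    disparity_px:    horizontal shift of the point between the two images (pixels)
    focal_length_px: focal length expressed in pixels (assumed equal for both cameras)
    baseline_m:      distance between the two camera centres (metres)
    """
    if disparity_px <= 0:
        return float("inf")  # no measurable shift: point is effectively at infinity
    return focal_length_px * baseline_m / disparity_px


# Example with assumed numbers: lenses 6.5 cm apart, 800 px focal length.
for d in (40, 20, 19):
    print(d, "px ->", round(depth_from_disparity(d, 800, 0.065), 3), "m")
```

Note how a one-pixel matching error at 20 px already moves the estimate by more than ten centimetres, which hints at why clean object edges are hard to get this way.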

      Two cameras aren't really necessary, either, if your camera is moving in the scene. It's possible to recover both the movement of a camera and 3-d information about a scene just by moving a camera through it. Googling for structure from motion [] is a good place to start looking into those techniques, and there's a pretty cool page about one group's application here [].

      In short, this company may have an interesting product (depending on cost and more details on the error characteristics) but this isn't something that couldn't be done with existing methods.

      Also, as an aside, I find it interesting that they take a swipe at laser rangefinders as requiring a spinning mirror, when just about all IR cameras have a spinning "chopper" as an integral part of the exposure system...:)

    • Nah man, that just gives you minimal depth perspective, the data format is still 2d.

      It is the difference between having a 3d object and taking a front on picture of it and importing that picture into Adobe Illustrator (or Kilistrator, take your pick, now Kdraw or KVector isn't it? ) and using a "convert to paths" tool, which will get you a very nice 3d -looking- image but it will only store two dimensions for you, VS taking multiple shots of that object and importing them into an Application that calculated the 3d space of that object.

      Of course the advantage of what THIS camera does is that you get some 3d information without having to do a lot of REALLY nasty interpolation between multiple images. Granted modern techniques to do such have gotten better, but artificially creating 3D data from 2D pictures of 3D objects, well. . . . heh. Even worse if those objects are "4D" (aka moving).

      This new camera seems to deal with moving objects just fine. Yah.

      The MAIN thing I am thinking of is that you could possibly translate objects around in the 3D space that was created by this camera.

      Your point of view would remain fixed and none of the objects could rotate (more on this later) but you could still do some REALLY nice stuff in regards to Object Based Encoding.

      In fact the integration of 3D data into Object Based Video Encoding technologies could make for some VERY nice bit rates, or at least the removal of gobs of artifacts.

      Imagine if the Video Encoding KNEW that such and such person was going BEHIND that plant.

      Now of course one other use for this is to combine it with the pre-existing methods of using multiple cameras to capture a 3d space. With this method you could, maybe even after just creating an object outline in one viewpoint (I will have to think over this particular facet of this new technology more in order to prove or disprove that idea), rotate all the separate OBJECTS within the scene, and not just move your view around the scene. (This is of course excluding any partially obscured objects, which would likely have some strange things happen to them. :) )

      Because you have each object's X, Y, and Z coordinates, and your camera could have almost complete X, Y, and Z plane movements (remember, interpolated between multiple sources, and your image quality when zooming in would be dependent upon your original image capture quality), you have yourself what is basically a fully workable 3d workspace.

      Imagine importing your video some day not into Adobe Premiere but rather into Maya or 3D Studio Max.

      Kick Ass.
    • IIRC, the depth information given by that technique is not very robust. It might be able to place one object behind or in front of another, but for any kind of precision you'd probably need an aperture speed that would make both images irresolvably blurry.

      Our eyes gather most of their depth information by focusing on a particular object, then doing some really cool neural-network trigonometry to measure the inward angle each eye is at to approximate the distance to the object. This could be done with cameras on extremely sensitive servos, but the information is only obtained for a single object, not everything else in the scene.
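The trigonometry itself is simple if you assume the object sits straight ahead and both eyes (or cameras) turn inward by the same angle: the distance is half the eye separation divided by the tangent of that angle. A small sketch under those assumptions, with an illustrative function name and numbers:

```python
import math


def vergence_distance(baseline_m, inward_angle_deg):
    """Distance to a fixated point straight ahead of two eyes/cameras.

    baseline_m:       separation between the two eyes or cameras (metres)
    inward_angle_deg: how far each one is rotated inward from parallel (degrees)
    Assumes symmetric convergence on a point on the midline.
    """
    return (baseline_m / 2) / math.tan(math.radians(inward_angle_deg))


# With eyes 6.5 cm apart, small angle changes cover large distance changes:
for angle in (3.0, 1.0, 0.5):
    print(angle, "deg ->", round(vergence_distance(0.065, angle), 2), "m")
```

The angles get tiny very quickly as the object moves away, which is why servos doing this would have to be extremely sensitive.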
      • I think you've partially answered how a person with one eye can have depth perception. Instead of figuring out the angles and such that both eyes make to an object, just use the angle that one eye makes and add in the amount of muscular force the eye is using to focus. That muscular force is directly related to the curvature of the lens and thus the focal length.

        So the angle of the eye positions the object in 2D space and the curvature of the lens gives you the depth of the object in the center of your vision.


  • Fun to abuse... (Score:4, Insightful)

    by fleeb_fantastique ( 208912 ) <fleeb@ f l e e> on Saturday February 09, 2002 @07:20AM (#2978789) Homepage
    Can you imagine using this technology to insert your favorite politician in a porn video? George Bush Does Dallas.

    Used within a surveillance camera, it could detect motion without getting tricked by that tree near the air vent.

    It could also be used in surgical situations where a specialist located in another state can more easily study facets of the video being provided to him (cutting out noise, if you will).

    You could do some really weird video editing where you could create a scene of a person standing in a verdant field in the middle of summer with snow falling within his 'mask'.

    Items recorded in this way (presuming the mask is also recorded) could perhaps be admissible evidence that helps the court focus on a specific action that might otherwise get missed.

    It might also provide a less-expensive way to make 3-D videos. Precursor to holographic movies?
    • Forger's wonder tool (Score:5, Interesting)

      by BlueUnderwear ( 73957 ) on Saturday February 09, 2002 @09:17AM (#2978920)
      ...admissible evidence that helps the court...

      IMHO, this technology would rather do the contrary. It makes photo forgeries so damn easy: no afternoon-long sessions with the gimp to get exact contours of people to delete from or insert into pictures: just use the ZCAM's distance keying and you get instant masks. The example given was scary: a business meeting, from which they could edit out people at will. The ideal tool for anybody that wants to rewrite history. So, forget about photos staying admissible as evidence in court.

    • If I remember correctly, there were 3D prOn
      flicks made during the 70's golden era
      (think "Boogie Nights"). I was told
      these films were pretty impressive on widescreen...maybe this technology
      will also bring a revival of those
      artistic explorations !

    • Re:Fun to abuse... (Score:2, Insightful)

      by andycat ( 139208 )
      It might also provide a less-expensive way to make 3-D videos. Precursor to holographic movies?

      It's a step along the way, but it's got one major drawback: it only captures a scene from one viewpoint. As soon as you move away from that viewpoint you're going to see holes in the scene where the camera didn't capture any information. To fix this, you must either (a) keep the viewpoint fixed at the camera's center of projection or (b) capture multiple views of the environment to fill in the missing bits.

      Cameras like this have another potential benefit: better video compression. There's a section of the MPEG-4 standard that provides for segmenting your scene into objects so you could, say, encode the weatherman separately from the backdrop he's waving his hands at. If you shoot with a camera like this that can give you a rough silhouette of major objects in the scene, you could spend more of your time doing high-quality encoding of the people running around in the foreground and less of your time on the background that doesn't change for the length of the shot.

      That said, I'm awfully skeptical about their claims of precision. As another poster has mentioned, there's a reason why laser range scanners cost so much: building an accurate rangefinder with lots of dynamic range is hard. As for object segmentation... I personally don't believe the image they provide as an example. Take a look at the depth map of the people at the conference table. In particular, look at the tabletop. It's nearly parallel to the camera axis, which means that its depth should be increasing fairly rapidly, which means you should see a gradient from light (near) to dark (far) in that part of the image -- but no, it's all one color.

      I suppose you can explain that as treating everything between depths D1 and D2 as a single object, but that doesn't work all that well in practice. What's far more likely in my opinion is that that object mask is a hand-created example rather than the actual output of the device.
  • I didn't have a depth thingy to tell me how to replace the image, we had blue backgrounds which had to be equally lit, and pray nobody came with blue on.
    The real reason blue was used is that, in a video signal, blue contributes only about 11% of the luminance, and it is also a very rare color (saturation-wise) in a picture. Most people don't wear blue tarp mascara, so it was acceptable.
    The other type of keying was on an Amiga with a Gen Lock, using background color as the transparency, a static image over a live background. You could also set the transparency, so you could get ghost-like effects.
    But with one of these, you can probably make a scrolling background with the occasional tree popping to front. If you were to do the same with an editing suite, you're looking at at least a good hour, and when you rent out facilities, you look for all the helpies you can. Just printing out a still from video can cost more if you're using a "video printer".
    I wonder if you can set the depth manually, or if it's hard coded. It might be fun to see something pass "through" something else.
  • by arsaspe ( 539022 ) on Saturday February 09, 2002 @07:36AM (#2978808)
    Normally, when you want to key in a false background in a scene, you need to have a constant color in the background (Hence the use of blue and green screens). If the background isn't flat, then you either have to go at it with photoshop frame by frame, or use expensive border tracking software which is less than perfect. You could spend hours setting up a scene just right, with screens placed in all the right places, making sure that there is nothing else that is the same color as the key, and planning camera angles for an action sequence, not to mention the struggle of getting the keying to work just right.

    With this new technology, however, you could film an actor just about anywhere with very little preparation, and key him/her out based on depth AND color (some situations may need both), and easily pop new things both in front of and behind the actor. It could save movie studios a lot of time, effort, and money for doing special effects, especially after you consider how easily it would be to generate a virtual stunt double from the 3d mesh (film the actor from a few angles, and merge the resulting 3d wireframe. Voila, perfect model down to the wrinkles in the skin)
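To make "depth AND color" concrete, here is a minimal sketch of a combined key. It assumes the camera delivers an RGB frame plus a per-pixel depth frame of the same size, both as nested lists; the function names, the depth window and the color tolerance are all hypothetical choices for illustration, not how the ZCAM actually works.

```python
def key_mask(rgb, depth, near, far, key_color=None, tolerance=60):
    """Return a mask that is True where a pixel should be kept as foreground.

    A pixel is kept if its depth lies inside [near, far]. If key_color is
    given, pixels close to that color are dropped as well (chroma key).
    """
    mask = []
    for rgb_row, z_row in zip(rgb, depth):
        row = []
        for (r, g, b), z in zip(rgb_row, z_row):
            keep = near <= z <= far
            if keep and key_color is not None:
                diff = (abs(r - key_color[0]) + abs(g - key_color[1])
                        + abs(b - key_color[2]))
                keep = diff > tolerance
            row.append(keep)
        mask.append(row)
    return mask


def composite(foreground, background, mask):
    """Take foreground pixels where the mask is True, background elsewhere."""
    return [[f if m else b for f, b, m in zip(f_row, b_row, m_row)]
            for f_row, b_row, m_row in zip(foreground, background, mask)]
```

Depth alone pulls the actor off an arbitrary set; the optional color test handles things like a blue prop held at the same distance as the actor.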
    • especially after you consider how easily it would be to generate a virtual stunt double from the 3d mesh (film the actor from a few angles, and merge the resulting 3d wireframe. Voila, perfect model down to the wrinkles in the skin)

      Uh, no... I wish it were that easy - but scanned 3D meshes of that quality are still in the domain of laser scanning. There's just so much detail that even the best scanners can't pick up, major wrinkles and folds yes but pores and fine lines have to be simulated with displacement/bump and colour maps derived from the scan data (basically as it scans, the device takes a big long photo of the object to wrap around it later). Once you have the point-cloud from the scan (raw data) there is a LOT of cleaning up to do to get a parametric mesh with correct UVs (texture mapping co-ordinates) for use in production.

      For more info, check these guys [] out - we've used em recently on a couple of film and tv projects and their output is damn nice, but the price tag reflects the complexity and difficulty of the task.

      • damn. I hit 'post' then I think of something else to say... the depth info may not be good enough to generate a nice detailed mesh of an actor or a set etc, but lets say it can give at least a coarse mesh for each frame of a shot, you could use the low res 3D info for things like shadow passes for CG elements (you comp in a flying robot going past your real actor, you use the depthcam generated mesh info for those frames to have the robot's shadow slide over the actor's body correctly. If the actor's costume had reflective surfaces like goggles you can use the 3D info to have the robot reflected in them - it may sound subtle but it's the subtle things that tie a shot together)
  • 3dstudio 4 has a plugin to render z buffer depth too, to get scenes like the ones with this camera

    it's great for doing depth based effects such as artificial depth of field (3ds4 didn't have that)

    I'd love to have one of the cameras available for making live video stuff, I'm looking forward to getting my hands on one, I hope my local video facilities unit gets one (I'm going to mail them a link).

    Coming soon to an MTV near you. Sadly probably not from my studio any more. I gave that up when 3dsMax came out, Seemed like there was no room left for a two man outfit (one gfx, one coder).

  • Video recorded with this technology will give you two video streams:

    * The normal video-stream that any video-camera will give you.

    * Another video-stream containing depth information.

    So, what you have, at best, is a way to tell the relative distance from the camera to each point in the image. Which will let you address separate elements of the image based on depth. But, you _won't_ have anything more image-wise than you can record at home with your Sony.

    Sorry, no 3D-porn.
    • Sorry, no 3D-porn.

      Couldn't you take the image streams, do the red/blue-shift thing based on the depth stream (or better yet, disneyland-style polarization, if you've got the playback gear), and there you go, 3D-porn :)

      Benjamin Coates
    • by StarBar ( 549337 )
      Depth information and movement can give a chance to triangulate objects targeted.
      From there you probably can move on to the more sophisticated compression techniques
      (soon to be) introduced by MPEG-4.

      Ever seen the movie "Enemy of the state" where they triangulate 3D shapes with satellites
      and movements? Great techniques in that movie, but scary scenario.
      • OK, I find compression interesting... I think that you could use this to compress a 3D video stream, by essentially "seeing" each object as a separate stream of data in the image and compressing each separately.

        You might be able to actually generate a 360 degree view of the background and encode the distance and angle of the view in each scene, then place the separate actors into the scene.

        The really cool thing about this technique is that it would make it easy to delete or replace any one object in a scene in a video.
        • I think what you describe is very close to what they are trying to do with MPEG-4 animation extensions. Doing this with live content is very exciting. A movie will not be a series of pictures but rather a scenario with a number of predefined view paths which the viewer can choose between. The same goes for fully animated movies like "Final Fantasy" and "Toy Story". In either case the size of what is broadcast shrinks dramatically.
  • by TheFlu ( 213162 ) on Saturday February 09, 2002 @08:16AM (#2978849) Homepage
    "Once you capture live action footage in object video format, you can not only make it more visually engaging, but also sell advertising right in context of the live event."

    Great, now you won't be able to distinguish between the show you're watching and the advertisement. Now when I'm watching TechTV, I can look forward to Britney Spears bouncing thru with a Pepsi at 30 second intervals.
  • with this technology, NASA could fake the moon/mars landings again, and this time - get it right! rofl

    You could do lots of interesting tricks with this - like changing the cut-off on the z-buffer, so when someone walks away from the camera, it looks like they're walking through a wall.
    • I bet this time they won't hire OJ Simpson to fake the Mars Landing.

      (MODS- It's an obscure joke, just because you don't get it doesn't mean it's offtopic)
  • Just amazing how DV cameras just keep getting smaller and smaller. I think I'll pick up that ZCAM, and get the optional belt case, so it's with me everywhere I go :-)

    I guess this thing is targeted more for reporters and the media, than the consumer.

    I assume "keying" is what we dumb consumers typically know as "blue screening" or "green screening", but this lets you do the same without a solid background, since it can separate out the people in the foreground using a depth cutoff instead.

    Neat technology. I think there'll be more practical uses for this than you might think at first.

    I wonder how accurately the z layer aligns with the pixels. Since it's a different infrared source, bounced off the subjects, I wonder if there's some fancy alignment that has to be done, or if the same pixels on the camera pick up the depth information. It'd be the difference between perfect alignment, and having sloppy edges around objects, which is pretty significant for a lot of uses.

    • Just amazing how DV cameras just keep getting smaller and smaller. I think I'll pick up that ZCAM, and get the optional belt case, so it's with me everywhere I go :-)

      The "video-camera-in-a-match-head" phenomenon is pretty much exclusive to consumer gear. A good professional video camera should be at least two feet long. :)

      I assume "keying" is what we dumb consumers typically know as "blue screening" or "green screening"

      No, what you know as "blue screening" is technically known as "system crash". :)

      The real technical terms are "Chroma-key" (make pixels with a certain colour transparent) and "Luma-key" (make pixels with a certain brightness transparent) Most guy-in-front-of-unusual-situation stuff (like, say, a weatherman) is done with chromakey.

      • Pros *are* using tiny cameras, and mics as well.

        There was an article in EQ magazine about working the X-games in Philly. Helmet-cams and matchbook-size "button mics" abound.

        It doesn't appear to be on their site []

        Not mentioned in the article, but another cool thing they have at the X-games is a camera suspended above the half-pipe by three (four?) cables connected to computer-controlled high-speed winches. It can "fly" over the skaters' heads at amazing speed, dropping down into the pipe to look up at them on the vert ramp, and then zooming up out of the way.

    • I assume "keying" is what we dumb consumers typically know as "blue screening" or "green screening"

      That is, unless they use Windows 9x regularly ;-)

      but this lets you do the same without a solid background

      Actually, use of a still (non-solid) background would help even this technique, as post-processing can massage the background vs. foreground using traditional MPEG motion compensation for an even more accurate contrast between a background moving in one direction and a subject moving in the other.

  • Still? (Score:2, Interesting)

    by BCoates ( 512464 )
    Would it be possible to economically do this with still cameras (preferably film vs. digital)? Are there already products that do that? It would be cool to be able to record a depth 'image' with my photographs for later editing...

    Benjamin Coates
    • Super idea! It would make editing in Photoshop or GIMP super fun. They might have to change the programs to take advantage of the new feature. Maybe a new file format too?
      • 3DS Max already has a (half-open) file format for that - it's called Rich Pixel Format (RPF) - you can find a reader for it in the "Io"-directory of the MAX SDK (gz at
  • Getting real-time depth information from the amount of IR reflected from a pulsed IR light is a pretty old technique. It's used in some input devices to detect where people are in front of the computer. The use of this information for video keying may be new, though.
    • The technique is old, but doing it per-pixel is very cool. Now all that needs to be done is to write a 5 channel video format (RGBAZ) and I can start writing software that uses this for unrealistic things. Ohh, the possibilities...
  • Sounds like an interesting technology that'll make for some pretty cool effects/uses.

    Apart from the obvious use of getting virtual objects to pass correctly between/around objects in the real scene, or vice versa, you've freed up the colour channel info that was being used as a depth key for other things.

    Imagine keying an actor and his or her clothing in blue and using the depth keying to replace the blue with a projected texture or somesuch, using the depth information to do the texture calculations, or keying sports equipment in sports broadcasts.

    Or if the technology eventually scales down to an affordable level it might make an interesting input device for playing video games.
  • by undertoad ( 104182 ) on Saturday February 09, 2002 @10:25AM (#2979026) Homepage
    it's called a dumb terminal.

    Thank you.
  • The use of this camera technology for video composition is great, but if you bundle a panoramic (360 degree) camera with it, you solve the reason that accurate 3D visual reconstructions are expensive. I'm thinking: export a 3D map of every object in range, then feed that into CAD.

    Now take your CAD file, recompile and render with a Quake3 engine, apply sampled textures, and you've got a very cheap, fast, good 3D walkthrough - architects will enjoy this too, as will tourism sites.

    It's also going to mean some great first-person-shooter maps :P
    • Or you could just use the piece that makes most of those 360 degree cams work. That nice little bowl shaped mirror attachment that goes just in front of the lens. The little convex mirror will produce an excellent 360 degree panorama.
    • Yep! Great idea. Million dollar one too, which means that it is hard.

      The camera produces a 3D point cloud, from which geometry (CAD) does not fall out naturally.

  • Visual Effects work (Score:2, Interesting)

    by edo-01 ( 241933 )

    I posted a comment a while ago that explained the uses in visual effects work for depth-cameras, and some of the problems with existing methods of pulling a matte off of live action plates...

    We were actually talking about this at work the other day; mainly wondering how well it would deal with things like fine hair, smoke, transparent objects and stuff like film grain/video artifacts/lens artifacts etc...

    Would love to try one and find out...

  • The ZCAM Videocam extension has been available for more than half a year now.
    The fact that it actually works as advertised is somewhat astonishing. If there's a large enough distance between fore and background (> 1.5 meters) it keys without any hassle. No more blue or green screens, that means.
  • Hair? Glass? (Score:2, Informative)

    by Anonymous Coward
    The biggest problems in color keying are Hair and glass (as in eyeglasses).

    If this system, as it claims, is simply making a z-buffer (depth buffer) of the image, then it's going to see hair and glass as an opaque lump, not the semi-transparent reality.

    Blue and Green screening (not chroma keying) can do a very good job of pulling out variable opacity and thin items like hair. Especially with the newer LED screen illumination camera rings.

    This technology has some nifty tricks and will allow more poor quality keying to continue, but it won't replace blue and green screens.
  • by William Tanksley ( 1752 ) on Saturday February 09, 2002 @02:19PM (#2979624)
    I can't believe nobody has posted about MPEG4. This is very interesting for that -- film using this, and you can encode into MPEG4 format with /huge/ compression almost automatically. The hard part about MPEG4 is object detection; this makes that almost free.

    • You're absolutely right - this will make a huge difference for compressed video by separating out the layers of the image. Motion prediction (or rather background prediction) will become trivial. The potential for this goes well beyond the existing MPEG4 codecs - indeed I expect it to spawn a whole new generation of codecs based on RGBD colorspace. Not only that, it will allow you to easily build up a detailed 3 dimensional representation of the static objects in your video, which is a whole new technological potential.

  • Why bother. A vertical split-screen image for left and right eye is all you need. There's nothing stopping conventional television from broadcasting stereoscopic images. Get two camcorders, tape 'em together at the sides and videotape stuff in your house. Edit the video so that the left camera's image displays on the right-hand side of the screen, and vice versa. Bingo, 3D video.

    See what I mean? []

    • Why bother. A vertical split-screen image for left and right eye is all you need. There's nothing stopping conventional television from broadcasting stereoscopic images. Get two camcorders, tape 'em together at the sides and videotape stuff in your house. Edit the video so that the left camera's image displays on the right-hand side of the screen, and vice versa. Bingo, 3D video.
      But this isn't just about presenting the 3d effect of video (in fact, they don't talk about that at all, from what I can see). It's more useful for clipping your live video images in real time to different depths (only keep the people up close, ignore the background).

      In *theory* you could do this with two cameras, and some amazing processing that compares the two images, extracting the depth information for each pixel. But if such software even exists (and I think it might, for leading edge 3D scanning techniques), there's no way it could be done in real time, like the ZCAM does.
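For what the two-camera approach involves, here is a rough sketch assuming an idealised, rectified stereo pair (matching points lie on the same scanline). The function names, the window size, and the search range are hypothetical; the matcher is a plain sum-of-absolute-differences search, which is only one of many ways to compare the two images. The depth relation used is the standard pinhole one, Z = f * B / d.

```python
def best_disparity(left_row, right_row, x, window=1, max_disp=8):
    """Find the shift d that best matches the left patch at x to the right patch at x - d.

    Cost is the sum of absolute differences (SAD) over a small window.
    Shifts whose window would fall outside either row are skipped.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        cost = 0
        valid = True
        for k in range(-window, window + 1):
            xl, xr = x + k, x + k - d
            if not (0 <= xl < len(left_row) and 0 <= xr < len(right_row)):
                valid = False
                break
            cost += abs(left_row[xl] - right_row[xr])
        if valid and cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d; zero disparity means 'at infinity'."""
    if disparity_px <= 0:
        return float("inf")
    return focal_length_px * baseline_m / disparity_px
```

Even this toy version runs a search for every pixel of every frame, which gives a feel for why doing it well in real time is a lot of processing.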

  • NAB is the National Association of Broadcasters conference. The ZCAM was being demoed then.

    In the demos they had realtime keying so they could fly a 3D CGI character in front of and behind the live talent. There was only about a 40ms delay. This is impossible with normal keying (ie blue/green screen). (You can only put stuff behind the talent).

    Its biggest limitation was that the resolution of the 3D sensor was low - so you had rough edges (think jaggies).

    They also demonstrated a 3D Realplayer and 3D Windows Media players (which you watched with stereo shutter glasses). These players were called 'deep players'. Pretty cool but definitely not new.

    • Its biggest limitation was that the resolution of the 3D sensor was low - so you had rough edges (think jaggies).

      Are you sure this was the problem?? I've been wondering how well this technology would actually work (it was announced quite some time ago), and have heard that indeed it had "jaggies".

      Though I was under the impression it was due to the inherent problem with anti-aliasing z-depth based composites.... the depth is represented as grayscale, from white (nearest) to black (furthest). If you were to antialias a foreground subject (say it's white) onto a black background, you'd end up with various shades of gray pixels along the foreground object's edge.... this would translate to the edges of the foreground object being at a distance between foreground and background, which is obviously inaccurate, as you're still dealing with the foreground object.
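The effect described above can be shown with a couple of lines of arithmetic. This sketch uses distances in arbitrary units instead of grayscale values, and the function names and numbers are made up for illustration; it is not a description of how the ZCAM stores or filters depth.

```python
def antialiased_depth(fg_z, bg_z, coverage):
    """Depth that results from blending foreground and background depth
    by the fraction of the pixel the foreground covers (0.0 to 1.0)."""
    return coverage * fg_z + (1.0 - coverage) * bg_z


def hard_z_test(cg_z, pixel_z):
    """True if a CG object at cg_z is drawn in front of a pixel at pixel_z."""
    return cg_z < pixel_z


fg_z, bg_z = 1.0, 5.0                         # subject at 1 unit, wall at 5 units
edge_z = antialiased_depth(fg_z, bg_z, 0.5)   # half-covered edge pixel

# A CG object at 2 units is behind the subject everywhere except the edge,
# where the blended depth wrongly lets it show through as a fringe.
cg_z = 2.0
in_front_of_subject = hard_z_test(cg_z, fg_z)
in_front_of_edge = hard_z_test(cg_z, edge_z)
```

Here the edge pixel ends up halfway between subject and wall, so anything placed in that gap wins the depth test along the outline of the subject.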

      • From what I saw it wasn't an antialiasing problem in the traditional sense. It was an aliasing problem in that the depth sensor they use has a low spatial resolution.

        The demo I saw that was live used the depth as a key - but it wasn't used to blend the objects together, it was used to make a simple visible/not visible decision. What you are describing is using the depth key to vary alpha which would be interesting to look at.
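A simple visible/not visible decision of that kind can be sketched as a per-pixel nearest-wins test. This is only an assumption about the general idea, not the demo's actual pipeline; the function name and the convention that `None` marks pixels the CG layer does not cover are hypothetical.

```python
def depth_composite(live_rgb, live_depth, cg_rgb, cg_depth):
    """Composite a CG layer with live video using a hard per-pixel depth test.

    All four arguments are flat lists of equal length. `cg_depth` holds None
    where the CG layer has no coverage. Smaller depth means closer to camera.
    No blending is done: each output pixel is entirely live or entirely CG.
    """
    out = []
    for live_px, live_z, cg_px, cg_z in zip(live_rgb, live_depth, cg_rgb, cg_depth):
        if cg_z is not None and cg_z < live_z:
            out.append(cg_px)    # CG is nearer: it hides the live pixel
        else:
            out.append(live_px)  # live pixel is nearer, or no CG here
    return out
```

Because each pixel is all-or-nothing, a low-resolution depth map shows up directly as blocky edges wherever the CG and live objects cross. Varying alpha with depth instead would mean replacing the `if` with a blend weight.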

        One VERY cool demo they did was to fly a CG character through the arms of the live talent. Where the CG object and the real object intersected you could clearly see blocky edges.

        At the time I thought the technology (at least in a live setting) was maybe good enough for a young kids show where you want high speed production and the production values don't matter too much. Once they get the depth sensor up to broadcast resolution, however, it will be a VERY nice live tool. (Maybe they have already done this - I haven't seen it in two years and they must have improved).

  • O2&Sect2=HITOFF&p=1&u=/netahtml/search-bool.html&r =3&f=G&l=50&co1=AND&d=ft00&s1=3DV.ASNM.&OS=AN/3DV& RS=AN/3DV

    Looks rather simple, akin to simple range gating.
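For anyone unfamiliar with range gating, here is a toy model of the general idea, not a description of what the patent or the ZCAM actually does. It assumes a rectangular light pulse of length `pulse_s` sent at time zero and a shutter that is open for exactly that same interval, so the further away a surface is, the less of the returning pulse gets through. The names and the idealised shutter are assumptions of this sketch.

```python
C = 299_792_458.0  # speed of light in m/s


def gated_fraction(distance_m, pulse_s):
    """Fraction of the reflected pulse that arrives while the gate is open.

    The round trip delays the pulse by 2 * d / c; whatever arrives after the
    gate closes at `pulse_s` is lost. Returns 0.0 beyond the maximum range.
    """
    delay = 2.0 * distance_m / C
    return max(0.0, 1.0 - delay / pulse_s)


def distance_from_fraction(fraction, pulse_s):
    """Invert gated_fraction for surfaces within the maximum range."""
    return (1.0 - fraction) * C * pulse_s / 2.0
```

With a 50 ns pulse the usable range in this model is about 7.5 m. A real sensor would presumably also have to normalise against how reflective each surface is (for instance with an ungated reference exposure), which this sketch ignores.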
