Graphics Software

Pencigraphy: Image Composites from Video 157

jafuser writes: "Prof. Steve Mann (of cyborg fame) has a detailed technical description on his site that demonstrates a method of transforming video into a high resolution composite image. Pictures are seamlessly mosaiced together to form one larger picture of the scene. Portions of the video that were "zoomed in" will result in a much clearer region in the final picture. I wonder if this could be used in a linear sequence to 'restore' old video to higher resolutions? It's on sourceforge; download and play!" Mann has been experimenting with such composites using personal video cameras for years.
  • by krog ( 25663 ) on Thursday July 25, 2002 @01:54PM (#3952482) Homepage
    Video Orbits of the Projective Group:
    A New Perspective on Image Compositing.
    Steve Mann
    Abstract
    A new technique has been developed for estimating the projective (homographic) coordinate transformation between pairs of images of a static scene, taken with a camera that is free to pan, tilt, rotate about its optical axis, and zoom. The technique solves the problem for two cases:

    * images taken from the same location of an arbitrary 3-D scene, or
    * images taken from arbitrary locations of a flat scene.

    The technique, first published in 1993,

    @INPROCEEDINGS{mannist,
    AUTHOR = "S. Mann",
    TITLE = "Compositing Multiple Pictures of the Same Scene",
    Organization = {The Society of Imaging Science and Technology},
    BOOKTITLE = {Proceedings of the 46th Annual {IS\&T} Conference},
    Address = {Cambridge, Massachusetts},
    Month = {May 9-14},
    pages = "50--52",
    note = "ISBN: 0-89208-171-6",
    YEAR = {1993}
    }

    has recently been published in more detail in:

    @techreport{manntip,
    author = "S. Mann and R. W. Picard",
    title = "Video orbits of the projective group;
    A simple approach to featureless estimation of parameters",
    institution = "Massachusetts Institute of Technology",
    type = "TR",
    number = "338",
    address = "Cambridge, Ma",
    month = "See http://n1nlf-1.eecg.toronto.edu/tip.ps.gz",
    note = "Also appears {IEEE} Trans. Image Proc., Sept 1997, Vol. 6 No. 9",
    year = 1995}

    (The aspect of the 1993 paper dealing with differently exposed pictures is to appear in a later Proc. IEEE paper; please contact the author of this WWW page if you're interested in knowing more about extending dynamic range by combining differently exposed pictures, or in getting a preprint.)

    A pdf file of the above publication, as it originally appeared, with the original pagination, etc., is also available.

    The new algorithm is applied to the task of constructing high resolution still images from video. This approach generalizes inter-frame camera motion estimation methods which have previously used an affine model and/or which have relied upon finding points of correspondence between the image frames.

    The new method, which allows an image to be created by ``painting with video'' is used in conjunction with a wearable wireless webcam, so that image mosaics can be generated simply by looking around, in a sense, ``painting with looks''.
    Introduction
    Combining multiple pictures of the same static scene allows for a higher "resolution" image to be constructed. [Figure: example of an image composite from the IS&T 1993 paper; click to see a higher-resolution version.] In the above example, the spatial extent of the image is increased by panning the camera while mosaicing, and the spatial resolution is increased by zooming the camera and by combining overlapping frames from different viewpoints.

    Note that the author overran the panning to appear twice in the composite picture (this is an old trick dating back to the days of the 1904 Kodak circuit 10 camera which is still used to take the freshman portraits in Killian court, and there are several people who still overrun the camera to get in the picture twice). Note also that the author appears sharper on the right than on the left because of the zooming in (``saliency'') at that region of the image.

    Note also that, unlike previous methods based on the affine model, the inserts are not parallelogram-shaped (e.g. not affine), because a projective (homographic) coordinate transformation is used here rather than the affine coordinate transformation.

    The difference between the affine model and the projective model is evident in the following figure:

    For completeness, other coordinate transformations, such as bilinear and pseudo-perspective, are also shown. Note that the models are presented in two categories, models that exhibit the ``chirping'' effect, and those that do not.
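
    In rough terms, and using a generic 8-parameter form rather than the paper's exact notation, the affine map is

    \[ x' = a x + b y + c, \qquad y' = d x + e y + f, \]

    while the projective (homographic) map divides by a spatially varying denominator,

    \[ x' = \frac{a x + b y + c}{g x + h y + 1}, \qquad y' = \frac{d x + e y + f}{g x + h y + 1}. \]

    That denominator is what produces the chirping effect: a periodic texture viewed under perspective sweeps in spatial frequency across the image, something the six-parameter affine model cannot represent.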
    Examples

    1. Extreme wide-angle architectural shot. A wide-sweeping panorama is presented in a distortion-free form (e.g. where straight lines map to straight lines).
    2. My point of view at Wal-Mart (click for a medium-resolution greyscale image; a somewhat higher-resolution image is available here; a much higher-resolution version of this same picture, in either 192-bit color (double) or 24-bit color (uchar), is available upon request).
    3. ``Claire'' image sequence Paul Hubel aims a hand-held video camera at his wife. Although the scene is not completely static and there is no constraint to keep the camera center of projection (COP) fixed, the algorithm produces a reasonable composite image.
    4. An ``environment map'' of the Media Lab's ``computer garden''.
    5. Head-mounted camera at a restaurant
    6. Outdoor scene with people, close-up (Alan Alda interviewing me for Sci.Am "FRONTIERS").
    7. National geographic visit

    See a gallery of quantigraphic image composites.
    Obtain (download) the latest version of the VideoOrbits free source from SourceForge, or take a look at an older version (download of old version); if you don't want to grab the whole tar file, you can read the README of the old version. Send bugs, bug reports, feature suggestions, etc. to: mann@eecg.toronto.edu, fungja@eyetap.org, corey@eyetap.org
    My original Matlab files, upon which the C version of orbits is based (these in turn were based on my PV-Wave and FORTRAN code).
    For more info on orbits, see chapter 6 of the textbook.
    Steve's personal Web page
    List of publications
    • Not in real-time. (Score:5, Informative)

      by Christopher Thomas ( 11717 ) on Thursday July 25, 2002 @02:05PM (#3952550)
      The new method, which allows an image to be created by ``painting with video'' is used in conjunction with a wearable wireless webcam, so that image mosaics can be generated simply by looking around, in a sense, ``painting with looks''.

      Just in case anyone was wondering - this wasn't being done in anything close to real-time the last time I checked. There's a cluster in Prof. Mann's lab which is dedicated to compositing these images (my cube is in the next room).

      Still an interesting project. The affine transformation approach has been well-understood for some time (you do a brute force and ignorance test of promising-looking affine transformations [rotations and scalings] to find one that matches the new image to the old). As far as I can tell, he's doing the same thing with a different coordinate system that has a bit less distortion.
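
      As a rough sketch of what that brute-force search over rotations and scalings can look like (this is not Prof. Mann's code; the scipy-based grid search and normalized cross-correlation below are just one convenient way to express the idea, and the search ranges are arbitrary):

      import numpy as np
      from scipy import ndimage

      def best_rotation_and_scale(reference, new_frame,
                                  angles=np.arange(-10.0, 10.5, 1.0),
                                  scales=np.arange(0.90, 1.11, 0.02)):
          """Brute-force search over a grid of rotations and scalings,
          returning the (angle, scale) pair that best aligns new_frame to
          reference under normalized cross-correlation. Both inputs are
          equal-sized 2-D float arrays."""
          def ncc(a, b):
              a = a - a.mean()
              b = b - b.mean()
              denom = np.linalg.norm(a) * np.linalg.norm(b)
              return (a * b).sum() / denom if denom else 0.0

          best, best_score = (0.0, 1.0), -np.inf
          for angle in angles:
              rotated = ndimage.rotate(new_frame, angle, reshape=False, order=1)
              for scale in scales:
                  warped = ndimage.zoom(rotated, scale, order=1)
                  # Crop or pad back to the reference shape before comparing.
                  h, w = reference.shape
                  warped = warped[:h, :w]
                  warped = np.pad(warped, [(0, h - warped.shape[0]),
                                           (0, w - warped.shape[1])])
                  score = ncc(reference, warped)
                  if score > best_score:
                      best_score, best = score, (angle, scale)
          return best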
      • "The new method, which allows an image to be created by 'painting with video' is used in conjunction with a wearable wireless webcam, so that image mosaics can be generated simply by looking around, in a sense, 'painting with looks'"

        Reminds me of a technique that was used to photograph large airplanes in a hangar with a limited number of light sources. The lights were turned out, the shutter of the camera was opened, and the technicians would "paint with light," illuminating various parts of the airplane in sequence. This from someone who worked in the photography department of a large aircraft manufacturer.
        • Actually, the light-painting technique is quite different.

          It's just selective illumination of different parts of the image, usually for artistic effect (the same type of artistry used in Star Trek TOS, resulting in fuzzy women :)

          There was (is?) a product called the Hosemaster which was intended for this.
          • The Hosemaster spun a set of filters sequentially in front of the lens and would set off a different flash for each filter. By lighting different parts of a scene with different flashes, and using different filters with each flash, the photographer could effectively apply different filters to different objects in the scene. For instance you could have two people standing next to each other with one of them shot through a diffusion (fuzzy) filter and the other person sharp. In the late eighties there was a cliché to do portraits with a diffused backlight and the rest sharp. That's how all those pictures of CEOs standing in a murky foggy environment were done.

            There is also a fiber optic bundle light source involved, hence the "hose" in the name.

            I used to do lots of time exposures outdoors at night, using hundreds of flashes from a small Vivitar 283 strobe to illuminate things.

            I'm playing with something similar now using a digital camera.
      • affine transformations [rotations and scalings]

        Actually, a combination of rotation, scaling, translation, and shearing.

        Algebraically (and more precisely) (and more pedantically), in 2-D:

        X = A1*X1 + B1*Y1 + C1
        Y = A2*X1 + B2*Y1 + C2

        I'd try to show it as matrix arithmetic but the lameness filter won't let me. More evidence that being a math geek (or even a former one) is lame.
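
        For what it's worth, the matrix form (same coefficients as the two equations above, written here in LaTeX since the point is only notation) is just:

        \[
        \begin{pmatrix} X \\ Y \end{pmatrix} =
        \begin{pmatrix} A_1 & B_1 \\ A_2 & B_2 \end{pmatrix}
        \begin{pmatrix} X_1 \\ Y_1 \end{pmatrix} +
        \begin{pmatrix} C_1 \\ C_2 \end{pmatrix}
        \]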

      • Re:Not in real-time. (Score:2, Interesting)

        by SWPadnos ( 191329 )
        There is a company called Titan Technologies which does this in real-time.

        I saw them at the last Embedded Systems Conference in San Francisco (although I think the actual group doing this stuff was from another company or a subsidiary of Titan). They were displaying a couple of very interesting systems. Both were based on a custom chip that took video streams in, and output the rotation, scaling, and translation factors needed to match up successive frames.

        The first application was a scene painter/mosaic tool, which worked in real time. The other was a "video stabilizer/sharpener", which let you stabilize jerky video and composite successive frames together for increased sharpness. The demo used videotape from a digital video camera mounted in a car, stabilizing footage of a truck they were following. It was quite jerky before processing (again, in real time), and you could easily read the license plate in the stabilized image.

        It was *very* cool.
  • All the video footage I have is only close-ups with no camera movements to speak of.
  • Restoring old video (Score:5, Interesting)

    by Blind Linux ( 593315 ) on Thursday July 25, 2002 @01:57PM (#3952501) Journal
    Judging from the description found in that article, I believe that it is possible to enhance old video to higher qualities. However, the quality of the color sometimes cannot be enhanced no matter what. Unless one has access to the original film reel, it is unlikely that any sort of improvement could be made; video copies are utterly useless in this manner. Anything on VHS from before 1990 is of much worse quality, a case in point being the John Woo film A Better Tomorrow. The problem with these videos is that not only is the quality blurry, but the color blending is off and sometimes exceeds the lines it should, creating distorted images. I've seen this in a lot of older movies... I wonder if there's a way to correct this.
    At any rate this looks very promising indeed... it'd be cool to see some of the old classics in better quality. :)
    • Judging from the description found in that article, I believe that it is possible to enhance old video to higher qualities.

      What in the description led you to this bizarre conclusion? The technique described is for seamlessly joining the output of multiple video cameras into a combined image with higher resolution than a single camera could manage if exposed to the same complete scene. Unless you know of a cache of old films which were shot at multiple partially overlapping angles but taken from the same primary vantage point, the system wouldn't seem to apply at all.

      • Judging from the description found in that article, I believe that it is possible to enhance old video to higher qualities.
        What in the description led you to this bizarre conclusion?
        Perhaps it was this:
        A new technique has been developed for estimating the projective (homographic) coordinate transformation between pairs of images of a static scene, taken with
        a camera that is free to pan, tilt, rotate about its optical axis, and zoom. The technique solves the problem for two cases:
        • images taken from the same location of an arbitrary 3-D scene, or
        • [snip]
        [my emphasis]. It talks about combining multiple frames from one camera to create higher resolution where they overlap. Perhaps you need to read the article.
        • It talks about combining multiple frames from one camera to create higher resolution where they overlap. Perhaps you need to read the article.

          If the camera is pointed only in one direction then you don't need an isomorphic mapping and knitting algorithm to clean up the signal; you can do it by directly overmapping successive frames. Again, this technique has nothing to do with cleaning up old video. I suppose if you had a movie that contained a slow pan of an area then you could map successive frames to get a wide-angle still image of the same area, but that still wouldn't have anything to do with cleaning up video, since the final output would be a still image.

          To get improved video you would have to have continuously running frames from multiple angles, all shot from the same vantage point (or the same location if you prefer), just as I originally wrote.

      • Unless you know of a cache of old films which were shot at multiple partially overlapping angles but taken from the same primary vantage point, the system wouldn't seem to apply at all.

        Well, NTSC video has 29.97 frames per second (w/ 2 alternating "fields" per frame). So when the camera is held steady, that's about 30 sample exposures of a particular angle.

        The problem with these videos is that not only is the quality blurry, but the color blending is off and sometimes exceeds the lines it should, creating distorted images. I've seen this in a lot of older movies... I wonder if there's a way to correct this.

        Many sitcoms and old movies and such which are now presented on video were originally recorded on film. In fact, this is generally the case with most "upscale" shows today. In theory, shows like Seinfeld or I Love Lucy (the first sitcom to be shot on three cameras of film, by the way; before that, shows like the "Honeymooners" basically filmed live video off a TV set!) could be projected in a theater at a much higher resolution than you see on TV.

        It would be possible to go back and grab the original film negatives of all these shows, scan them into a much higher resolution, and recut them to match the original-- you COULD have very high resolution "Mork & Mindys" and "I Dream of Jeannies" and the like. And that's without using any kind of video enhancement, just rescanning the film at a fuller resolution (you'd also get better colors and more levels of grey that come with film).

        Now I haven't read the full article (it's slashdotted to hell), but I wonder whether in many cases you WANT to increase resolution and take video shows up a notch like that.

        I mean, in many cases (sitcoms, etc.), when the cinematographer is shooting, he or she is thinking in terms of NTSC video, with all that implies. The artists may not mean for us to see actors' zits, pancake makeup, cheesy props and sets, and other unwanted details that this could reveal. If we went back to film, I bet we'd start noticing a lot of out-of-focus shots and boom mics and stuff that were never visible on TV, but stick out like a sore thumb at higher resolutions.

        Not that I'm saying enhancing or rescanning TV shows is a bad thing, but like colorizing or recutting movies, panning & scanning, etc., you're drastically altering the experience from what the artist had intended.

        In most cases this is probably a perfectly fine thing (especially if you identify what you've changed and don't present "Citizen Kane" in full color without caveats), but I guess I'm saying that in some cases we may want to ask ourselves whether enhancing a video program beyond its original medium is a good idea.

        W
        • Well, NTSC video has 29.97 frames per second (w/ 2 alternating "fields" per frame). So when the camera is held steady, that's about 30 sample exposures of a particular angle.

          For full motion video, our brains do this kind of integration for us anyway through persistence of vision. For the techniques described on the site to be used the successive images would have to be partially overlapping, not fully overlapping. It is pointless to do some sort of isomorphic mapping when the frames are fully overlapping already.

          The reason I mention it is that I've written software to knit together separate photographs into a single panoramic picture. My approach was quite different, based on applying a lens distortion to each digital image to map all of the images into a consistent spherical space, then mapping them back to an isomorphic image after they have been joined. The approach described here appears at first reading to involve rotating the images into a common plane and knitting them together in that plane - quite an interesting approach, and one I wish I had thought of at the time I was working on my own problem.

          But taking that and saying 'yeah, now we can restore old movies' is just a bizarre misunderstanding of what the technique involves.
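
          For the curious, here is a minimal sketch of the kind of plane-to-sphere remapping described above. This illustrates the spherical approach, not the VideoOrbits method, and the pinhole model with a focal length f in pixels is an assumption for illustration:

          import numpy as np

          def plane_to_sphere(x, y, f):
              """Map image-plane coordinates (x, y), measured in pixels from
              the optical axis, to spherical pan/tilt angles for a pinhole
              camera with focal length f (also in pixels)."""
              theta = np.arctan2(x, f)               # pan angle
              phi = np.arctan2(y, np.hypot(x, f))    # tilt angle
              return theta, phi

          def sphere_to_plane(theta, phi, f):
              """Inverse map, used when resampling the joined mosaic back
              onto a flat output image."""
              x = f * np.tan(theta)
              y = np.tan(phi) * np.hypot(x, f)
              return x, y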

      • You're not serious right?

        Compare any frame of the video to both the previous frame and the subsequent frame. They are likely to be very similar. The parts that are the same can be oversampled to produce a better quality image.

        Of course, this wouldn't work very well for action scenes.
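
        A minimal sketch of that idea, under the simplifying assumption that the frames are already aligned (so a per-pixel agreement test stands in for real registration; the threshold and the choice of numpy are arbitrary):

        import numpy as np

        def average_matching_frames(frames, threshold=10):
            """Average a stack of already-aligned frames, but only where the
            frames agree to within `threshold` grey levels, so that moving
            objects are left untouched. `frames` is a list of equal-sized
            uint8 numpy arrays."""
            stack = np.stack([f.astype(np.float32) for f in frames])
            mean = stack.mean(axis=0)
            spread = stack.max(axis=0) - stack.min(axis=0)   # per-pixel motion test
            static = spread < threshold
            middle = stack[len(frames) // 2]                 # keep this where things moved
            return np.where(static, mean, middle).astype(np.uint8)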

      • Actually, this would be possible.

        Not only can you stitch multiple shots into one panoramic, but you can get higher (effective) resolution in areas with more overlap.

        If you then take this high resolution image and composite it back into the video, you'd have a higher-quality video, provided you did the compositing properly.

        It'd actually be easier to do this way because the motions of one camera (and thus the perspective corrections needed) are easily modelled, compared to the angles of multiple cameras when you have subjects at multiple distances. (Easy for a landscape, not as easy for a shot of people walking in front of this landscape, in the foreground.)

        Moreover, this could be used to remove the grain from film without blurring. And grain in movies is one of the most annoying distractions.
    • Unless one has access to the original film reel, it is unlikely that any sort of improvements could be made....

      Film masters are treated with great respect for this very reason. Whole companies exist, like Technicolor and FotoKem, whose business revolves around storing original camera negatives. There's a FotoKem vault somewhere-- I don't know which one, exactly-- that holds the original camera negative for Gone With The Wind.

      Of course, negatives can deteriorate over time even under the best of circumstances, but that's a different issue. Film restoration is a fairly well understood process these days, and more and more films are being digitally restored and archived as digital masters rather than film masters.

      Even if the camera negative was lost, there would still be various prints-- interpositives, internegatives, and so on-- that could be used to reconstruct the film, albeit with some loss of quality. If, somehow, every film copy of a movie were to vanish, you could go back to the 1", 3/4", or D1 video master. The chance that you'd have to go all the way back to VHS videotape to restore a film is so close to zero as to be hardly worthy of consideration.
    • FWIW, I know that the color in old photographs is sometimes 'restored' by applying algorithms that 'know' how color is most likely to fade. Perhaps this technique could also be applied to video.
    • ...Imagine how this technology could be used to compress simpsons / futurama episodes?

      Hear me out... MPEG (and DivX) are good, but they're like JPEG. If we had something that was good at determining the parts of the screen that don't really change much, those could be used as a "background" sort of thing, and instead of MPEG or DivX, we could encode it into something more along the lines of the old Autodesk Animator .flc files. This could bring a high-quality Simpsons episode down to only a couple of megs plus sound.
    • Judging from the description found in that article, I believe that it is possible to enhance old video to higher qualities.

      No. Wrong. Can't do. Unless you have a video of a static scene - but that is not likely, is it? Well, you could probably make it a little better, by using a lot of computing power.

      For this technique it is not the "moving pictures" property of video that is key, it's the "many low-res images" (of a larger static scene) property. These are, simply put, patched together and overlaid to build a single larger high-res image. IOW, instead of a video you could also use a number of (digital) photos of the scene.

  • by Hertog ( 136401 ) on Thursday July 25, 2002 @01:58PM (#3952516)
    Can we be sure his head didn't explode?
  • ... I had downloaded all the pr0n I'd ever need, a new format comes along!

  • What other apps could this be used for? Sure, it's fun now, but what could it do for humanity?

    Surgery cameras? There are already some out there, but they have very distorted views from the lens and displays.

    Security cameras? They could make a picture easier to interpret.

    Movies? They'd look a lot different, like a Fear and Loathing look. But it'd be cool!

    Improved pr0n? w00h00!

    Other ideas? Reply here!

    • Wasn't there a recent story here about developing a multiple-mirrored telescope to allow high-resolution images of deep space? Some of the discussion even mentioned the notion of placing individual mirror elements in different places around the world to help improve resolution. Such a scope is harder to use than a single curved mirror (despite the cost savings) due to image distortion. I would think this kind of technology would be perfect for something like that...
      • Re:Applications? (Score:2, Interesting)

        by dpp ( 585742 )
        Wasn't there a recent story here about developing a multiple-mirrored telescope to allow high-resolution images of deep space? Some of the discussion even mentioned the notion of placing individual mirror elements in different places around the world to help improve resolution. Such a scope is harder to use than a single curved mirror (despite the cost savings) due to image distortion. I would think this kind of technology would be perfect for something like that...

        Not really, unfortunately. You're thinking of interferometry or aperture synthesis [cam.ac.uk], which can also be done with light [cam.ac.uk].

        This requires knowledge of the phase of the light rather than just its amplitude or power, which is all you get from normal video cameras. Also, interferometry increases your resolution but not your field of view, i.e. it's closest to the part of the article about zooming in, not panning around. To use the technique in the article you'd have to build bigger telescopes to get the improved resolution, which is what astronomers try to do anyway.

        If you're talking about combining lots of images from the same vantage point in order to improve your field of view, astronomers do this mosaicing all the time. For some of my work on the Galactic Centre [hawaii.edu] I was using an instrument with a small field of view (a thirtieth of a degree), and I had to pan the telescope as well as stitch multiple observations together to get the full map which was still only a few degrees across (the size of a few full moons).

    • Actually, the notion of being able to take video and use it to deduce higher resolution dovetails with an idea that I've had for many years now... specifically, stuff that's *purposely* made low-res. I'm referring to those video clips you see on TV where they obscure someone's face or a license plate with large square blocks that are, presumably, an average of all of the pixels "covered" by the square. The interesting thing is that, when the video pans or zooms even a little, the blocks all change their colors slightly... which divulges the subtle changes in the average pixel values. With a few seconds of video, and with a sufficient amount of jitter on the part of the cameraman, it should be possible to get a much higher-res picture and, essentially, defeat all of this nonsense about trying to "protect identities, blah, blah, blah..." :)

      Another, less sinister application: PhotoHunt! At our local bars, we've got these little touch-screen video games mounted on the bar. They all play a variety of games (card games, memory games, etc.). One of these games is called "PhotoHunt", where the game shows you two images side-by-side. The images are identical except for about 5 differences (i.e., the tree in the background might be missing some branches or something). Of course, the pictures always have very stunningly beautiful models in the foreground, to distract you from the task at hand.

      Anyway, I've long wanted to write a program that would let me walk into the bar with my laptop and webcam (needless to say, I'd be going home alone on *that* night), point it at the PhotoHunt game, and have the laptop instantly tell me where the differences are. Seems like this guy's comparative image processing would be right up that alley.
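
      A minimal sketch of the PhotoHunt half of that idea, assuming the two panels have already been cropped out of the webcam frame, are the same size, and are roughly aligned (Pillow and numpy are just one convenient way to do it, and the threshold is arbitrary):

      import numpy as np
      from PIL import Image

      def spot_differences(path_a, path_b, threshold=30):
          """Return a boolean mask that is True wherever the per-pixel colour
          difference between two aligned, same-size images exceeds `threshold`."""
          a = np.asarray(Image.open(path_a).convert("RGB"), dtype=np.int16)
          b = np.asarray(Image.open(path_b).convert("RGB"), dtype=np.int16)
          diff = np.abs(a - b).sum(axis=2)   # total colour difference per pixel
          return diff > threshold

      # Save the mask as a black-and-white image to see the changed regions:
      # mask = spot_differences("left_panel.png", "right_panel.png")
      # Image.fromarray((mask * 255).astype(np.uint8)).save("differences.png")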
      • Use your visual cortex!

        Simply combine the two images. Either cross or boss your eyes to superimpose the images.

        The differences will stick out like sore thumbs.

        You don't need extra silicon when you've got so much carbon dedicated to the task!
  • This can of course be done (compositing multiple images to create a large image), but the problem is that each lens appears to see a slightly different image (much like human vision with two eyes), and as such the stereo effect is present. You can create an image, but it will not work perfectly in all cases. If the scene is far enough away the drawbacks will be minimal, but as objects get closer this will have effects.

    What would be more interesting would be to use the dual cameras to generate two video feeds that could be piped into an HMD (head-mounted display) with two displays (one for each eye); then the stereo effect would produce a 3D view for the camera source, increasing realism. The larger image would let you see more; just don't expect 4 640x480 images to create a seamless 1280x960. You will need some overlap, and the 4 images will not be from the same perspective, so it will always look like 4 images pasted together.
    • I think the point of this technique is to correct for this unwanted stereo effect. Of course, I won't know 'till I read the article, and who does that?
    • To some degree you will have those problems, similar to when you try to paste 360° panorama images together. You should be able to do some sort of software compensation to help correct this, particularly where you have a complete image and the image being pasted is a higher-res portion of the original image.

      I seem to remember a program on television a few years back about HDTV. The program claimed that after the release of HDTV, later models would be able to interpolate the background from TV shows on the fly to produce the 16:9 ratio from older shows filmed/taped in 4:3. They showed how I Love Lucy could look in 16:9 - which was very unimpressive. (It may have been my NTSC TV, though.)

  • Thanks Steve! (Score:2, Insightful)

    by Anonymous Coward
    First he whines about there being spy cameras everywhere (IEEE ISWC 2001 Zurich) and then he does work to make them more effective. What's the deal?
  • This new development of high-res composite images, along with the series of volcano eruptions that have been occurring in Japan, is clearly another sign that Linux will triumph over Microsoft, and who knows, maybe one day, even over Apple! That's not all; just as Spider-Man is a pinnacle of the American patriotic awakening against the forces of the Axis of Evil risen out of the ashes of post-9/11, this development is a milestone that sets the end of LucasArts' Star Wars empire, making way for Lord of the Rings as a ray of light against Lucas' seemingly everlasting hold on nerd culture. Please, do mod me as troll/flame bait.
  • Look out Hollywood (Score:3, Interesting)

    by JojoLinkyBob ( 110971 ) <joeycato@gmail . c om> on Thursday July 25, 2002 @02:11PM (#3952583) Homepage
    A good testing ground for this concept could be the boot-leg movie craze.

    All of the different recordings of a given movie are comparably low-quality, but wouldn't it be great if you could combine the best aspects of each (a "greater of goods") to generate one sharp, quality movie? Testing it should be a little easier since you could use the rectangular screen border to calibrate the images. Food for thought.

  • Fourth Dimension. (Score:4, Interesting)

    by Fross ( 83754 ) on Thursday July 25, 2002 @02:11PM (#3952585)
    This is a very interesting and inspired use of technologies, and it is giving some great results. However, one thing that is not being taken into account here is that video is shot over time - subsequent frames represent changes in the scene as time progresses. Thus for anything other than a static scene (which is not of too much use) this can cause problems.

    Take for instance the example on the main page of this (if it's not slashdotted already), the two swimmers standing ready to dive in. In a real-world situation, by the time the first picture of the swimmer on the left was taken, the one on the right may have already dived in; when it came time to take that one's picture, he would already be swimming away. Hence if these images were composited, it would look like one dived in while the other was still on the blocks.

    Possibly of artistic interest, but otherwise a bit of an annoyance in what is definitely a very cool use of technology. It's interesting that after 100 years or so, we could be back at the point where someone says "hold still for a few seconds, I'm going to take a picture".

    Fross
    • Oddly enough, the two swimmers in the photograph you mention are the same person.

      This software is most useful for compositing pictures taken with a camera that is not specially designed to take panoramic pictures in one snap of the shutter. The software overcomes the limit of the photographic hardware.

    • This is a gateway to ping-pong-ball-less motion capture. In the future, with sufficient processing power and algorithms, it ought to be possible to combine two lenses spaced apart for stereo with x-, y-, and z-axis positioning sensors. Such a device could record stereo data combined with positional data, plus the understanding that objects "grow" as they come closer, to make 3D models of anything it sees. The more time it can watch an object and rotate/zoom around it, the more detailed the model can be. It doesn't even have to make the model in real time; it could just record as much data as it can and then upload it to more powerful computers later. When does Minority Report take place? 2050 or so? Well, by then I fully expect that instead of the flat holograms Tom Cruise watched, we'll have full 3D.
      • The techniques you talk about in such breathless terms have been in commercial use for several years. Discreet's compositing software has a 3D tracker module that can infer three-dimensional relationships from moving video; it works pretty well under most circumstances. And there's an outfit called RealVis, I think, that can turn a scene or a series of stills into a fully textured 3D model with only minimal human interaction. They used the same basic technique on The Matrix, way back in '98, to build virtual sets for some specific special effects shots.

        The only real limitations are contrast-- a computer couldn't isolate a polar bear in a snowstorm no matter how well lit and shot-- and field of view. If you don't shoot the back of the car, you can't see the back of the car. (I know that's kind of a ``duh,'' but you'd be surprised how many people don't get that at first.)
        • Do you think this will inevitably go mainstream or at least become a luxury purchase like home theater projection systems are now? Thanks for the info.
          • Maybe I wasn't clear. These are absolutely mainstream products now, and have been for several years. You can buy them on the open market any time you like.
            • Are there open source versions to look at and play with?
              • Are there open source versions to look at and play with?

                You're kidding... right? Of course there aren't any open source versions. This isn't some graduate student's research project we're talking about here. This is extremely expensive commercial software that very big companies use to make very big movies. Inferno goes for about $650,000, new. The computer it runs on is six feet tall and draws more current than the average household dryer. This is serious stuff.
                • Read the parent and grandparent posts to mine. I'm just trying to demonstrate that the software is hardly "mainstream".
                  • I think your definition of "mainstream" is flawed. The software is commercially available. You can buy it right now. I have a system set up in the office down the hall from mine, as a matter of fact. There are hundreds of Inferno systems around the world being used for motion picture and television production. Hell, the next season of Enterprise is being produced in HD, and (starting sometime in the next couple of weeks or so) effects will be created with Infernos at a post production house on Sunset Boulevard.

                    These products are absolutely, 100% mainstream. The fact that they're expensive doesn't make them not mainstream. The fact that there's no open-source-gimme-gimme version certainly doesn't make them not mainstream.
    • There are some products that do this already. They're being marketed first as motion capture devices. Motion capture usually requires complex setups [gamasutra.com], but there are some products out there that let you just spin a video camera around someone's head and generate a 3D model with textures generated from the person's skin, eyes, etc. I can't find a link to the exact companies that are doing this, but I had a roommate in CG who came back from a conference really excited, with a disk that had a 3D model of his head.
    • Carnegie Mellon University does something like this for three dimensions. Remember the Super Bowl?

      You just produce 3d models and then produce 3d vector data.
  • by mike3411 ( 558976 ) on Thursday July 25, 2002 @02:12PM (#3952589) Homepage
    The site's very /.'ed, but I believe what's done is similar to a technology used by security firms and the military. Essentially, when you take a picture of a given object/scene, the "true" resolution (comprised of each individual photon bouncing off the objects and striking the lens) is always downsampled, to varying degrees, depending on the resolution of the camera. However, if a camera is moving, while each individual frame will be of equal resolution, the particular data that each is storing will contain different information about the object/scene. If, for example, the camera is pointed at a grayscale gradient that's so small it only occupies one pixel, that pixel might appear white, black, or somewhere in between depending on the exact orientation of the camera, and in a regular video it would probably look like some indistinct blur between these colors. With analysis, the changes can be examined and used to create an image that accurately portrays the gradient.

    Traditionally, this has only been done with motionless cameras; it sounds like what this professor has done is to extend these capabilities to moving and zooming video, which is extremely cool (and I really want to check out his site, so everyone else stop going there :).
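
    A toy, one-dimensional sketch of that sub-pixel idea (this is not the professor's algorithm; real super-resolution also has to estimate the shifts between frames, which is the hard part, and the version below assumes the integer sub-pixel offsets are already known):

    import numpy as np

    def shift_and_add(low_res_frames, shifts, factor):
        """Toy 1-D 'shift and add' super-resolution. Each low-res frame is a
        view of the same signal sampled with a different known sub-pixel
        offset (in high-res sample units, 0..factor-1); every sample is
        dropped into the high-res bin it came from and shared bins are
        averaged."""
        n_hi = len(low_res_frames[0]) * factor
        acc = np.zeros(n_hi)
        hits = np.zeros(n_hi)
        for frame, shift in zip(low_res_frames, shifts):
            idx = np.arange(len(frame)) * factor + shift
            acc[idx] += frame
            hits[idx] += 1
        hits[hits == 0] = 1          # leave unobserved bins at zero
        return acc / hits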
  • Wow. You've proven you read memepool. Congratulations.

  • Any bets on how long the government has had this technology?

    I think it's a fantastic proof-of-concept, and I'm also glad it is open source simply because it is so very useful. Ever watch COPS on Fox, or America's Most Wanted? Say goodbye to those grainy security camera images. I don't see why this couldn't be applied _overnight_ at every precinct in the country.

    • First: this sort of thing has been available via commercial software for quite some time. This [d-store.com] is just one example.

      Second: photo stitching doesn't work very well when the object is moving. To do this successfully, you have to break up a 2D image into its 3D components, track the faces of each object, and stitch each face together. Obviously, this is much harder (though I think there is some commercial software available that does this too, so the theoretical underpinnings must be complete).

  • A little FAQ (Score:2, Informative)

    by fractalk ( 564689 )
    Grabbed this a little before the server collapsed:

    The four main programs you need to use to assemble such image sets are estpchirp2m, pintegrate, pchirp2nocrop, and cement (Computer Enhanced Multiple Exposure Numerical Technique).

    The programs use the "isatty" feature of the C programming language to provide documentation, which is accessed by running them with no command line arguments (e.g. from a TTY) to get a help screen. The sections for each program give usage hints where appropriate. Future versions will support the "pipe" construct (e.g. some programs may be used without command line arguments but will still do the right thing in this case rather than just printing a help message).

    The first program you need to run is estpchirp2m, which estimates coordinate transformation parameters between pairs of images. These "chirp" parameters are sets of eight real-valued quantities which indicate a projective (i.e., affine plus chirp) coordinate transformation on an image.

    The images are generally numbered sequentially, for example, v000.ppm, v001.ppm, ... v116.ppm (e.g. for an image sequence with 117 pictures in it).

    After you run estpchirp2m on all successive pairs of input images in the sequence, the result will be a set of sets of eight numbers, in ASCII text, one set of numbers per row of text (the numbers separated by white space). The number of lines in the output ASCII text file will be one less than the total number of frames in the input image sequence. For example, if you have a 117-frame sequence (e.g. image files numbered v000.ppm to v116.ppm), there will be 116 lines of ASCII text in the output file from estpchirp2m.

    The first row of the text file (e.g. the first set of numbers) indicates the coordinate transformation between frame 0 and frame 1; the second row, the coordinate transformation between frame 1 and frame 2, and so on. A typical filename for these parameters is "parameters_pairwise.txt".

    These pairwise *relative* parameter estimates are then to be converted into "integrated" *absolute* coordinate transformation parameters (e.g. coordinate transformations with respect to some selected 'reference frame'). This conversion is done by a program called pintegrate.

    This program takes as input the name of the ASCII text file produced by estpchirp2m (e.g. "parameters_pairwise.txt") and a 'reference frame' (specified by the user), and calculates the coordinate transformation parameters from each frame in the image sequence to this specified 'reference frame'.

    The output of pintegrate is another ASCII text file which lists the set of chirp parameters (again, 8 numbers per chirp parameter, each set of 8 numbers in ASCII on a new row of text), this time one parameter set per frame, designed to be used in order. That is, the first row of the output text file (first set of 8 real numbers) provides the coordinate transformation from frame 0 to the reference frame, the second from frame 1 to the reference frame, and so on.

    The program called pchirp2nocrop takes the ppm or pgm image for each input frame, together with the chirp parameter for this frame (from pintegrate), and 'dechirps' it (applies the coordinate transformation to bring it into the same coordinates as the reference frame). Generally the parameters passed to pchirp2nocrop are those which come from pintegrate (e.g. *absolute* parameters, not relative parameters). The output of pchirp2nocrop is another ppm or pgm file.

    The program called cement (an acronym for Computer Enhanced Multiple Exposure Numerical Technique) takes the dechirped images (which have been processed by pchirp2nocrop) and assembles them onto one large image 'canvas'.
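
    As a hedged illustration of what the pintegrate step conceptually amounts to (composing the pairwise projective transformations into frame-to-reference ones), here is a small sketch. The mapping of the eight numbers onto a 3x3 matrix, and the direction convention of the pairwise parameters, are assumptions made for illustration; the programs' own help screens are the authority on the real file layout.

    import numpy as np

    def params_to_matrix(p):
        """Eight projective parameters -> 3x3 homography (assumed layout:
        the last matrix entry is fixed at 1)."""
        a, b, c, d, e, f, g, h = p
        return np.array([[a, b, c], [d, e, f], [g, h, 1.0]])

    def integrate_pairwise(pairwise_params, reference=0):
        """Turn relative (frame k -> frame k+1) parameters into absolute
        (frame k -> reference frame) homographies by composing matrices."""
        n_frames = len(pairwise_params) + 1
        pair = [params_to_matrix(p) for p in pairwise_params]
        absolute = [None] * n_frames
        absolute[reference] = np.eye(3)
        for k in range(reference, n_frames - 1):      # walk forward from the reference
            absolute[k + 1] = absolute[k] @ np.linalg.inv(pair[k])
        for k in range(reference - 1, -1, -1):        # ...and backward from it
            absolute[k] = absolute[k + 1] @ pair[k]
        return absolute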

  • Next up: A 360 panoramic view of the server room exploding :)

  • From their paper on the topic:

    It is argued that, hidden within the flow of signals from typical cameras, through image processing, to display media, is a homomorphic filter. While homomorphic filtering is often desirable, there are some occasions where it is not. Thus cancellation of this implicit homomorphic filter is proposed, through the introduction of an anti-homomorphic filter. This concept gives rise to the principle of quantigraphic image processing, wherein it is argued that most cameras can be modelled as an array of idealized light meters, each linearly responsive to a semi-monotonic function of the quantity of light received, integrated over a fixed spectral response profile. This quantity is neither radiometric nor photometric but, rather, depends only on the spectral response of the sensor elements in the camera.

  • Um, Old news? (Score:2, Interesting)

    by Salamanders ( 323277 )
    He has had this software out for a while; I've tried to play with it. NOT easy stuff to pick up and figure out the guts of - the source code wasn't meant for your average curious person with coding skills. (Non-OO C code, not that many comments.)

    To tell the truth, I'm amazed this hasn't been snapped up by some of the digital camera manufacturers. I know Canon already has a panoramic "helper" that shows part of your last image so you can position the next one.... imagine if it had a built in "Hold down the button and wave your camera around a bit to take a wild angle pic"
  • by mikeee ( 137160 ) on Thursday July 25, 2002 @02:26PM (#3952674)
    It would be really neat if it could interlace multiple video streams into a higher-resolution single stream.

    Use of such a technique to defeat no-copy flags left as an exercise.

    I saw an article a few weeks ago about some DoD fooling about with tech that merged multiple cameras (at fixed locations) into a 3-D model that could be viewed from different angles in realtime. Anybody have a link to that one?
    • I thought a useful application of this would be an improved rear-view mirror for a car. Right now, people look at 3 different mirrors (not even placed near each other) and have to form a 3D image of what's behind them from those sources.

      Here's my idea:
      A single wide-screen video monitor that shows a composite video image from a number of cameras around the car. With cameras now small and cheap, you could place two protruding from the front of the car, looking back at the driver and the adjacent lanes (some school buses do that with big mirrors, but that wouldn't sell on a passenger car; plus, the image is too small). Another camera could be behind the rear windshield, giving an unobstructed view out the back.

      The composite view would show a view of the front of the car, but the passenger compartment would either be transparent, represented with a wire frame, or be semi-transparent (so the driver could see himself/herself and get a sense of which side of the picture is which). The height of the view could be varied, too -- it could be a synthesized view from about 10 feet above the hood. These are views that would be either impossible to achieve normally (i.e. transparency) or impossible to achieve without a big protruding boom, and would provide a complete merged picture from a number of cameras.

      p.s. car manufacturers- I'll sell you a license for my technology, if you want.
    • I know for a fact that Carnegie Mellon is one university working on this. The head of their robotics/haptics section came by the University of Utah about a year ago and was showing video of this.
  • by t0qer ( 230538 ) on Thursday July 25, 2002 @02:32PM (#3952712) Homepage Journal
    The Snappy video snapshot from Play Inc. did this years ago, IIRC. Even though NTSC res is 720x480, the Snappy was able to squeeze high-res pictures out by sampling 2 frames, then performing mathematical magic to achieve resolutions over 1280x1024.
  • I saw a special on the Discovery Channel where a bank robbery in Britain was foiled because police were able to clean up the grainy, blurry surveillance camera footage using a similar technique.

    I wonder how this could be used for steganography...
  • More examples (Score:4, Informative)

    by interiot ( 50685 ) on Thursday July 25, 2002 @02:33PM (#3952722) Homepage
    Some more pictures from Video Orbits:

    Ready the Slashdotting!

  • Right here [google.com]
  • What if you could make a video recording device that acted like a snapshot camera, where the lens captures images in a fast circumferential sweep? I saw that JPL in Pasadena had those ultra-fast video recording devices about 15-20 years ago, where they could film a balloon popping; it recorded at something like 200 frames per second. Surely we have the technology available at a more reasonable price now. What if you combined a fast frame-recording technology, either recording in a horizontal scan or a circular sweep like the hands on a clock, at 200 fps (or 360 fps)? I wonder what kind of resolution you could get from something like that.
    • Some cameras used for research work (especially in the field of explosives) can go up to (possibly past) 10,000 FPS.

      This is film, mind you, not digital, but the image correlation we're discussing isn't realtime anyway - might as well add the step of doing a bulk scan on the film to the equation for the extra FPS.

  • by Anonymous Coward
    of marketing from that annoying airbag.

    First, he puts a small radio under his skin and he calls himself a 'cyborg', now he takes what NASA did decades ago with probe pictures and calls it his own?

    This guy needs to be removed from universities. He is contributing NOTHING.
  • by Astin ( 177479 ) on Thursday July 25, 2002 @02:40PM (#3952770)
    My undergraduate design project was with Steve Mann on this technology (objective was the "parallelization" of the software on a Beowulf cluster - shout out to Mike and Anna :) ).

    The main use of this system so far has been to stitch multiple images into one panoramic shot. Like any auto-stitching program, this requires a certain amount of overlap between frames - the more overlap, the better the stitching. The code works remarkably well, automatically rotating, zooming, skewing and otherwise transforming the images to fit together and then mapping them into a "flat" image as opposed to a parallelogram-shaped one.

    Yes, the higher resolution from multiple shots of the same scene works, and is a very cool effect of the system. Of course, this requires a more or less static scene.

    Finally, it's not necessarily "video" that it uses, although pulling individual frames from a video would work. It's based on the head-mounted cameras of the wearcam systems, which essentially use a stripped-down webcam for image gathering, so you already know the fps and resolution limitations involved with those.

    Of course, in the 2 years since I was there, the technology has probably improved, although I doubt the webpage has. :)

    Mann has a bunch of cool projects involved with the wearcam/wearcomps. This is a great one, another is the Photoquantigraphic Lightspace Rendering (painting with light), which can also be found on the wearcam [wearcam.org] site.

    • Addendum (Score:3, Interesting)

      by Astin ( 177479 )
      One more thing - this isn't done in real-time. It can be run on a single machine and take a fair bit of time as it works through image pairs. Therefore, the more images you use, the longer it takes.

      i.e. - 5 images: 1, 2, 3, 4, 5

      It compares 1 & 2, 2 & 3, 3 & 4, and 4 & 5. The co-ordinate transformations for each pair are relative to the base image (so you don't have to re-transform after stitching).

      There has been work to farm out the comparisons across a Beowulf cluster (the one built when I was there was made of some impressive VA Linux boxes; I believe it's been expanded since). But this still takes some time. So unless someone's going to get a parallel computing cluster inside a single package and make it affordable, this won't be rolled out nationwide overnight.
    • Pretty extravagant name for dynamic range recovery....
      If you read the literature, there are other techniques that are actually more sound and do a better job (Debevec's work is at least visually impressive). Most importantly, they don't obfuscate the technique by introducing weird-ass names and terminology to be special.
      • Yup, but some of that is Mann's sense of humour. He often comes up with an incredibly complicated name that he'll put on papers and displays, but when he talks about it never uses those terms. It's actually pretty funny to watch him come up with something on the fly sometimes to describe an otherwise simple process.

        Also, I think some of the work from his process is impressive itself. The images he has on the page (mostly the University of Toronto's Convocation Hall I believe) were some of the originals. I've been out on a few late night "paintings" with him and we've gotten some pretty impressive shots. It's all a matter of how much time and effort you put into building the image.

  • by Jah-Wren Ryel ( 80510 ) on Thursday July 25, 2002 @03:00PM (#3952888)
    You know how television shows will pixelate the face of someone who doesn't want to be shown on television? Sometimes it is just a passerby on MTV's Real World who won't sign a release, but sometimes it's somebody a little more important, like a corporate or federal whistle-blower.

    I've long thought that pixelization wasn't a very good way to protect the identities of these people, because when they are on video they move around and the camera sometimes moves around, but the pixelization is often applied in post-production, so it stays in a relatively constant location rather than tracking the features of the person's face. Anyone sufficiently motivated and sufficiently equipped with the right tools ought to be able to reconstruct a much higher resolution, non-pixelated image of the secret person's face by extracting all of the useful information from each frame and then correlating it all together with the general movements of the person in the frame.

    It sounds to me like pencigraphy is exactly the kind of science required to do something like that. So now the question is, who do we want to unmask? Too bad Deep Throat never made an on camera appearance.
    • It wasn't very good protection anyway if the pixels chosen were too small. You can often see this for yourself by blinking rapidly while watching a pixelated face. You'll be able to see added detail - often enough to recognize a face. I notice that more recent uses of pixelation use far fewer pixels, which makes this technique ineffective.
  • Think of this: several small video cams mounted in a hat or headband or anything that the person wears on their person, all the images stitched together to form a mosaic panorama with no distortion, the image itself projected and visible inside the glasses that you are wearing.... Those Secret Service guys in dark shades would be able to view 360 degrees simultaneously, with the true front of them being the center of the display.

    True, this is not possible now; a wearable computer would never have the power to do this in real time. But Moore's law, you know... it could happen in a few years.

    On the scarier side... get some sort of combat suit that enhances the wearer's strength / speed / endurance and provides additional armor and firepower, add this capability, and the wearer can suddenly move faster and longer, see in all directions simultaneously, and target enemies....

  • There are two main applications of this idea:

    Using it to stitch mosaics together.

    Using it to use overlapping images to increase the effective resolution of the camera. This is called "super resolution".

    Computer vision types have been doing this for years. Shmuel Peleg of Hebrew University has done some good work and had the work show up in commercial products, including VideoBrush - you could take a webcam, wave it around, and in real time get a mosaic. In 1995, I think. Don't know if you can still get that product.

    Do a Google search for him and you'll find his home page and superresolution papers (Peleg and Irani is an accessible paper and one of the first - the concept they used, however, comes from Bruce Lucas' thesis at CMU).

    Applications include: combining NASA satellite images of the Earth to get higher resolution; ditto for images of the human retina; and, a personal favorite, smoothing images from the system used at the Super Bowl a couple of years ago, where they had 75 cameras and could show any play from any angle in live video. That was done by Takeo Kanade, Lucas' advisor.

  • This was posted on memepool yesterday [memepool.com] with an interesting link [unh.edu]. This seems to be happening more and more.
  • They did something similar some years ago with the Mars Pathfinder mission. By combining all the images from the stereo imager and the rover, they were able to glue everything together into a textured 3D model [schwehr.org].
  • So, I got the source from sourceforge, and compiled it. The README talks about:
    The programs will also work with color (ppm format) images. The included shell scripts adapt the programs for use with jpeg (.jpg) files. The http://wearcam.org/lieorbits and http://wearcomp.org/lieorbits directories contain movie and image files for use with these programs.

    But I can't find any shell scripts anywhere.

    I have a "panorama" series of jpgs that I'd like to stitch together with this package (I already did it by hand in Photoshop, but automated would be sweet.)

    • I haven't looked... but I'm interested too.

      I bought my Olympus camera 2 years back and it came with "QuickStitch". The publisher has since stopped releasing patches and no longer sells their Pro version, so I can't get it to work in Win2k or OS X, and Wine hasn't worked yet.

      I'll be glad to be able to take panoramic pictures again.
    • I downloaded this stuff about a year ago thinking it would be cool to build a GIMP plugin on top of it to make the whole process a little simpler.

      However when I downloaded the tarball it already included a plugin contributed by someone else. This was in one of the 1.x releases directly off Steve's site, not from sourceforge. I just did a quick google for 'video orbits gimp plugin' and nothing leaps out.

      I don't think I have the older software - I switched machines since and dumped a lot of stuff - but I'll dig around this afternoon. Anyone remember this plugin and know where the hordes can find it?
  • Well, not exactly, but something using the same principle to effectively antialias and despeckle your pictures. It only works with a tripod and a static scene.

    First, take a few identical pictures of the same scene.

    Then, superimpose them in your favorite photo editor.

    E.g., if you take 5 pictures, you can decrease the brightness of each to 20% and then add them together, or take a fractal sum average: say the first picture contributes 50%, the second 25%, the third 12.5%, etc.

    The results are usually very impressive, especially for older cameras.
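
    A minimal sketch of the equal-weight version of this, assuming a tripod, identical framing, and Pillow/numpy (the filenames are placeholders):

    import numpy as np
    from PIL import Image

    def average_exposures(paths):
        """Average several identically framed photos of a static scene;
        equivalent to dropping each image to 1/N brightness and summing,
        as described above. Noise averages out, detail stays."""
        stack = [np.asarray(Image.open(p).convert("RGB"), dtype=np.float32)
                 for p in paths]
        mean = sum(stack) / len(stack)
        return Image.fromarray(mean.astype(np.uint8))

    # average_exposures(["shot1.jpg", "shot2.jpg", "shot3.jpg"]).save("clean.jpg")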
  • Quite similar to Image Mosaics, a project we did in the Image and Video Processing [bu.edu] class with Prof. Sclaroff [bu.edu]. Here's my take on the project (including the source code), with a pretty good explanation of how to do this: go here... [mit.edu]
  • by pla ( 258480 ) on Thursday July 25, 2002 @08:02PM (#3955226) Journal
    Can we say "documentation", people?

    I have three pictures, with roughly 2/3rds overlap.

    I ran them pairwise (1 and 2, then 2 and 3) through estpchirp2m. Good, I get two output sets of 8 reals. I stuff them into a single file, one on each line.

    So I pintegrate that file, using picture #2 as the reference frame. Cool, I now have three sets of eight reals.

    Next, I pchirp2nocrop all three separately, passing the appropriate line from pintegrate on the command line (why bother with text files here, if I need to cut-and-paste at this step anyway?). I now have three new .pbm files, which seems like what I should have according to the extremely limited documentation.

    Step four, I cement the three new .pbm's together, and get a single file as the output. "Great!", I think, it worked and didn't give me too many problems.

    So I open up the picture. Or try to. It seems that whatever the output file has in it, valid .pbm data doesn't top that list.

    I tried again, but since I had followed the (limited) directions carefully the first time, my results did not differ.


    So, I have three suggestions for Mr. Cyborg...

    First, it doesn't matter *how* cool of a program you write, if no one can figure out how to use it (WRITE SOME REAL DOCS!!!).

    Second, it doesn't matter how cool your program *sounds*, if it doesn't work.

    Third, 99% of people playing with this will not want to tweak any of the in-between stages' results. Of those that *do*, 99% will just hack the source. Ditch the four (and then some) programs, and make a single executable that takes as its arguments just the names of the input files, in order, and perhaps a *few* tweaking options (like enabling or disabling filtering, which sounds useful, except YOU DON'T HAVE IT DESCRIBED ANYWHERE!).


    Ahem.

    Otherwise, great program. No doubt one of the many companies doing the same thing for the past 20 years will soon have their lawyers send their congrats.
  • Or similarly, Salient Stills [salientstills.com] has a similar product.

    I'm glad to see these products, because I proposed doing this in a graduate seminar in the early nineties (I was a CS undergrad at the time) and the PhD candidate leading the class went on about how it was mathematically impossible (and, by extension, how dumb I was because I didn't understand that particular math). Righto, Charles.
  • When I was a senior in high school I attended a "science and engineering" conference for college-bound seniors. The main presenter at the conference was a researcher at the NASA JPL and Caltech.

    He used earth-based telescopes to take pictures of asteroids. The problem was that the pictures were very blurry. They were almost unusable.

    To solve the problem, they took ten or fifteen pictures, each from a slightly different angle. The pictures were scanned into a computer, and then a software program would analyze the pictures, producing one much sharper picture. The results were incredible. Of course, that was the point: impress students enough to make them want to be engineers. :)

    --Bruce

    (My memory of this is a little fuzzy, so a few details might be off.)
