Media Technology

Making 3D Models from Video Clips 103

Posted by ScuttleMonkey
from the fun-toys dept.
BoingBoing is covering an interesting piece of software called VideoTrace that allows you to easily create 3D models from the images in video clips. "The user interacts with VideoTrace by tracing the shape of the object to be modeled over one or more frames of the video. By interpreting the sketch drawn by the user in light of 3D information obtained from computer vision techniques, a small number of simple 2D interactions can be used to generate a realistic 3D model."

  • Terrible link (Score:5, Informative)

    by masterz (143854) on Monday January 07, 2008 @05:37PM (#21947632)
    wow, what a terrible link.

    A quick search turns up the project homepage http://www.acvt.com.au/research/videotrace/ [acvt.com.au]
    • Youtube (Score:5, Informative)

      by Anonymous Coward on Monday January 07, 2008 @06:30PM (#21948124)
    • by GroeFaZ (850443)
      A quick look and less desire for first post would have revealed the very same link at the end of the BB post.
      • Re:Terrible link (Score:5, Insightful)

        by apankrat (314147) on Monday January 07, 2008 @06:40PM (#21948208) Homepage
        Outside of /., this sort of news "wrapper" article (BB or not) is considered blog spam. There is absolutely no reason to link to a wrapper when it just rehashes what's in the original article and then forwards to it for details (which is what a vast majority of readers would want anyways).
        • Sometimes the editors just wish to give credit to whoever discovered it, hence the wrapped link.
          I think both links should be provided, the direct and the wrapped...
        • Re: (Score:3, Funny)

          by wdebruij (239038)
          which is what a vast majority of readers would want

          Are we on the same site? What is this "article" you talk of?
    • ...I can make a perfectly accurate 3-D character model by just feeding the program a bit of video and pointing out the character. Then, all we need is the same with voice and I can make my own animes! Man, that would be sweet, but I think we're still a ways off from that.
      • by Unoti (731964)
        We're a heck of a lot closer with this than without it. This is a huge step in that direction. There's already quite a bit of technology out there to convert bitmaps to line drawings, and things to track the same object in a video. We'll wake you up later if you insist, but I expect a lot of hardcore developers are waking up now and getting started on some badass research projects.
    • It surely mitigates the slashdot effect.
  • by CrazyJim1 (809850) on Monday January 07, 2008 @05:40PM (#21947654) Journal
    AI needs a way of interpreting video input into 3d objects and environment. Once a computer can represent objects in a 3d environment, it can then perform operations on them. Technically you could make AI without this tool, but you'd have to do extremely precise and patient CAD inputs that would take most of your life. With a tool to convert video into 3d objects, you can just start cataloging all the objects out there. Add in a 3d physics simulator, and you're halfway to true AI. I have a quick overview on how to do AI, and as you'll note on the very beginning of the page [geocities.com]: the reason I haven't worked on AI myself is that I can't code a video->3d object converter myself.
    • by QuantumG (50515) <qg@biodome.org> on Monday January 07, 2008 @05:46PM (#21947718) Homepage Journal
      Have you heard of the Scale Invariant Feature Transform [wikipedia.org]? Well you have now. There are libraries written in C# (no less) which are publicly available to do this stuff. You can recognize a large collection of objects.

      • by kudokatz (1110689) on Monday January 07, 2008 @06:23PM (#21948054)
        SIFT is ok even for occluded objects, but is horrid in 3-d because SIFT features cannot match up for a significantly rotated scene. There are better algorithms that can recover both the shape of the scene as in the article and even produce the location of the camera as a by-product.

        In terms of object recognition, there has been great work done by treating an "n x n" pixel image as a point in n^2-dimensional space, reducing the dimensionality of that space, projecting a given image into the new, lower-dimensional space, and finding a match via a nearest-neighbor search among the recognized objects.

        There is also good work being done in terms of getting a detailed 3-d model using structured light methods: http://www.prip.tuwien.ac.at/research/research-areas/3d-vision/structured-light [tuwien.ac.at]

        There is good literature out there, but sometimes the math gets over my head =P
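
The "image as a point in n^2 space" idea above can be sketched in a few lines. This is only a toy illustration, not anyone's published method: it assumes 2x2 "images" flattened to 4 numbers, keeps a single principal direction (found by power iteration) instead of the dozens a real system would keep, and every name here (`GALLERY`, `build_model`, `recognize`) is made up for the example.

```python
import math

# Hypothetical gallery: label -> flattened 2x2 grayscale "image".
GALLERY = {
    "mug": [9, 8, 9, 8],
    "book": [1, 2, 1, 1],
    "lamp": [5, 5, 4, 5],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_component(centered, iterations=200):
    """Approximate the first principal direction by power iteration."""
    dim = len(centered[0])
    # Non-uniform start vector; assumed not orthogonal to the answer.
    v = [float(i + 1) for i in range(dim)]
    start_norm = math.sqrt(dot(v, v))
    v = [vi / start_norm for vi in v]
    for _ in range(iterations):
        # Apply the (unnormalised) covariance matrix without building it.
        w = [0.0] * dim
        for x in centered:
            c = dot(x, v)
            for j in range(dim):
                w[j] += c * x[j]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [wj / norm for wj in w]
    return v

def build_model(gallery):
    """Return (mean image, principal axis, 1-D coordinate per label)."""
    labels = list(gallery)
    dim = len(gallery[labels[0]])
    mean = [sum(gallery[l][j] for l in labels) / len(labels) for j in range(dim)]
    centered = {l: [p - m for p, m in zip(gallery[l], mean)] for l in labels}
    axis = top_component(list(centered.values()))
    coords = {l: dot(centered[l], axis) for l in labels}
    return mean, axis, coords

def recognize(model, image):
    """Project the query onto the axis and return the nearest known label."""
    mean, axis, coords = model
    q = dot([p - m for p, m in zip(image, mean)], axis)
    return min(coords, key=lambda l: abs(coords[l] - q))

if __name__ == "__main__":
    model = build_model(GALLERY)
    print(recognize(model, [8, 9, 8, 9]))
```

With real images you would keep more than one component and compare distances in that small space, but the shape of the computation stays the same: center, project, nearest neighbor.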
      • There are libraries written in C# (no less)


        People should really give up on that and start using D :)
        • There are libraries written in C# (no less)


          People should really give up on that and start using D :)
          I'm more of a Blues Programmer and have tuned my compiler to A minor.
    • "True AI"? (Score:2, Insightful)

      by Anonymous Coward

      Add in a 3d physics simulator, and you're halfway to true AI.

      I've never heard of "true AI" -- do you mean strong AI [wikipedia.org]?

      And no, computer vision plus physics simulation does not make half of strong AI, either. Russell and Norvig, the classic AI text, lists 9 abilities generally required for strong AI. 2 is not half of 9.

      I have a quick overview on how to do AI, and as you'll note on the very beginning of the page [geocities.com]: the reason I haven't worked on AI myself is that I can't code a video->3d objec

    • by 4D6963 (933028)

      Add in a 3d physics simulator, and you're halfway to true AI.

      Excuse me but what exactly do you mean by "true AI"?? It'd better not be "strong AI" cause if it is I want some of whatever it is you're smoking.

  • Hasn't this been a mainstay of movies forever?
    • by timthorn (690924)
      Since 2001 at least. Boujou from 2d3 Ltd.
      • by aseidl (656884)
        Or try the free (as in beer) Voodoo Camera Tracker [uni-hannover.de]
        Lets you export pointclouds (not 3d models, as in the story) to a variety of formats, including Blender.
        • Voodoo is pretty great, although its automatic feature point estimation in 3D is a bit limited. For features far away, in Free Move mode, the stereo math breaks down and it only gets direction correct, while distance from the camera can be anything from -infinity to +infinity.
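
Why distance blows up for far-away features can be seen from the textbook rectified-stereo relation, depth = focal length x baseline / disparity. The sketch below is not Voodoo's code; it assumes an idealized two-view setup, and the function names, the 1000 px focal length, and the 0.1 m baseline are invented for illustration.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres for a rectified stereo pair (idealized model)."""
    if disparity_px == 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px

def depth_interval(focal_px, baseline_m, disparity_px, noise_px):
    """Nearest and farthest depth consistent with disparity +/- noise.

    If the noise band reaches zero disparity, the far bound is infinite:
    the direction to the feature is known, but its distance is not.
    """
    high = disparity_px + noise_px
    low = disparity_px - noise_px
    if high <= 0:
        raise ValueError("disparity plus noise must be positive")
    near = focal_px * baseline_m / high
    far = float("inf") if low <= 0 else focal_px * baseline_m / low
    return near, far

if __name__ == "__main__":
    # Nearby feature: a large disparity gives a tight depth interval.
    print(depth_interval(1000, 0.1, 50.0, 0.5))
    # Distant feature: disparity is smaller than the matching noise.
    print(depth_interval(1000, 0.1, 0.4, 0.5))
```

Once the measured disparity is smaller than the matching error, a slightly negative disparity is just as plausible as a slightly positive one, which is where nonsensical negative distances come from.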

          An option for creating a textured 3D VRML model from images is PTStereo [panotools.org]. It is designed to work with 2 or more images that are taken far apart from each other, not a video sequence. Hugin, and SIFT can create the control points,

  • by bn0p (656911) on Monday January 07, 2008 @05:43PM (#21947694)
    Software like Canoma from the now-defunct Metacreations would let you create 3D models from 2D images in the mid-to-late 90s. I also remember reading about people using Viz ImageModeler [realviz.com] to convert images from video to models even though the software is also designed for still images - the users would just capture those frames they needed to create the 3D model.

    The only thing "new" about this is using video as the input without having to grab the individual frames yourself.


    Never let reality temper imagination
    • Re: (Score:3, Interesting)

      by Anonymous Coward
      Actually, algorithmically, you can make a substantial leap in processing capabilities when you switch from feeding in series of still images to video. This may seem a bit counterintuitive, since a video is just a series of still images, but the key is that a video is a continuous series of still images.

      The main problem with existing techniques is that they often require a lot of user interaction to create a complete model, because points between images have to be delineated and correlated by hand, or at be
    • by samkass (174571) on Monday January 07, 2008 @06:03PM (#21947868) Homepage Journal
      Yeah, the big breakthrough in this, IMHO, was a 1994 paper by Takeo Kanade of CMU's Robotics Institute titled "A Sequential Factorization Method for Recovering Shape and Motion from Image Streams [cmu.edu]", which did a pretty good job of factorizing out the 3D model as well as the camera motion from a video stream... it could tell you not only the dimensions of the house you were videotaping, but the stride of the person holding the camera. This laid the groundwork for a lot of other "model from video" work done throughout the 90's. More recently a group there has done a lot of work on "Shape from Silhouette [cmu.edu]" which looks closer to the technology that this product uses.

      I've been waiting for this technology to go big on eBay for a decade... maybe this'll be the year.
    • by ashooner (834246)
      I think this [unc.edu] is much more impressive. Tracing isn't needed if the location of the camera can be determined. Pretty cool stuff.
  • by jollyreaper (513215) on Monday January 07, 2008 @05:52PM (#21947772)
    Remember back in the day when we were told that computers would never be able to learn how to understand human speech because it's too complicated? The arguments were compelling but now we've got voice recognition working over crappy telephone connections and dictation software is getting better all the time. As bad as the voice recognition problem was, computer vision seemed like an even harder nut to crack given how impossible it seemed to get a machine to go from a two-dimensional image to 3D. All of this stuff seems like impossibly difficult "we'll never get there" AI impossibilities and then we see a technology demonstration that nails it. I'm still astounded that DARPA is not only asking for robot-driven cars, they're actually getting teams producing working results. That's another problem I always thought would be impossible.

    My prediction for the future: the 21st century will be for robotics what the 20th was for aviation. We've been thinking about it for centuries but now the technology is maturing to the point that we can really do something with it. The stuff we're amazed by today is going to seem like wood and canvas biplanes.
    • by Atario (673917)

      I'm still astounded that DARPA is not only asking for robot-driven cars, they're actually getting teams producing working results. That's another problem I always thought would be impossible.
      I knew it was doable (even if only with assistance by way of special roads), but no one was putting any real effort into making it usable en masse. So thank you, DARPA.
    • by MobileTatsu-NJG (946591) on Monday January 07, 2008 @06:34PM (#21948170)

      Remember back in the day when we were told that computers would never be able to learn how to understand human speech because it's too complicated? The arguments were compelling but now we've got voice recognition working over crappy telephone connections and dictation software is getting better all the time. As bad as the voice recognition problem was, computer vision seemed like an even harder nut to crack given how impossible it seemed to get a machine to go from a two-dimensional image to 3D. All of this stuff seems like impossibly difficult "we'll never get there" AI impossibilities and then we see a technology demonstration that nails it. I'm still astounded that DARPA is not only asking for robot-driven cars, they're actually getting teams producing working results. That's another problem I always thought would be impossible.
      Hmm. Though it's not really that clear from your post, I'm concerned that you're seeing one problem where really there are two. In the case of voice recognition, getting a computer to recognize a spoken word within a certain context is far easier than getting the computer to understand a phrase like "Set up an appointment for me on the Fifth of May at 2 pm.". One is simple signal analysis, the other is context-sensitive understanding. The former is easy and has been possible for years. The latter is virtually impossible without the computer in question having 'experience'.

      The same is true for image recognition. You can get a computer to recognize movement pretty easily. Heck, the ability for software to detect the 3d form of an object has been around for ages. However, getting a computer to watch Star Wars and say "I see Dennis Lawson sitting inside an X-Wing fighter." is, as I said before, difficult to do without a concept of 'experience'.

      We'll get there one of these days, but right now the sorts of cool-sounding advancements we've been seeing really only work in very specific circumstances.
      • Hmm. Though it's not really that clear from your post, I'm concerned that you're seeing one problem where really there are two. In the case of voice recognition, getting a computer to recognize a spoken word within a certain context is far easier than getting the computer to understand a phrase like "Set up an appointment for me on the Fifth of May at 2 pm.". One is simple signal analysis, the other is context-sensitive understanding. The former is easy and has been possible for years. The latter is virtually impossible without the computer in question having 'experience'.

        I am aware, that's why I made the distinction between voice recognition on the telephone by automated attendants and dictation software. It's not quite perfect yet but it's a lot better than it used to be. We're moving from stuff being years off to in the next model year or two. I find that impressive.

        • It depends what you mean by "understand". What current dictation programs do is really just pattern-matching: they analyze each word and find the entry in their database that fits. It's the same system as the automated phone stuff, just on a larger scale. What the grandparent was talking about is getting a computer to comprehend speech, known as the natural language problem, and it's nowhere near solved.

          Even if a computer can pick the words out of your speech, it has no idea what they mean, unless such
    • by PingXao (153057)
      I'm still waiting for computers to be able to recognize speech, untrained and with a speaker-independent vocabulary range greater than a hundred or so recognizable patterns. One that can take dictation, get the grammar and punctuation right, capitalize words properly and distinguish between "right", "write" and "rite" (among others) depending on the context in which they're used.

      You say this already exists? On what planet?
    • by WK2 (1072560)

      Remember back in the day when we were told that computers would never be able to learn how to understand human speech because it's too complicated?

      I remember being told a lot of things. Like there is no moon. Only a small percentage of people would say that a technological advance would never happen. Never is a long time. As a previous poster pointed out, this particular advance hasn't happened yet, but it probably will eventually.

      now we've got voice recognition working over crappy telephone connections

      That depends on how you define "working." I would not qualify yelling into a phone slowly, and repeating yourself over and over as working. It is sad that so many places have replaced the old "press 1 to do x", which wa

    • by HTH NE1 (675604)

      Remember back in the day when we were told that computers would never be able to learn how to understand human speech because it's too complicated? The arguments were compelling but now we've got voice recognition working over crappy telephone connections and dictation software is getting better all the time.
      "Dear Aunt, so let's set double the killer delete select all"

      Recognition != Understanding
  • Oh yeah ? (Score:3, Funny)

    by witte (681163) on Monday January 07, 2008 @06:03PM (#21947866)
    I'd like to see how it holds up against Calista Flockhart footage and not go Division By Zero.
  • by markds75 (958873) on Monday January 07, 2008 @06:14PM (#21947962)
    I'm a Ph.D. student at UC Santa Cruz. I finished my masters a few years ago working on enhancements to a project [ucsc.edu] with similar goals. My advisor, Jane Wilhelms [ucsc.edu] (who unfortunately died shortly after I finished my masters) was working on computer vision techniques for several years. Her work focused on extracting motion for animals (often children or horses) out of videos. My Masters contribution was to look at how the accuracy and usability of the software could be improved if we assume that the general motion of a walk is the same for all instances of a particular species (the knees all bend the same way, and the legs move in the same order, etc). I didn't have a high quality capture to start with, so the results were a bit fuzzy in terms of accuracy, but it did make the process easier for the user. The user had only to make the "original" motion match the video at key frames (maybe 4 per "walk cycle"), and the computer could easily interpret the rest; I don't recall off the top of my head, but I think the number of key frames the user had to specify was reduced by half or more over the former process (without the canonical motion as a starting point). I didn't publish any papers based on my work, but my masters thesis (with example filmstrips) is available [ucsc.edu].
    • by Sleepy (4551)
      About 8 years ago I worked on the SynaPix project, which was also very similar to the article. The SynaFlex system could recover motion paths, direction, camera path and scene geometry automatically from video. Or you could focus on one aspect/stage of this (such as manually tracking points to trace an object, or inserting perfect geometry to match a known object such as a floor plane, thus buttressing the geometry and sometimes the tracking results).

      Pretty neat stuff. A pity the company outspent itself (too
  • Test case (Score:3, Interesting)

    by kramulous (977841) * on Monday January 07, 2008 @06:22PM (#21948040)
    Hook up google maps api with polar navigated flight path, some edge/point detection algorithms and start mapping. That'd be an interesting video.
  • I work as a video professional for a very large stock video [thoughtequity.com] provider. I could see software like this being an amazing tool for a company such as mine. Not only can we offer you footage of (for example) a horse running through a field - we might be able to sell the elements themselves or in addition to that? Need some more horses? How about we just sell you the background and you pick what animals you want? A lot of the time the video industry is dictated by extremely tight deadlines and budgets - any t
    • by dimeglio (456244)
      How about helping CSI folks reconstruct CCTV footage of a crime?

      Maybe it could even help UFO researchers get more detail out of so-called video footage of a "real UFO."

      Not sure if additional information can be extrapolated from the technique (I didn't read TFA), but it could potentially be very helpful.

  • Apply that to the 2D sprites in Doom? I like the new engines out there created to play Doom II WADs with fancy new polygonal objects, but it would be nicer if the monsters were 3D as well.
  • In my thesis I'm also creating a 3D model from a video stream, only I'm using stereoscopy and pattern recognition to find matching objects in each frame and triangulating the depth to said objects. By the end I'm hoping to reduce the objects to small pixel clusters; the tricky part is that all this is happening in real time. By mounting the cameras on a device where the point of view is known, it could be used to map out any static terrain by just navigating through it. Adding more cameras from different per
  • by SeaFox (739806) on Monday January 07, 2008 @08:19PM (#21948982)
    ...and no one is going to make a porn joke?
    • by hyades1 (1149581)

      Damn...I was thinking about that, but since you seem to be one step ahead of me, it seems kind of redundant.

      I'll leave the elements for you to polish up and use if you like: If you're thinking about creating a model of your favorite porn star, women will stand to benefit from this a bit more than the guys. Might go through a bit more construction material, though.

  • This is very interesting. Unfortunately, it is going to be closed-source and patented.

    Does anyone know any open-source projects to do object reconstruction from video or still photographs? I'm asking because my group is building a 3D printer.
    http://www.reprap.org/ [reprap.org]
    (Self-link pimpage, etc. etc.)
    and I think it would be cool and useful to be able to capture a 3D model from photos or video of a sculpted maquette, pet cat, broken part, human, or so on.

    (I just stumbled across this by googling "gpl object reconst
  • If you could combine the techniques that create the models automatically, with techniques like this where a skilled artist is involved, you could produce some high quality output indeed.
  • You can easily make 3D images viewable with lcd shutter glasses and an nvidia card if you find some shots where the camera is panning across the scene, and it's pretty static, using software like 3D Combine [photoalb.com]. Just take two frames so many frames apart and use one for each eye. I did this with some old Betty Boop cartoons (which were made by rotoscoping, that is, based on actual photographic images) and they worked great.
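
The "two frames so many frames apart" trick can be written down as a simple pairing step. This is a sketch under assumptions, not how 3D Combine works internally: `stereo_pairs` and its `camera_moves_right` flag are hypothetical names, frames can be any objects (file names, arrays), and the scene is assumed static while the camera slides sideways.

```python
def stereo_pairs(frames, offset, camera_moves_right=True):
    """Pair each frame with the one `offset` frames later.

    Returns a list of (left_eye, right_eye) tuples. If the camera moves
    to the right, the earlier frame was shot from further left, so it
    becomes the left-eye view; otherwise the roles are swapped.
    """
    if offset < 1:
        raise ValueError("offset must be at least 1")
    pairs = []
    for i in range(len(frames) - offset):
        earlier, later = frames[i], frames[i + offset]
        if camera_moves_right:
            pairs.append((earlier, later))
        else:
            pairs.append((later, earlier))
    return pairs

if __name__ == "__main__":
    names = ["f0.png", "f1.png", "f2.png", "f3.png", "f4.png"]
    for left, right in stereo_pairs(names, offset=2):
        print(left, right)
```

A larger offset acts like a wider gap between the eyes, so it exaggerates depth; how many frames apart looks right depends on how fast the camera was moving.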

  • Now I can finally have a 3-D model of the starship Swordbreaker.

  • If Microsoft made this announcement it would be condemned as "vaporware". The main site claims it is in beta and they are looking for commercial partners, so it apparently is not open source and no use to us at this time.

    I appreciate the links and information in the discussion prompted by this article. Although I'm underwhelmed by the actual announcement, I've learned a lot from the links you folks have provided.
  • There's this hot little Night Elf Paladin chick I have my eyes on...
  • Even more impressive is the Campanile movie [debevec.org], where an entire 3D model of the UC Berkeley campus and a fly-by shot was generated from just 15 still pictures. This was done a whole decade ago, for SIGGRAPH 97.

  • The University of Kiel (Germany) presented quite exactly the same stuff (without the need of manually marking objects or object boundaries) at CeBIT 2003.

    Check this video (scroll the page to "Movie for presentation on CeBIT 2003").
    http://www.mip.informatik.uni-kiel.de/tiki-index.php?page=3D+reconstruction+from+images [uni-kiel.de]
  • Weird Science wasn't a movie...it was a prophecy!
  • I only have one response to that.

    The Internet is for porn.
  • I think the description is a little bit wrong, because it makes people think this software is very automatic, when in fact it just does what Blender and other software do, but with video instead of still images, which should not be difficult to add to Blender as well. You can check the video here to see that it is very manual: http://www.acvt.com.au/research/videotrace/ [acvt.com.au] The only advance, to me, is the automatic UV mapping.

"Free markets select for winning solutions." -- Eric S. Raymond
