
Rome, Built In a Day 107

Posted by timothy
from the what-about-compilation dept.
spmallick writes "Researchers at the University of Washington, in collaboration with Microsoft, have recreated the city of Rome in 3D using images obtained from Flickr. The data set consists of 150,000 images from Flickr.com associated with the tags 'Rome' or 'Roma,' and it took 21 hours on 496 compute cores to create a 3D digital model. Unlike Photosynth / Photo Tourism, the goal was to reconstruct an entire city and not just individual landmarks. Previous versions of the Photo Tourism software matched each photo to every other photo in the set. But as the number of photos increases the number of matches explodes, increasing with the square of the number of photos. A set of 250,000 images would take at least a year for 500 computers to process... A million photos would take more than a decade! The newly developed code works more than a hundred times faster than the previous version. It first establishes likely matches and then concentrates on those parts."
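The quadratic blow-up the summary describes is easy to sketch in a few lines (a toy illustration; the candidate-list size `k` is a made-up parameter, not the researchers' actual number):

```python
# All-pairs matching examines n*(n-1)/2 photo pairs: quadratic growth.
def exhaustive_pairs(n):
    """Pairs an all-pairs matcher must examine for n photos."""
    return n * (n - 1) // 2

# A candidate-based scheme first picks ~k likely matches per photo
# (the strategy the summary describes), so matching work grows linearly.
def candidate_pairs(n, k=10):
    return n * k

# For the 150,000-photo Rome set:
print(f"{exhaustive_pairs(150_000):.2e}")  # ~1.12e10 pairs
print(f"{candidate_pairs(150_000):.2e}")   # ~1.5e6 pairs

# Doubling the collection roughly quadruples the all-pairs work.
assert exhaustive_pairs(300_000) / exhaustive_pairs(150_000) > 3.99
```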
  • legality (Score:4, Insightful)

    by timpdx (1473923) on Wednesday September 16, 2009 @04:41PM (#29445885)
    IANAL, but is this legal? I somehow think that Microsoft doesn't have 150K photographer releases in their paws.
  • by Anonymous Coward on Wednesday September 16, 2009 @04:43PM (#29445925)

    Uses subgraph matching.

    Yours In Akademgorodok,
    K. Trout

  • Cool, but... (Score:2, Interesting)

    by Eggplant62 (120514) on Wednesday September 16, 2009 @04:44PM (#29445949)

    I wish this were done more with free software rather than with help from the Beast from Redmond. I'm certain the faculty at UW are completely familiar enough with free software that they could have made this work without MS's help.

    • Re:Cool, but... (Score:5, Insightful)

      by westlake (615356) on Wednesday September 16, 2009 @04:51PM (#29446065)

      I'm certain the faculty at UW are completely familiar enough with free software that they could have made this work without MS's help.

      150,000 photos. 21 hours. 496 cores. That makes it a labor-intensive, computation-intensive project. None of that comes "free as in beer."

      • Sure it does (Score:4, Insightful)

        by SuperKendall (25149) on Wednesday September 16, 2009 @06:32PM (#29447487)

        ...None of that comes "free as in beer."...

        150,000 photos.

        From Flickr. It's not like some poor bastard was paid to be out there photographing for weeks.

        21 Hours. 496 Cores.

        Don't recall folding@home or seti@home paying me anything.

        In short - who wouldn't pony up a few days of computing power to have a fully open 3D model of some of earth's greatest landmarks? We only need someone to write the code to distribute, but the basic framework for distributed computation is already in place.

        • by westlake (615356) on Wednesday September 16, 2009 @08:12PM (#29448543)

          From Flickr. It's not like some poor bastard was paid to be out there photographing for weeks.

          No. But Flickr simplifies the problem if you are building a model of a world destination-city like Rome or Venice.

          What interests me more is the possibility of building models of cities and landmarks across time. Perhaps using sources other than photographs. Lincoln's Washington. New York City in 1939.

          who wouldn't pony up a few days of computing power to have a fully open 3D model of some of earth's greatest landmarks?

          That's still a serious commitment - and we could be talking weeks or months.

          • by SuperKendall (25149) on Wednesday September 16, 2009 @10:01PM (#29449399)

            What interests me more is the possibility of building models of cities and landmarks across time. Perhaps using sources other than photographs. Lincoln's Washington. New York City in 1939.

            Me too but that's not what Microsoft or these researchers are doing, so it's not really related to my response.

            That's still a serious commitment - and we could be talking weeks or months

            So what? The results last forever. And any one person doesn't have to be serious, it's not like I run Folding all the time - but in aggregate, the science gets done.

        • by j1mmy (43634) on Wednesday September 16, 2009 @09:39PM (#29449239) Journal

          Most university research groups do not have funds to buy bits of computing time here and there. For a project like this, the research group more likely has a dedicated computing cluster bought with grant money or sponsor money.

      • by Anonymous Coward on Wednesday September 16, 2009 @06:35PM (#29447525)

        Actually, a large computing-grid/supercomputer facility is a staple of research universities as a service for their faculty and that they can rent out to other people. It might have taken 40 hours or 90 hours without Microsoft's help, but that's hardly impossible.

      • by Anonymous Coward on Wednesday September 16, 2009 @09:05PM (#29449005)

        Well, why not run 62 cores for 8 days instead?
        And if your research group can't come up with ~62 cores for 8 days in this day and age, you have a serious problem on your hands. Even our "private"/self-owned cluster is 54 cores as of now and growing (4 grad students, 1 Prof - and physics, not CS).
        Besides, there's always NERSC & Co if you need some serious computing power...
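The "fewer cores, more days" trade-off above checks out as a simple core-hour budget (assuming perfectly linear scaling, which real clusters only approximate):

```python
# Total work reported for the Rome run, redistributed over 62 cores.
total_core_hours = 496 * 21              # 10,416 core-hours

days_on_62_cores = total_core_hours / 62 / 24
print(days_on_62_cores)                  # 7.0 -- so 8 days leaves headroom
```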

    • by Anonymous Coward on Wednesday September 16, 2009 @04:52PM (#29446073)

      Hell yeah! I come to Slashdot for the slightly outdated stories, stay to read the comments of disillusioned Linux fanboys.

    • by natehoy (1608657) on Wednesday September 16, 2009 @04:58PM (#29446159) Journal

      Possibly, but I'm not sure Hugin or another free equivalent could manage something quite on this scale. Microsoft appears to have customized matching algorithms and provided some pretty staggering amounts of computing power. Short of using something similar to folding@home, I can't imagine the school doing this.

      Microsoft gets publicity and some new algorithms for image stitching, the school gets funding for a research project.

      • Re:Cool, but... (Score:4, Informative)

        by afidel (530433) on Wednesday September 16, 2009 @05:11PM (#29446359)
        496 cores isn't all that much. With HT enabled, a 1U server can hold 16 cores, so a 42U rack can hold 672 cores; blade servers are even denser. The budget for most midsized IT departments probably has room for a few compute clusters of that size.
      • by 93 Escort Wagon (326346) on Wednesday September 16, 2009 @05:41PM (#29446763)

        Possibly, but I'm not sure Hugin or another free equivalent could manage something quite on this scale. Microsoft appears to have customized matching algorithms and provided some pretty staggering amounts of computing power. Short of using something similar to folding@home, I can't imagine the school doing this.

        Microsoft gets publicity and some new algorithms for image stitching, the school gets funding for a research project.

        Well, one of the authors works for Microsoft Research - so I doubt any FOSS projects were ever considered. Microsoft also donates bucketloads of money to UW's CSE department, and the faculty have lots of Microsoft ties. Bill Gates has been an invited speaker over here many times.

        The CSE department is very high caliber, don't get me wrong. But it is widely perceived by the rest of the campus as a Microsoft shop.

    • by Quarters (18322) on Wednesday September 16, 2009 @05:05PM (#29446293)
      As soon as people who write free software can band together and field something like Microsoft's R&D division I'm sure the U of W will consider it. It wasn't just software Microsoft contributed it was the enormous freaking brains that wrote the software. Smart people can make money with their smarts. Most choose to do so. Many go to work for MS because they pay their researchers extremely well. You can blather on all you want about how evil Microsoft is (which isn't possible as corporations are amoral by definition), but you have to acknowledge the costs they absorbed in helping this project. Evil doesn't usually go with altruism. Maybe IBM or RedHat could offer the same level of support. It's not Microsoft's fault that they can't, won't, or didn't.
      • Re:Cool, but... (Score:2, Informative)

        by Anonymous Coward on Wednesday September 16, 2009 @05:31PM (#29446645)
        Most of the brains on this project are AT the University of Washington. If you recall (or read the article), Photosynth is a UW project licensed to Microsoft. Not that there aren't amazingly smart people over at MSR, too. They can't help that they're right across the lake from each other (UW & MSR).
    • by jammindice (786569) on Wednesday September 16, 2009 @05:06PM (#29446301) Homepage
      Don't worry, Google is working on this as well; they will soon recreate the world using pictures from your accidentally public Facebook pictures. Not only will they be able to provide 3D city models but also provide you a model of your home, pool, garage, neighborhood, vacation spots, etc... Soon you won't even need to vacation; you can just step into the local Google Holo Deck and be instantly transported to your favorite destination with real-time visual updates!!

      Ain't the future grand?
    • Re:Cool, but... (Score:3, Informative)

      by Anonymous Coward on Wednesday September 16, 2009 @05:20PM (#29446491)

      There are two open-source projects aiming to do similar 3D reconstructions:

      http://code.google.com/p/libmv/ [google.com]
      http://insight3d.sourceforge.net/ [sourceforge.net]

      So while getting those 496 cores would still be a task for you, opensource software _is_ nearly there too.

    • by harlows_monkeys (106428) on Wednesday September 16, 2009 @05:50PM (#29446887) Homepage

      Key parts of this software are available as free software [washington.edu].


    • by spoco2 (322835) on Wednesday September 16, 2009 @08:33PM (#29448729)

      Why the hell does it matter? Seriously, are you that anti-MS that you can't handle them funding some cool research? Things are allowed to exist without being open source, you know.

      There have been plenty, plenty, plenty of fantastic benefits to mankind achieved with the help of private industry. Using open source or not does not equate to good vs. evil.

    • by Anonymous Coward on Wednesday September 16, 2009 @08:55PM (#29448905)

      Sorry, but all the free software people were busy still trying to make this the year of the linux desktop. There was no one left to court the school and then spend dedicated hours with them... *ducks*

    • by im_thatoneguy (819432) on Wednesday September 16, 2009 @09:54PM (#29449357)

      Done 'more with free software'? It's original code.

      If you want to license the algorithms you can contact UW and they'll happily come up with an arrangement for you.

      I don't see what bearing Microsoft has or does not have with this project except licensing some of their older technology for Photosynth. Most of the tech used in this project which UW isn't trying to license is open source.

  • by jhsiao (525216) on Wednesday September 16, 2009 @05:00PM (#29446197)
    Photosynth was showcased in a mid 2007 TED talk. You can find it here [ted.com].

    It would be nice to have photosynths of monuments, art, or architecture that have been damaged or destroyed (e.g. the Buddhas dynamited in Afghanistan, the churches that collapsed in the 2009 Italy earthquake) from tourist photos that may be floating out in the interwebs.
  • Why O(n squared)? (Score:4, Interesting)

    by clone53421 (1310749) on Wednesday September 16, 2009 @05:01PM (#29446225) Journal

    Previous versions of the Photo Tourism software matched each photo to every other photo in the set.

    If you're building an entire digital model, wouldn't there be some point at which it would be more efficient to match each new photo to the digital model itself (instead of all the other individual photos)? At that point, the 3D model would be nearly complete, and matching new photos would be closer to O(n), as I see it. Additional photos would primarily only increase the detail/resolution of the existing model.

    • by Anonymous Coward on Wednesday September 16, 2009 @05:50PM (#29446885)

      If you're building an entire digital model, wouldn't there be some point at which it would be more efficient to match each new photo to the digital model itself (instead of all the other individual photos)? At that point, the 3D model would be nearly complete, and matching new photos would be closer to O(n), as I see it.

      O(n) is so vague in your statement. The obvious "what is 'n'?" answer is the number of previous photos added. But, if there are n terms, and each takes n time it's O(n^2). I'm no graphics whiz, but it seems impossible that finding where in the model a certain picture is would be constant time. However, I wonder if comparing against the model would allow for a significant speedup by removing "fluff" - images that contribute nearly the same information. There are probably thousands of pictures of the Coliseum in their sample, yet a subset of that would be sufficient. I don't suppose this improves its worst case scenario above O(n) for each term, though, as you could choose pictures to minimize overlaps. I wonder how they did get the speedups, though...

      • by clone53421 (1310749) on Thursday September 17, 2009 @08:29AM (#29452319) Journal

        O(n) is so vague in your statement. The obvious "what is 'n'?" answer is the number of previous photos added.

        Well, yes.

        if there are n terms, and each takes n time it's O(n^2).

        Yeah, that's what is being said in TFS.

        I'm no graphics whiz, but it seems impossible that finding where in the model a certain picture is would be constant time. However, I wonder if comparing against the model would allow for a significant speedup by removing "fluff" - images that contribute nearly the same information. There are probably thousands of pictures of the Coliseum in their sample, yet a subset of that would be sufficient.

        Basically, this. By comparing against the model, at a certain point you'd have mostly overlaps, and at that point, the model would no longer grow significantly, and comparing new pictures to it would be closer to a constant, which is why I suggested that it could approach O(n). Of course, this point would be hard to reach... although, if you intentionally took photos to map the entire 3D scene, you'd want to build the model fairly quickly and efficiently, at which point other people's photos could easily be geotagged by comparing them to the basically-complete model (rather than the individual, overlapping photos you used to create it).
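A toy cost model (made-up constants, not from the paper) shows why matching each new photo against a saturating model approaches linear total cost, as the thread argues:

```python
# Two registration strategies for n photos (unit-cost comparisons):
#  - all-pairs: photo i is compared against all i-1 earlier photos
#  - against-model: photo i is compared against the model, whose size
#    stops growing once the scene is mostly covered

def all_pairs_cost(n):
    # 0 + 1 + ... + (n-1) comparisons -> O(n^2)
    return sum(i - 1 for i in range(1, n + 1))

def against_model_cost(n, saturation=1_000):
    # Model grows for roughly the first `saturation` photos, then each
    # new photo costs a near-constant amount -> O(n) overall
    return sum(min(i, saturation) for i in range(1, n + 1))

n = 100_000
print(all_pairs_cost(n) / against_model_cost(n))  # ~50x cheaper here
```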

    • by Anonymous Coward on Wednesday September 16, 2009 @05:51PM (#29446907)

      Obviously the talented folks at MS helped them write the software as well :)

    • by Anonymous Coward on Wednesday September 16, 2009 @09:58PM (#29449379)

      In Photo Tourism, each photo was matched to each other photo in order to build a graph of the whole set, where the nodes are photos and edges indicate that two photos observe a common point. This helps to weed out photos that only observe a few points (i.e., they have few connections to other photos; they may be low quality, have a lot of distortion, or only contain a small portion of whatever it is you want a model of). Then the two best pictures are chosen, that is, the two pictures that have the most overlap in terms of the points they see AND are taken far enough away from each other to provide a good baseline for other pictures to be added. Finally, photos are added in one at a time based on which of the remaining photos observe the most points that are already in the scene.
      -T
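The incremental ordering described above can be sketched in a few lines (a simplified toy with hypothetical photo/point data; the real pipeline works on feature tracks and runs bundle adjustment at each step):

```python
# Greedy next-view selection: pick a strong seed pair, then repeatedly
# add the photo that observes the most points already in the scene.

def reconstruct_order(observations, seed_a, seed_b):
    """observations: dict photo -> set of 3D point ids it observes.
    seed_a/seed_b: the chosen baseline pair.
    Returns the order in which photos would be registered."""
    in_scene = set(observations[seed_a]) | set(observations[seed_b])
    order = [seed_a, seed_b]
    remaining = set(observations) - {seed_a, seed_b}
    while remaining:
        # next photo = one sharing the most points with the current scene
        best = max(remaining, key=lambda p: len(observations[p] & in_scene))
        if not observations[best] & in_scene:
            break  # no remaining photo connects to the scene
        order.append(best)
        in_scene |= observations[best]
        remaining.remove(best)
    return order

obs = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5, 6},
    "c": {5, 6, 7},
    "d": {7, 8},
}
print(reconstruct_order(obs, "a", "b"))  # ['a', 'b', 'c', 'd']
```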

  • by UnknowingFool (672806) on Wednesday September 16, 2009 @05:03PM (#29446255)
    The research team also announced their next project: Natalie Portman 3D based on the actress of the same name. The team is asking geeks everywhere for their assistance in providing pictures of her, and of course, grits.
  • UW website (Score:5, Informative)

    by guido1 (108876) on Wednesday September 16, 2009 @05:05PM (#29446281)

    The team's actual site has more pics and videos, including St. Peter's Basilica, Trevi Fountain, and info on Venice.

    http://grail.cs.washington.edu/rome/ [washington.edu]

  • by chord.wav (599850) on Wednesday September 16, 2009 @05:05PM (#29446291) Journal

    The newly developed code works more than a hundred times faster than the previous version. It first establishes likely matches and then concentrates on those parts.

    It would have been even faster if they'd started with the edges and left the sky for the end, like in any other puzzle.

  • by 1WingedAngel (575467) on Wednesday September 16, 2009 @05:06PM (#29446295) Homepage

    FTFS:
    It first establishes likely matches and then concentrates on those parts."

    Sounds like when you are putting together a jigsaw puzzle and you find the edge pieces first and work in from there.

  • Video games (Score:5, Interesting)

    by VinylRecords (1292374) on Wednesday September 16, 2009 @05:07PM (#29446315)

    Imagine if the God of War team could instantly recreate entire cities like this. Or the Fallout 3 team could snap a few thousand photos of Las Vegas and then digitize an entire city within a day and then work out the kinks. Or the Grand Theft Auto developers could recreate New Yo...ahem, Liberty City and then build a perfect 3D model and just slap textures on the buildings.

    Sure it's not a perfect system but this has so much potential to help recreate cities or terrain within video games.

    • by Anonymous Coward on Wednesday September 16, 2009 @06:36PM (#29447535)

      That's pretty smart, until you get blackmailed by a church for using its likeness in a game [cnn.com]

      • Re:Video games (Score:1, Offtopic)

        by SanityInAnarchy (655584) <ninja@slaphack.com> on Wednesday September 16, 2009 @09:02PM (#29448975) Journal

        Wow.

        Govender said the church would also seek a donation to be used in its work with young people. He did not specify how much the company would be asked to pay.

        See, it's really about the money, not whatever "desecration" they claim. Blackmail is right.

        "We are concerned about the amount of violence in these games," McKie said Monday. "It's real for us. We are living the reality here. It's not just a game."

        Yes, because in reality, you're clearly fighting against Chimera.

    • Re:Video games (Score:3, Interesting)

      by MorpheousMarty (1094907) on Wednesday September 16, 2009 @06:49PM (#29447717)
      Although your idea is very cool, it would be much easier to use something like the Google vans to do this. The hard part of this project is figuring out where the cameras were pointing when the pictures were taken. With good geolocation information and an electronic compass you can eliminate that difficult part of the process. I'm sure if you paid enough you could just license the high-quality originals used in Street View and do the same thing for a fraction of the cost.
      • Re:Video games (Score:4, Interesting)

        by shutdown -p now (807394) on Wednesday September 16, 2009 @07:32PM (#29448161) Journal

        The way you get pics isn't really a big deal; the interesting part is the software that takes them and makes a 3D model out of them.

        But yeah, combining Street View with Photosynth is an obvious thing that comes to mind.

        • by loconet (415875) on Wednesday September 16, 2009 @11:50PM (#29450173) Homepage

          The way you get pics isn't really a big deal; the interesting part is the software that takes them and makes a 3D model out of them.

          The way you get the pics _is_ a big deal - it makes the second part (making the 3D model) much harder if you don't have the right information. That is one of the biggest obstacles this project attempts to address, according to the paper. As the paper says, this has somewhat been done before in a controlled environment (e.g. Google Earth) with mostly commissioned aerial photographs where the camera is calibrated, shots are taken at predefined intervals, time is known, GPS aids location, etc., etc. When you remove all that information by fetching the images from an "uncontrolled" source like Flickr, it is a whole other game (although you also gain other valuable aspects, like views from inside buildings).

        • by Timmmm (636430) on Monday September 21, 2009 @06:44PM (#29497653)

          "But yeah, combining Street View with Photosynth is an obvious thing that comes to mind."

          Been done. I think I saw it in a demo video for an 'augmented reality' phone navigation program, possibly for Android. The trouble is you only have two or three views of each point. What *would* be cool is to record a video panorama as you travel round a city. That would be better because

          a) You have many more views, and
          b) You know one frame is taken near the previous and next frames, so you can use optical flow algorithms and probably get more points in the point cloud.

      • by 4D6963 (933028) on Wednesday September 16, 2009 @11:41PM (#29450121)

        Isn't this where the new iPhones come in? They have GPS and a compass. But you're right that it would probably simplify things to make it more systematic, especially when you already have all the Street View data readily available, considering that it's full panoramas with sufficient increments of parallax for anything.

    • by khchung (462899) on Wednesday September 16, 2009 @09:18PM (#29449101) Journal

      I know this sounds ridiculous, but this is the current (insane) state of the copyright laws we have. If game companies recreate real cities from tourists' pictures and put them in games, they are violating the copyrights of those tourists. I assume putting pictures on Flickr does not mean assigning copyright to them, nor giving blanket permission to third parties to do whatever they want.

      If game companies like the current "protection" of the copyright laws, they need to be bound by the same rules.

  • by BigBadBus (653823) on Wednesday September 16, 2009 @05:11PM (#29446363) Homepage
    ....but it would have been if the first coat had dried.
  • by Monkeedude1212 (1560403) on Wednesday September 16, 2009 @05:17PM (#29446439) Journal

    Aren't humans just awesome?

    We build amazing structures that last over a thousand years of constant wear, and we invent photography to capture the awe-inspiring moments that such marvelous creations cast upon us, then create computers to recreate their dimensions in 3D almost perfectly in a virtual environment, using nothing but the pictures we've taken and our impressive ingenuity.

    If you can read this: Pat yourself on the back.

    • by tool462 (677306) on Wednesday September 16, 2009 @09:26PM (#29449161)

      I just fed all of the photos ever taken of me into Photosynth, made a 3-D model of myself, and then made the model pat itself on the back. I'm WAY too lazy to lift my own arm that far.

    • by lanceran (1575541) on Thursday September 17, 2009 @12:48AM (#29450523)

      Aren't humans just awesome?

      We build amazing structures that last over a thousand years of constant wear, and we invent photography to capture the awe-inspiring moments that such marvelous creations cast upon us, then create computers to recreate their dimensions in 3D almost perfectly in a virtual environment, using nothing but the pictures we've taken and our impressive ingenuity.

      If you can read this: Pat yourself on the back.

      Now imagine in a hundred years or so we'd be able to neurally interface with a computer and explore said 3D structures in a pseudo-reality. Now THAT would be amazing.

  • by Anonymous Coward on Wednesday September 16, 2009 @05:19PM (#29446479)

    So?

  • by grumbel (592662) <grumbel@gmx.de> on Wednesday September 16, 2009 @05:20PM (#29446497) Homepage

    It is nice to see that they have optimized the algorithm, but what about the presentation? It looks like it is still just a point cloud, just as it was two years ago. Why isn't it a fully textured 3D model? It shouldn't be that hard to do when you already have the points in 3D.

  • by Anonymous Coward on Wednesday September 16, 2009 @05:25PM (#29446559)

    If the poster had cared to read the article or watch the videos, he would have noticed they did not reconstruct the entire city. Not even close.

    As would be expected from a collection of photos obtained from flickr they merely succeeded in reconstructing some landmark sites. The closest they got was the old city of Dubrovnik (because that seems to be the most important actual 'landmark' there).

    Even the places which worked best are more like a point/partial mesh cloud thing.

    Still, this is very cool technology, and it shows what astounding things data collections on the web may be used for in the future.

    I wonder why they didn't try the Google Street View photos from, let's say, San Francisco and combine them with aerial shots. Shouldn't that work a lot better?

    Or would that just not be a challenge?
       

    • by 644bd346996 (1012333) on Wednesday September 16, 2009 @05:50PM (#29446883)

      The impressive part of this isn't the 3D reconstruction (that's been done many times before, though perhaps not on this scale); it's that they've done it with such a disorganized, incomplete data set as Flickr. Using Google Street View data (particularly with the locations already known) would be computationally much easier, but requires paying people to drive around with cameras on the roof.

  • by Anonymous Coward on Wednesday September 16, 2009 @06:00PM (#29447021)

    The next step would be to use video as the data source, or even panoramic video like the Google Street View cars [autoblog.com] capture. With such a system, simply driving by a building would provide thousands of frames from a range of viewpoints already. Putting all that together would be immensely computationally intensive, but the result would be 3D models of everything the Google cars have ever filmed.

  • by Anonymous Coward on Wednesday September 16, 2009 @06:05PM (#29447085)

    Can this be used for Pr0n?

  • by mqduck (232646) <mqduck AT mqduck DOT net> on Wednesday September 16, 2009 @07:17PM (#29447981)

    I hereby declare this It's-Okay-to-Like-Microsoft-For-a-Day Day. This is pretty cool.

  • by SanityInAnarchy (655584) <ninja@slaphack.com> on Wednesday September 16, 2009 @07:18PM (#29447983) Journal

    If it takes a year for 500 computers, does that mean it'd take a month for 6,000 computers, or a day for 182,500 computers, or an hour for 4,380,000 computers?

    Or, in other words, the original version would cost about $438,000 of EC2 [amazon.com] time.

    The new version takes 21 hours on 496 cores -- again, could you do it in an hour on 10,416 cores? And that becomes $1,416 of EC2 time.

    So, it's not 100 times faster, just 100 times cheaper.
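The back-of-envelope above, redone with an assumed flat rate of $0.10 per core-hour (the rate implied by the poster's $438,000 figure; the rate behind the $1,416 figure is not stated):

```python
RATE = 0.10  # dollars per core-hour (assumption, see lead-in)

old_core_hours = 500 * 365 * 24   # 500 machines running for a year
new_core_hours = 496 * 21         # the reported 21-hour, 496-core run

print(old_core_hours * RATE)            # ~438000 -- the $438,000 figure
print(new_core_hours * RATE)            # ~1041.6 -- same ballpark as $1,416
print(old_core_hours / new_core_hours)  # ~420x fewer core-hours overall
```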

  • by PPH (736903) on Wednesday September 16, 2009 @07:41PM (#29448243)

    That Rome simulation had some problems working with Nero.

  • by bigngamer92 (1418559) on Wednesday September 16, 2009 @08:39PM (#29448773) Homepage Journal
    "The data set consists of 150,000 images from Flickr.com"

    It was built with slave labor. We'll just call it "volunteers" in this case.

  • by jnnnnn (1079877) on Wednesday September 16, 2009 @09:23PM (#29449139)

    I'm pretty sure Google Street View already does some of this. Browsing around, it seems to know where the sides of buildings are and let you zoom in on them.

    From what I can see they're not blowing their own trumpet as much as these guys, but it can't be far away that Google Earth will have quite comprehensive 3D models of cities (Tokyo is already amazingly complete, although I don't know if that's an automatic system or not).

  • by 4D6963 (933028) on Wednesday September 16, 2009 @11:52PM (#29450197)

    By the way, Google StreetView has been mentioned, but if you wanted to do an entire city, wouldn't it be simpler to use a bunch of high-res shots taken from a helicopter circling around the city?

    Also, could it be used by the military? To transform the photographic data from recon planes of an area into something that could be used in some simulation program? Imagine playing Call of Duty in the village where you'll have a mission in a few hours.

  • by Fuzzums (250400) on Thursday September 17, 2009 @03:46AM (#29451239) Homepage

    I don't think so.
    After a while you have a set of "high hit" images (ones that are found the most). Start with that set.
    If you have a location with 5000 images and after 50 of those images you still don't have a hit for that location, move on to the next location.
    It would save a lot of time.
