Operating Systems Supercomputing Technology

Virtualizing a Supercomputer

Posted by kdawson
from the slicing-up-the-pie dept.
bridges writes "The V3VEE project has announced the release of version 1.2 of the Palacios virtual machine monitor following the successful testing of Palacios on 4096 nodes of the Sandia Red Storm supercomputer, the 17th-fastest in the world. The added overhead of virtualization is often a show-stopper, but the researchers observed less than 5% overhead for two real, communication-intensive applications running in a virtual machine on Red Storm. Palacios 1.2 supports virtualization of both desktop x86 hardware and Cray XT supercomputers using either AMD SVM or Intel VT hardware virtualization extensions, and is an active open source OS research platform supporting projects at multiple institutions. Palacios is being jointly developed by researchers at Northwestern University, the University of New Mexico, and Sandia National Labs." The ACM's writeup has more details of the work at Sandia.

  • Cool. (Score:5, Funny)

    by John Hasler (414242) on Monday February 08, 2010 @09:18PM (#31067752) Homepage

    Now we'll never need to build another expensive supercomputer. We'll just "virtualize" them on cheap desktops.

    Oh. Wait...

    • Why virtualize a supercomputer when you can virtualize two for the same price of $19.95?
    • Re:Cool. (Score:4, Interesting)

      by TubeSteak (669689) on Monday February 08, 2010 @10:01PM (#31068014) Journal

      Now we'll never need to build another expensive supercomputer. We'll just "virtualize" them on cheap desktops.

      I think you've got it backwards.
      Now we're virtualizing cheap desktops on supercomputers.

      What they're doing only makes sense if 5% of 4096 nodes* is cheaper than coding your app to run natively on the supercomputer.
      Like really big hard drives, when you get up to supercomputer levels of performance, 5% is a lot to give away.

      *Anyone know exactly what a node entails?

      • Re:Cool. (Score:4, Informative)

        by Tynin (634655) on Monday February 08, 2010 @10:47PM (#31068210)

        *Anyone know exactly what a node entails?

        A node is generally just a fancy name for a computer in a cluster. Nodes don't always need an OS locally (they can get it via PXE), and may have some special hardware. But honestly, in my experience, a node is a node if the systems architect wants to call it one.

      • by creimer (824291)
        A supercomputer running 4096 copies of Windows will probably take a significant performance hit of more than 5%.
      • by LoRdTAW (99712)

        *Anyone know exactly what a node entails?

        At the very least: CPU + RAM. Also of course some glue logic (chip set), firmware (BIOS) and an interface to the rest of the cluster (networking).

  • by Anonymous Coward

    5% may not sound like much, but with 4096 nodes that's over 200 nodes that they are wasting.
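
    As a rough sketch of that arithmetic (assuming, as a simplification, that the 5% overhead applies uniformly across all nodes; `nodes_lost` is an illustrative name, not something from the article):

```python
def nodes_lost(total_nodes: int, overhead: float) -> float:
    """Node-equivalents of capacity consumed by a fractional overhead.

    Assumption: the overhead is uniform across every node.
    """
    return total_nodes * overhead


# Red Storm test size and the upper bound on overhead quoted in the summary.
lost = nodes_lost(4096, 0.05)
print(f"{lost:.1f} node-equivalents lost")
```

    That works out to roughly 205 node-equivalents, which is where "over 200 nodes" comes from.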

    • Well, not sure how good they are now, but back when I studied at uni we examined a few supercomputer clusters, and the rule of thumb in most cases was that 1 CPU core per node was stuck doing I/O for that node anyway. This was all before the move to HyperTransport with AMD, though, so it may be much different for them now.

      The point was that the ratio was constant: it wouldn't get worse with more nodes. It was always x nodes lost per y nodes, as it is here. Just add more nodes :)

      A worse problem would be if it was x^2 nodes per y nodes, then you're just throwing away money adding more.

      • by dbIII (701233)
        It depends on whether the job is CPU bound or I/O bound.
        My skepticism comes from the fact that "only" 5% overhead is likely to mean "only" an extra eight hours for a week-long job. With CPU-bound stuff you want to be as close to the metal as you can get and still have the stuff run.
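
        A quick sketch of where the "eight hours" figure comes from, assuming a purely CPU-bound job whose whole runtime is stretched by the same 5% (`extra_hours` is an illustrative name):

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week-long job


def extra_hours(runtime_hours: float, overhead: float) -> float:
    """Additional wall-clock hours caused by a fractional slowdown.

    Assumption: the slowdown applies evenly to the entire runtime,
    which is only a reasonable model for a CPU-bound job.
    """
    return runtime_hours * overhead


print(f"{extra_hours(HOURS_PER_WEEK, 0.05):.1f} extra hours")
```

        That is about 8.4 extra hours on top of the week. An I/O-bound job may not scale this way at all.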
  • What is the point of virtualizing a supercomputer? A 5% performance loss is a pretty big loss. In, say, a cluster of 100 computers, 5 of them would be wasted, translating to thousands of dollars lost with little to show for it.
    • Re: (Score:1, Interesting)

      by Anonymous Coward

      Perhaps those 5 nodes only cost 50k.

      How much would it cost to rewrite your one of a kind software and retest and verify it? There are other costs here that they are not letting us in on.

      • Not much if you run the program with an existing OS such as Linux. As for testing and verifying, I'd imagine for larger supercomputers it would be less and less of an issue while the 5% becomes more and more of an issue.
        • by Anpheus (908711)

          I have to admit to, ahem, "loling" at your response. I know open source has the benefit of driving down costs, but adapting your software from commodity hardware to enterprise hardware, and going even further to run it on esoteric and specialized hardware, is expensive, whether it's proprietary or not. In fact, it might even be cheaper to get a vendor to rewrite their proprietary code because they've got teams of devs that already know the software in and out. Paying an outside team to write an existing ap

          • by afidel (530433)
            ASCI Red was upgraded twice for a performance increase of 685%-564% depending on if you want to talk Peak or usable.
            • by Anpheus (908711)

              And that's a relatively isolated example. Most of the entries on the top 100 supercomputers today will not be there in five years or ten years. They will probably not even be on the top 500 list at all within ten or fifteen.

              No one wants to run their business apps on such volatile hardware. For scientists doing one-off simulations, one-off hardware is fine.

      • by PopeRatzo (965947) *

        There are other costs here that they are not letting us in on.

        Pizza and 2-liter bottles of Nos, for example.

    • Re:Why? (Score:5, Insightful)

      by John Hasler (414242) on Monday February 08, 2010 @09:55PM (#31067974) Homepage

      > What is the point of virtualizing a supercomputer?

      They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.

      • by mhajicek (1582795)
        Plus they could simulate a system of multiple computers communicating and analyze the behavior of the system as a whole.
      • by JBird (31996) *

        They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.

        Sounds like the supercomputer in Greg Egan's short story Luminous. It was basically built from light and was reconfigured specifically for each different application.

      • They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.

        His parents let him set off nuclear weapons in their basement? Woaw!

    • by PopeRatzo (965947) *

      What is the point of virtualizing a supercomputer?

      So that if the supercomputer crashes, it won't bring down uTorrent running in the background and mess up their seeding of Animal Collective's Merriweather Post Pavilion.

      Why do you think?

    • by Nite_Hawk (1304)

      I work for a supercomputer institute and am our resident grid/cloud junkie. One of the reasons you might want to do this is to allow researchers to create virtual supercomputers on the supercomputer via advanced reservations for simulation runs. There are a variety of reasons that this can be useful. Sometimes software doesn't play nicely with other software on the system or requires specific versions of libraries (or even specific OSes). You may also want to test in an environment where you have control

  • OSS ftw. (Score:2, Interesting)

    It is really pleasant to see more and more OSS projects which are being deployed at national level and large infrastructures.

    Hopefully some less greedy companies that benefit from such projects will start paying the volunteer developers. But then again, I have found that a lot of times, if you are doing something as a hobby/interest/challenge rather than because you were employed to do it, the outcome will be more refined and efficient. Though I have yet to experience the latter part first hand.

  • not a good idea. (Score:1, Interesting)

    by Anonymous Coward

    Virtualizing a supercomputer is never the correct solution. Supercomputers by their nature already have a system for managing lesser processes. That system could be extended, rather than adding another virtual management system that runs in parallel with the existing one, which is then burdened with maintaining it as yet another running process.

  • The way virtualization works is that it is a virtual layer spread across many nodes to avoid any downtime: when one node fails, the rest pick up the slack, without having to stop the running systems. This uses the Linux architecture to cluster many computers on the bottom layer, so as to have the look of one mega computer when it actually is 100 computers or more...etc...

    Then we get into supercomputing, which again uses clusters and usually uses linux, to be able to make all the computers act as
