Virtualizing a Supercomputer (57 comments)

Posted by kdawson on Monday February 08, 2010 @07:49PM from the slicing-up-the-pie dept.

bridges writes "The V3VEE project has announced the release of version 1.2 of the Palacios virtual machine monitor following the successful testing of Palacios on 4096 nodes of the Sandia Red Storm supercomputer, the 17th-fastest in the world. The added overhead of virtualization is often a show-stopper, but the researchers observed less than 5% overhead for two real, communication-intensive applications running in a virtual machine on Red Storm. Palacios 1.2 supports virtualization of both desktop x86 hardware and Cray XT supercomputers using either AMD SVM or Intel VT hardware virtualization extensions, and is an active open source OS research platform supporting projects at multiple institutions. Palacios is being jointly developed by researchers at Northwestern University, the University of New Mexico, and Sandia National Labs." The ACM's writeup has more details of the work at Sandia.

- Other way (Score:5, Funny)
  
  by Wrexs0ul ( 515885 ) writes: <mmeier@NosPAm.racknine.com> on Monday February 08, 2010 @08:22PM (#31067774) Homepage
  
  This is virtualization... Imagine someone imagining a Beowulf cluster of those!
  -Matt
  
  - Re: (Score:2)
    
    by Hurricane78 ( 562437 ) writes:
    
    main = print ("Imagine" ++ si ++ " a beowulf cluster" ++ obc ++ " of those.")
    si = " someone imagining" ++ si
    obc = " of beowulf clusters" ++ obc
Cool. (Score:5, Funny)

by John Hasler ( 414242 ) writes: on Monday February 08, 2010 @08:18PM (#31067752) Homepage

Now we'll never need to build another expensive supercomputer. We'll just "virtualize" them on cheap desktops.
Oh. Wait...

- Re: (Score:1)
  
  by Mitchell314 ( 1576581 ) writes:
  
  Why virtualize a supercomputer when you can virtualize two for the same price of $19.95?
  - Hey, you're right! (Score:2)
    
    by John Hasler ( 414242 ) writes:
    
    Imagine a Beowulf cluster...
    - Re: (Score:2)
      
      by __aaclcg7560 ( 824291 ) writes:
      
      ... at $19.95. I'll take a couple of those. :P
- Re:Cool. (Score:4, Interesting)
  
  by TubeSteak ( 669689 ) writes: on Monday February 08, 2010 @09:01PM (#31068014) Journal
  
  Now we'll never need to build another expensive supercomputer. We'll just "virtualize" them on cheap desktops.
  I think you've got it backwards.
  Now we're virtualizing cheap desktops on supercomputers.
  What they're doing only makes sense if 5% of 4096 nodes* is cheaper than coding your app to run natively on the supercomputer.
  Like really big hard drives, when you get up to supercomputer levels of performance, 5% is a lot to give away.
  *Anyone know exactly what a node entails?
  
  - Re:Cool. (Score:4, Informative)
    
    by Tynin ( 634655 ) writes: on Monday February 08, 2010 @09:47PM (#31068210)
    
    *Anyone know exactly what a node entails?
    A node is generally just a fancy name for a computer in a cluster. Nodes don't always need an OS locally (they can get it via PXE), and may have some special hardware. But honestly, in my experience, a node is a node if the systems architect wants to call it one.
    
  - Re: (Score:2)
    
    by __aaclcg7560 ( 824291 ) writes:
    
    A supercomputer running 4096 copies of Windows will probably take a significant performance hit of more than 5%.
  - Re: (Score:2)
    
    by LoRdTAW ( 99712 ) writes:
    
    *Anyone know exactly what a node entails?
    At the very least: CPU + RAM. Also of course some glue logic (chip set), firmware (BIOS) and an interface to the rest of the cluster (networking).
so they are 'only' wasting 200 machines (Score:1, Insightful)

by Anonymous Coward writes:

5% may not sound like much, but with 4096 nodes that's over 200 nodes they are wasting.
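A quick back-of-the-envelope check of the 200-node figure, as a minimal Haskell sketch. It assumes, as the comment does, that a 5% runtime overhead maps one-to-one onto idle node-equivalents, which is a simplification; `nodesLost` is a hypothetical helper, not anything from Palacios.

```haskell
-- Node-equivalents lost to a fractional overhead.
-- Assumption: runtime overhead translates one-to-one into idle nodes.
nodesLost :: Double -> Int -> Double
nodesLost overhead nodes = overhead * fromIntegral nodes

main :: IO ()
main = do
  print (nodesLost 0.05 4096) -- the 4096-node Red Storm run
  print (nodesLost 0.05 100)  -- a 100-node cluster
```

For 4096 nodes this comes to roughly 205 node-equivalents, which is where "over 200" comes from.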
- Re:so they are 'only' wasting 200 machines (Score:4, Interesting)
  
  by Barny ( 103770 ) writes: on Monday February 08, 2010 @08:32PM (#31067832) Journal
  
  Well, not sure how good they are now, but back when I studied at Uni we examined a few supercomputer clusters, and the rule of thumb in most cases was that 1 CPU core per node was stuck doing IO for that node anyway. This was all before AMD's move to HyperTransport, though, so it may be much different for them now.
  The point was that the number was constant: it wouldn't get worse with more nodes. It was always x nodes lost per y nodes, as this is. Just add more nodes :)
  A worse problem would be if it were x^2 nodes per y nodes; then you'd just be throwing away money by adding more.
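The difference between losing a constant fraction of nodes ("x nodes lost per y nodes") and overhead that grows with the square of the machine size can be sketched in Haskell. The 5% fraction comes from the story; the quadratic coefficient is purely hypothetical, chosen only to show the shape of the curve, and neither model is a measurement of Palacios or Red Storm.

```haskell
-- Usable nodes when overhead is a fixed fraction of the machine.
usableLinear :: Double -> Double -> Double
usableLinear frac n = n - frac * n

-- Hypothetical superlinear case: overhead grows with the square of the node count.
usableQuadratic :: Double -> Double -> Double
usableQuadratic k n = n - k * n * n

main :: IO ()
main = mapM_ report [1024, 2048, 4096, 8192]
  where
    report n =
      putStrLn (show n ++ " nodes: linear " ++ show (usableLinear 0.05 n)
                ++ ", quadratic " ++ show (usableQuadratic 1.0e-4 n))
```

With these made-up numbers the linear case keeps gaining usable nodes as the machine grows, while the quadratic case peaks at n = 1/(2k) = 5000 nodes and then declines, which is the "throwing away money" scenario.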
  
  - Re: (Score:2)
    
    by dbIII ( 701233 ) writes:
    
    It depends on whether the job is CPU-bound or I/O-bound.
    My skepticism comes from the fact that an overhead of "only" 5% is likely to mean "only" an extra eight hours on a week-long job. With CPU-bound stuff you want to be as close to the metal as you can get and still have the stuff run.
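The week-long-job arithmetic, as a minimal Haskell sketch. It assumes the 5% slowdown applies uniformly over the whole run (a simplification, since real overhead depends on how communication- or I/O-heavy the job is); `extraHours` is a hypothetical helper.

```haskell
-- Extra wall-clock hours caused by a fractional slowdown.
-- Assumption: the overhead applies uniformly to the entire run.
extraHours :: Double -> Double -> Double
extraHours overhead baseHours = overhead * baseHours

main :: IO ()
main = print (extraHours 0.05 (7 * 24)) -- one week = 168 hours
```

This works out to about 8.4 hours, consistent with the "extra eight hours" estimate.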
    - Re: (Score:2)
      
      by Barny ( 103770 ) writes:
      
      Yeah, but if it's IO-bound, it should probably be rewritten :)
Why? (Score:2)

by Darkness404 ( 1287218 ) writes:

What is the point of virtualizing a supercomputer? A 5% performance loss is a pretty big loss: in, say, a cluster of 100 computers, 5 of them would be wasted, translating to thousands of dollars lost with little to show for it.
- - - Re: (Score:1)
      
      by bridges ( 101722 ) writes:
      
      ASCI Red Storm normally runs a dedicated lightweight kernel called Catamount, not Linux. Similarly, the IBM BlueGene systems run the IBM compute node kernel, not Linux. Linux is used on some supercomputers, even some of the biggest ones (e.g. ORNL's Jaguar system), but the performance penalty of using Linux as opposed to a lighter-weight kernel can be substantial for some applications (e.g. > 10%).
      - Re: (Score:1)
        
        by bridges ( 101722 ) writes:
        
        Palacios lives inside the lightweight kernel host. Applications that want to run natively on the lightweight kernel without virtualization can do so at *no* penalty. Applications that are willing to pay the performance penalty of Linux can run Linux as a guest at a nominal additional virtualization cost. That way, applications that demand peak hardware performance get it, applications that need more complex OS services get it, and the downtimes associated with a complete system reboot are avoided.
        In addition, the
        
        Re: (Score:1)
        
        by bridges ( 101722 ) writes:
        
        Doh, my mistake, Roadrunner beat Jaguar by a little less than 5% in the SC08 Top500 list, not 0.5%. Still, I do wonder. :)
  - Re:Why? (Score:4, Interesting)
    
    by Spazed ( 1013981 ) writes: on Monday February 08, 2010 @08:34PM (#31067854)
    
    Most of them would be running an application written in C/C++ or some other low-level language with threading. The whole advantage of supercomputers isn't that they have an absurd GHz rating, but an insane number of cores. This could be useful for testing how a network of desktop computers would work, which, from the summary, sounds like what they are doing.
    
    TL;DR: Normal desktop software doesn't run faster on a supercomputer than on your 4-year-old laptop.
    
    - Re: (Score:2)
      
      by the linux geek ( 799780 ) writes:
      
      It would be far more likely to be FORTRAN than a C derivative. Also, plenty of supercomputers, especially IBM pSeries based ones, do have very high clock speeds (4-5GHz) and a relatively small number of cores; recent Nehalem systems follow the same trend.
      - Re: (Score:2)
        
        by afidel ( 530433 ) writes:
        
        Uh, this was run on ASCI Red, a 38,400 core Opteron based system with each node having a dedicated communication processor attached to a 3D torus for flat 1:1 communications.
        
        Re: (Score:2)
        
        by joib ( 70841 ) writes:
        
        Actually, no. ASCI Red [wikipedia.org] was retired from service in 2005.
        
        Re: (Score:2)
        
        by afidel ( 530433 ) writes:
        
        Sorry, Red Storm, my duh.
- Re: (Score:1, Interesting)
  
  by Anonymous Coward writes:
  
  Perhaps those 5 nodes only cost 50k.
  How much would it cost to rewrite your one-of-a-kind software and retest and verify it? There are other costs here that they are not letting us in on.
  - Re: (Score:2)
    
    by Darkness404 ( 1287218 ) writes:
    
    Not much if you run the program with an existing OS such as Linux. As for testing and verifying, I'd imagine for larger supercomputers it would be less and less of an issue while the 5% becomes more and more of an issue.
    - Re: (Score:2)
      
      by Anpheus ( 908711 ) writes:
      
      I have to admit to, ahem, "loling" at your response. I know open source has the benefit of driving down costs, but adapting your software from commodity hardware to enterprise hardware, and going even further to run it on esoteric, specialized hardware, is expensive, whether it's proprietary or not. In fact, it might even be cheaper to get a vendor to rewrite their proprietary code because they've got teams of devs that already know the software in and out. Paying an outside team to write an existing ap
      - Re: (Score:2)
        
        by afidel ( 530433 ) writes:
        
        ASCI Red was upgraded twice, for a performance increase of 564%-685% depending on whether you want to talk peak or usable.
        
        Re: (Score:2)
        
        by Anpheus ( 908711 ) writes:
        
        And that's a relatively isolated example. Most of the entries on today's top 100 supercomputers list will not be there in five or ten years. They will probably not even be on the Top500 list at all within ten or fifteen.
        No one wants to run their business apps on such volatile hardware. For scientists doing one-off simulations, one-off hardware is fine.
  - Re: (Score:2)
    
    by PopeRatzo ( 965947 ) * writes:
    
    There are other costs here that they are not letting us in on.
    Pizza and 2-liter bottles of Nos, for example.
- Re:Why? (Score:5, Insightful)
  
  by John Hasler ( 414242 ) writes: on Monday February 08, 2010 @08:55PM (#31067974) Homepage
  
  > What is the point of virtualizing a supercomputer?
  They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.
  
  - Re: (Score:2)
    
    by mhajicek ( 1582795 ) writes:
    
    Plus they could simulate a system of multiple computers communicating and analyze the behavior of the system as a whole.
  - Re: (Score:1)
    
    by JBird ( 31996 ) * writes:
    
    They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.
    Sounds like the supercomputer in Greg Egan's short story Luminous. It was basically built from light and was reconfigured specifically for each different application.
  - Re: (Score:1)
    
    by LeadSongDog ( 1120683 ) writes:
    
    They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.
    His parents let him set off nuclear weapons in their basement? Woaw!
- Re: (Score:2)
  
  by PopeRatzo ( 965947 ) * writes:
  
  What is the point of virtualizing a supercomputer?
  So that if the supercomputer crashes, it won't bring down uTorrent running in the background and mess up their seeding of Animal Collective's Merriweather Post Pavilion.
  Why do you think?
- Re: (Score:2)
  
  by Nite_Hawk ( 1304 ) writes:
  
  I work for a supercomputer institute and am our resident grid/cloud junkie. One of the reasons you might want to do this is to allow researchers to create virtual supercomputers on the supercomputer via advanced reservations for simulation runs. There are a variety of reasons this can be useful. Sometimes software doesn't play nicely with other software on the system or requires specific versions of libraries (or even specific OSes). You may also want to test in an environment where you have control
OSS ftw. (Score:2, Interesting)

by Asadullah Ahmad ( 1608869 ) writes:

It is really pleasant to see more and more OSS projects being deployed at the national level and in large infrastructures.
Hopefully some less greedy companies that benefit from such projects will start paying the volunteer developers. But then again, I have found that a lot of the time, if you are doing something as a hobby/interest/challenge rather than because you were employed to do it, the outcome will be more refined and efficient. Though I have yet to experience the latter part firsthand.
- - Re: (Score:1)
    
    by bridges ( 101722 ) writes:
    
    Palacios can run on real x86 hardware or on QEMU. In fact, most of our development is done on QEMU, which is open source. The VMWare image was something we did on the original 1.0 release just to help people get started running it and haven't done since, but VMware has *never* been required for development.
not a good idea. (Score:1, Interesting)

by Anonymous Coward writes:

Virtualizing a supercomputer is never the correct solution. Supercomputers by their nature have a system for managing lesser processes. That system could be extended, rather than adding another virtual management system that runs in parallel to the existing one and burdens it with maintaining another running process.
- - - Re: (Score:2, Informative)
      
      by bridges ( 101722 ) writes:
      
      Virtualization offers a number of potential advantages. A paper we have had accepted to IPDPS 2010 enumerates more of them, but here are a few advantages quickly:
      1. The combination of a lightweight kernel and a virtualization layer allows applications to choose which OS they run on and how much they pay in terms of performance for the OS services they need. Because Palacios is hosted inside an existing lightweight kernel that presents minimal overhead to applications that run directly on it, applications that d
Let me get this straight.... (Score:1)

by hesaigo999ca ( 786966 ) writes:

The way virtualization works is that it is a virtual layer spread across many nodes to avoid any downtime: when one node fails, the rest pick up the slack without having to stop the running systems. This uses Linux architecture to cluster many computers on the bottom layer so they look like one mega computer, when it is actually 100 computers or more, etc.
Then we get into supercomputing, which again uses clusters, usually running Linux, to make all the computers act as
- Re: (Score:1)
  
  by bridges ( 101722 ) writes:
  
  We're not trying to hide anything, and so I will admit to being surprised by this (anonymous) accusation. To address the anonymous coward's concerns, however:
  1. Actual users of supercomputers care most about application run time because applications are what scientists run, not micro-benchmarks. As a result, our paper and research more generally focuses on the runtime penalty to real applications (e.g. Sandia's CTH code) as opposed to focusing on optimizing micro-benchmarks that aren't what real users of th