Virtualizing a Supercomputer
bridges writes "The V3VEE project has announced the release of version 1.2 of the Palacios virtual machine monitor following the successful testing of Palacios on 4096 nodes of the Sandia Red Storm supercomputer, the 17th-fastest in the world. The added overhead of virtualization is often a show-stopper, but the researchers observed less than 5% overhead for two real, communication-intensive applications running in a virtual machine on Red Storm. Palacios 1.2 supports virtualization of both desktop x86 hardware and Cray XT supercomputers using either AMD SVM or Intel VT hardware virtualization extensions, and is an active open source OS research platform supporting projects at multiple institutions. Palacios is being jointly developed by researchers at Northwestern University, the University of New Mexico, and Sandia National Labs." The ACM's writeup has more details of the work at Sandia.
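For context on the hardware extensions mentioned in the summary, here is a minimal sketch, not taken from the Palacios code base, of how a VMM can probe for them: Intel advertises VT-x (VMX) in CPUID leaf 1, ECX bit 5, and AMD advertises SVM in CPUID leaf 0x80000001, ECX bit 2. Assumes GCC or Clang on x86.

/* Illustrative only, not Palacios code: probe CPUID for the
 * hardware virtualization extensions named in the summary. */
#include <stdio.h>
#include <cpuid.h>

int main(void) {
    unsigned int eax, ebx, ecx, edx;

    /* Intel VT-x (VMX): CPUID leaf 1, ECX bit 5 */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 5)))
        printf("Intel VT-x (VMX) supported\n");

    /* AMD SVM: CPUID leaf 0x80000001, ECX bit 2 */
    if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 2)))
        printf("AMD SVM supported\n");

    return 0;
}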
Other way (Score:5, Funny)
This is virtualization... Imagine someone imagining a Beowulf cluster of those!
-Matt
Re: (Score:2)
main = print ("Imagine" ++ si ++ " a beowulf cluster" ++ obc ++ " of those.")
si = " someone imagining" ++ si
obc = " of beowulf clusters" ++ obc
Cool. (Score:5, Funny)
Now we'll never need to build another expensive supercomputer. We'll just "virtualize" them on cheap desktops.
Oh. Wait...
Hey, you're right! (Score:2)
Imagine a Beowulf cluster...
Re:Cool. (Score:4, Interesting)
Now we'll never need to build another expensive supercomputer. We'll just "virtualize" them on cheap desktops.
I think you've got it backwards.
Now we're virtualizing cheap desktops on supercomputers.
What they're doing only makes sense if 5% of 4096 nodes* is cheaper than coding your app to run natively on the supercomputer.
As with really big hard drives, when you get up to supercomputer levels of performance, 5% is a lot to give away.
*Anyone know exactly what a node entails?
Re:Cool. (Score:4, Informative)
*Anyone know exactly what a node entails?
A node is generally just a fancy name for a computer in a cluster. Nodes don't always need an OS locally (getting it via PXE), and may have some special hardware. But honestly, in my experience, a node is a node if the systems architect wants to call it one.
Re: (Score:2)
*Anyone know exactly what a node entails?
At the very least: CPU + RAM. Also, of course, some glue logic (chipset), firmware (BIOS), and an interface to the rest of the cluster (networking).
so they are 'only' wasting 200 machines (Score:1, Insightful)
5% may not sound like much, but with 4096 nodes that's over 200 nodes that they are wasting (0.05 × 4096 ≈ 205).
Re:so they are 'only' wasting 200 machines (Score:4, Interesting)
Well, I'm not sure how good they are now, but back when I studied at uni we examined a few supercomputer clusters, and the rule of thumb in most cases was that one CPU core per node was stuck doing I/O for that node anyway. This was all before AMD's move to HyperTransport, though, so it may be much different for them now.
The point is that it was a constant: it wouldn't get worse with more nodes, it was always x nodes lost per y nodes, as it is here. Just add more nodes :)
A worse problem would be if it were x^2 nodes lost per y nodes; then you're just throwing away money by adding more.
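To make the two models concrete, here is a toy sketch; the 5% figure is from the thread, while the quadratic coefficient is invented purely for illustration:

/* Toy model, not real measurements: constant-fraction overhead
 * (x nodes lost per y nodes) vs. a quadratic overhead model. */
#include <stdio.h>

int main(void) {
    const double frac = 0.05;  /* constant 5% overhead, as discussed above */
    const double k    = 1e-5;  /* invented quadratic coefficient */

    for (int n = 1024; n <= 8192; n *= 2) {
        double lost_linear    = frac * n;   /* fixed share: scales with n */
        double lost_quadratic = k * n * n;  /* grows faster than n itself */
        printf("%5d nodes: linear loses %6.0f, quadratic loses %6.0f\n",
               n, lost_linear, lost_quadratic);
    }
    return 0;
}

At 4096 nodes the constant 5% model loses about 205 nodes, matching the 200-plus figure cited above; under the quadratic model, doubling the machine quadruples the loss.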
Re: (Score:2)
My skepticism comes from the fact that an overhead of "only" 5% likely means "only" an extra eight hours for a week-long job (5% of 168 hours is about 8.4 hours). With CPU-bound stuff you want to be as close to the metal as you can get and still have the stuff run.
Re: (Score:2)
Yeah, but if it's IO-bound, it should probably be re-written :)
Why? (Score:2)
What is the point of virtualizing a supercomputer?
Re: (Score:1)
ASCI Red Storm normally runs a dedicated lightweight kernel called Catamount, not Linux. Similarly, the IBM BlueGene systems run the IBM compute node kernel, not Linux. Linux is used on some supercomputers, even some of the biggest ones (e.g. ORNL's Jaguar system), but the performance penalty of using Linux as opposed to a lighter-weight kernel can be substantial for some applications (e.g. >10%).
Re: (Score:1)
Palacios lives inside the lightweight kernel host. Applications that want to run natively on the lightweight kernel without virtualization can do so at *no* penalty. Applications that are willing to pay the performance penalty of Linux can run Linux as a guest at a nominal additional virtualization cost. That way, applications that demand peak hardware performance get it, applications that need more complex OS services get them, and the downtimes associated with a complete system reboot are avoided.
In addition, the
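To make the trade-off concrete, here is a hypothetical sketch of the launch-time decision being described; none of these names come from Palacios or Catamount:

/* Hypothetical job-launch logic; illustrative names only. */
#include <stdio.h>

typedef enum { OS_LIGHTWEIGHT, OS_LINUX_GUEST } os_choice;

/* Run directly on the lightweight kernel (no virtualization penalty)
 * unless the job needs richer OS services, in which case boot a Linux
 * guest and accept the nominal additional virtualization cost. */
static os_choice choose_os(int needs_full_linux_services) {
    return needs_full_linux_services ? OS_LINUX_GUEST : OS_LIGHTWEIGHT;
}

int main(void) {
    printf("MPI-only solver: %s\n", choose_os(0) == OS_LIGHTWEIGHT
           ? "native lightweight kernel" : "Linux guest");
    printf("needs full Linux services: %s\n", choose_os(1) == OS_LINUX_GUEST
           ? "Linux guest" : "native lightweight kernel");
    return 0;
}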
Re: (Score:1)
Doh, my mistake, Roadrunner beat Jaguar by a little less than 5% in the SC08 Top500 list, not 0.5%. Still, I do wonder. :)
Re:Why? (Score:4, Interesting)
TL;DR: Normal desktop software doesn't run faster on a supercomputer than on your four-year-old laptop.
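One standard way to quantify that TL;DR is Amdahl's law (my framing, not the poster's): with parallel fraction p on n processors, speedup = 1 / ((1 - p) + p/n). Typical desktop software has a small p, so thousands of nodes buy almost nothing:

/* Amdahl's law: why desktop software gains little from 4096 nodes. */
#include <stdio.h>

static double amdahl_speedup(double p, double n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    /* mostly serial desktop app vs. a highly parallel HPC code */
    printf("p=0.10, n=4096: %.2fx speedup\n", amdahl_speedup(0.10, 4096)); /* ~1.11x */
    printf("p=0.99, n=4096: %.2fx speedup\n", amdahl_speedup(0.99, 4096)); /* ~97.6x */
    return 0;
}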
Re: (Score:2)
Actually, no. ASCI Red [wikipedia.org] was retired from service in 2005.
Re: (Score:1, Interesting)
Perhaps those 5 nodes only cost 50k.
How much would it cost to rewrite your one-of-a-kind software, and retest and verify it? There are other costs here that they are not letting us in on.
Re: (Score:2)
I have to admit to, ahem, "loling" at your response. I know open source has the benefit of driving down costs, but adapting your software from commodity hardware to enterprise hardware, and going even further to run it on esoteric and specialized hardware, is expensive, whether it's proprietary or not. In fact, it might even be cheaper to get a vendor to rewrite their proprietary code, because they've got teams of devs who already know the software inside and out. Paying an outside team to write an existing ap
Re: (Score:2)
And that's a relatively isolated example. Most of the entries on today's top 100 supercomputers will not be there in five or ten years; they will probably not be on the top 500 list at all within ten to fifteen.
No one wants to run their business apps on such volatile hardware. For scientists doing one-off simulations, one-off hardware is fine.
Re: (Score:2)
Pizza and 2-liter bottles of Nos, for example.
Re:Why? (Score:5, Insightful)
> What is the point of virtualizing a supercomputer?
They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.
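A toy illustration of why swapping whole system images can be fast (hypothetical; a real VMM such as Palacios would also have to capture CPU, device, and interconnect state, which this ignores): restoring a guest is mostly reloading a saved memory blob rather than rebooting hardware.

/* Toy checkpoint/restore of a guest "RAM" buffer; illustrative only. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int save_image(const void *ram, size_t size, const char *path) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t n = fwrite(ram, 1, size, f);
    fclose(f);
    return n == size ? 0 : -1;
}

static int restore_image(void *ram, size_t size, const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(ram, 1, size, f);
    fclose(f);
    return n == size ? 0 : -1;
}

int main(void) {
    size_t size = 1 << 20;      /* 1 MiB stand-in for guest RAM */
    char *ram = malloc(size);
    if (!ram) return 1;
    memset(ram, 0xAB, size);    /* pretend this is the stellar simulation */

    if (save_image(ram, size, "guest.img") == 0) {
        memset(ram, 0, size);   /* hand the nodes to the other job */
        if (restore_image(ram, size, "guest.img") == 0)
            printf("guest image restored: first byte = 0x%02X\n",
                   (unsigned char)ram[0]);
    }
    free(ram);
    return 0;
}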
Re: (Score:1)
They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.
Sounds like the supercomputer in Greg Egan's short story Luminous. It was basically built from light and was reconfigured specifically for each different application.
Re: (Score:1)
They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.
His parents let him set off nuclear weapons in their basement? Wow!
Re: (Score:2)
So that if the supercomputer crashes, it won't bring down uTorrent running in the background and mess up their seeding of Animal Collective's Merriweather Post Pavilion.
Why do you think?
Re: (Score:2)
I work for a supercomputer institute and am our resident grid/cloud junkie. One of the reasons you might want to do this is to allow researchers to create virtual supercomputers on the supercomputer, via advance reservations, for simulation runs. There are a variety of reasons this can be useful. Sometimes software doesn't play nicely with other software on the system, or requires specific versions of libraries (or even specific OSes). You may also want to test in an environment where you have control
OSS ftw. (Score:2, Interesting)
It is really pleasant to see more and more OSS projects being deployed in national-level and other large infrastructures.
Hopefully some of the less greedy companies that benefit from such projects will start paying the volunteer developers. Then again, I have found that a lot of the time, if you are doing something as a hobby/interest/challenge rather than because you were employed to do it, the outcome will be more refined and efficient. Though I have yet to experience the latter part first-hand.
Re: (Score:1)
Palacios can run on real x86 hardware or on QEMU. In fact, most of our development is done on QEMU, which is open source. The VMware image was something we did for the original 1.0 release, just to help people get started running it, and we haven't done it since; VMware has *never* been required for development.
not a good idea. (Score:1, Interesting)
Virtualizing a supercomputer is never the correct solution. Supercomputers by their nature already have a system for managing lesser processes. That system could be extended, rather than adding another, parallel virtual management system and burdening the existing one with maintaining it as yet another running process.
Re: (Score:2, Informative)
Virtualization offers a number of potential advantages. A paper we have had accepted to IPDPS 2010 enumerates more of them, but a few advantages quickly:
1. The combination of a lightweight kernel and a virtualization layer allows applications to choose which OS they run on and how much they pay in terms of performance for the OS services they need. Because Palacios is hosted inside an existing lightweight kernel that presents minimal overhead to applications that run directly on it, applications that d
Let me get this straight.... (Score:1)
The way virtualization works is that a virtual layer is spread across many nodes to avoid any downtime: when one node fails, the rest pick up the slack, without having to stop the running systems. This uses a Linux architecture to cluster many computers at the bottom layer, so as to give the look of one mega-computer when it is actually 100 computers or more... etc...
Then we get into supercomputing, which again uses clusters and usually uses Linux, to be able to make all the computers act as
Re: (Score:1)
We're not trying to hide anything, so I will admit to being surprised by this (anonymous) accusation. To address the anonymous coward's concerns, however:
1. Actual users of supercomputers care most about application run time, because applications, not micro-benchmarks, are what scientists run. As a result, our paper, and our research more generally, focuses on the runtime penalty to real applications (e.g. Sandia's CTH code) as opposed to optimizing micro-benchmarks that aren't what real users of th