First Look At VMware's vSphere "Cloud OS" 86

Posted by ScuttleMonkey on Friday May 22, 2009 @05:13PM from the trusting-someone-else-with-your-data dept.

snydeq writes "InfoWorld's Paul Venezia takes VMware's purported 'cloud OS,' vSphere 4, for a test drive. The bottom line: 'VMware vSphere 4.0 touches on almost every aspect of managing a virtual infrastructure, from ESX host provisioning to virtual network management to backup and recovery of virtual machines. Time will tell whether these features are as solid as they need to be in this release, but their presence is a substantial step forward for virtual environments.' Among the features Venezia finds particularly worthwhile is vSphere's Fault Tolerance: 'In a nutshell, this allows you to run the same VM in tandem across two hardware nodes, but with only one instance actually visible to the network. You can think of it as OS-agnostic clustering. Should a hardware failure take out the primary instance, the secondary instance will assume normal operations instantly, without requiring a VMotion.'"

Re:FT (Score:5, Informative)

by MartijnL ( 785261 ) writes: on Friday May 22, 2009 @05:29PM (#28059527)

> FT only supports a single vCPU from my understanding... Not too many people are running single-CPU VMs, at least in my experience...
You should be running single-vCPU machines by default, and only add vCPUs if absolutely necessary and if the app is fully SMP-aware and functional.

Xen did it first (Score:3, Informative)

by lightyear4 ( 852813 ) writes: on Friday May 22, 2009 @05:40PM (#28059641)

Check out the Kemari and Remus projects, which allow precisely the same thing in Xen environments. In essence, it's a continual live migration (VMware people, think continual VMotion) that resumes virtual machine execution on the backup node if the origin node dies. Very cool tech. The demonstration involved pulling the plug on one of the nodes. For more information, just search; there are code, papers and presentation slides galore.
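The checkpoint-and-buffer idea behind Remus-style replication can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Xen or Remus code: the `Primary`/`Backup` classes, the dict standing in for VM memory, and the whole-state copy (Remus ships only what changed since the last epoch) are hypothetical simplifications. The key point is that network output produced during an epoch is held back until the backup acknowledges that epoch's checkpoint, so a client never sees a result that the backup would lose on failover.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Backup:
    """Hypothetical backup host: keeps the last checkpoint it acknowledged."""
    state: Dict[str, int] = field(default_factory=dict)

    def receive_checkpoint(self, snapshot: Dict[str, int]) -> bool:
        self.state = dict(snapshot)  # copy, so later primary writes don't leak in
        return True                  # acknowledge the checkpoint


@dataclass
class Primary:
    """Hypothetical primary host running the protected VM."""
    state: Dict[str, int] = field(default_factory=dict)
    buffered: List[str] = field(default_factory=list)  # output not yet sent
    sent: List[str] = field(default_factory=list)      # output clients have seen

    def run_epoch(self, writes: List[Tuple[str, int]]) -> None:
        # Execute speculatively; replies are buffered instead of sent.
        for key, value in writes:
            self.state[key] = value
            self.buffered.append(f"ok {key}")

    def checkpoint(self, backup: Backup) -> None:
        # Release buffered output only once the backup has acknowledged.
        if backup.receive_checkpoint(self.state):
            self.sent.extend(self.buffered)
            self.buffered.clear()


primary, backup = Primary(), Backup()
primary.run_epoch([("x", 1)])
primary.checkpoint(backup)
primary.run_epoch([("y", 2)])  # primary "loses power" before the next checkpoint
print(backup.state)   # {'x': 1}  <- what the backup resumes from
print(primary.sent)   # ['ok x']  <- clients never saw 'ok y'
```

On failover the backup resumes from `{'x': 1}`. The write to `y` is lost, but because its reply was never released, no client was told it had succeeded.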

Re:FT (Score:5, Informative)

by asdf7890 ( 1518587 ) writes: on Friday May 22, 2009 @05:56PM (#28059817)

> If your apps are not SMP aware, WTF are you using? Multiple CPUs have been the standard for servers for at least a decade in x86 gear.
Yes, but it doesn't work the same way in VMs, at least not in VMware. On a loaded system you will often find that a single-vCPU VM outperforms one with more than one vCPU; in fact, if you can spread your app over multiple machines, you are generally better off running two single-CPU VMs instead of one dual-CPU one. This is true no matter how many physical CPUs/cores you have available.
Why is this? Because a single-vCPU VM can be scheduled whenever there are time-slices available on any physical CPU/core, but a dual-vCPU VM has to wait until the hypervisor can give it time-slices on two CPUs at the same time. (A good hypervisor will try not to bounce VMs between cores too often, as this reduces the gains from the core's L1 cache and, on architectures where L2 cache isn't shared, its L2 cache too.) If this is the only VM actively doing anything on the physical machine (and the host OS is otherwise quiet too), this makes little difference beyond a small overhead on top of the normal cost of not running on bare metal, but as soon as that VM competes with other processes for CPU time it can have a massive negative effect on scheduling latency.
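The scheduling argument can be made concrete with a little probability. The sketch below assumes strict co-scheduling exactly as the comment describes it (all of a VM's vCPUs must start in the same time slot) and a hypothetical load model in which each physical core is independently busy with probability `busy` in every slot. Real ESX scheduling is more sophisticated than this, so treat the numbers as illustrative only.

```python
from math import comb


def prob_runnable(cores: int, busy: float, vcpus: int) -> float:
    """Chance that at least `vcpus` of `cores` physical cores are idle in a slot,
    assuming each core is busy independently with probability `busy`."""
    idle = 1.0 - busy
    return sum(comb(cores, k) * idle**k * busy**(cores - k)
               for k in range(vcpus, cores + 1))


def expected_wait(cores: int, busy: float, vcpus: int) -> float:
    """Mean number of slots a VM waits before it can be scheduled.
    Each slot is an independent trial, so the wait is geometric: (1 - p) / p."""
    p = prob_runnable(cores, busy, vcpus)
    return (1.0 - p) / p


# A 4-core host whose cores are each busy 80% of the time:
for n in (1, 2):
    print(n, "vCPU:", f"{prob_runnable(4, 0.8, n):.4f}", f"{expected_wait(4, 0.8, n):.3f}")
# 1 vCPU: 0.5904 0.694
# 2 vCPU: 0.1808 4.531
```

Under this model the dual-vCPU VM finds a usable slot less than a third as often as the single-vCPU VM and waits more than six times longer on average, which is the latency penalty the comment describes.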

How does it detect a 'failure'? (Score:5, Informative)

by moosesocks ( 264553 ) writes: on Friday May 22, 2009 @06:09PM (#28059939) Homepage

How many hardware failures are actually characterized by a complete 100% loss of communication (as you'd get by pulling the plug)?
Don't CPU and memory failures tend to make the computer somewhat unstable before completely bringing it down? How would vSphere handle (or even notice) that?
Even hard disk failures can take some time before the OS notices anything is awry (although if you care enough about redundancy to worry about this sort of thing but don't have your disks in a RAID array, you're an idiot).
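The concern can be illustrated with the simplest kind of failure detector, a heartbeat timeout. This is a hypothetical Python sketch, not how vSphere is documented to work: the `HeartbeatMonitor` class and the injectable clock are assumptions made for illustration. It shows the limitation the comment raises: a detector like this only notices silence, so a host that is unstable but still sending heartbeats is never declared failed.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical detector: a peer counts as failed only if no heartbeat
    has arrived for more than `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = clock()

    def heartbeat(self) -> None:
        self.last_seen = self.clock()

    def peer_failed(self) -> bool:
        return self.clock() - self.last_seen > self.timeout


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
monitor = HeartbeatMonitor(timeout=1.0, clock=clock)
clock.now = 0.5
monitor.heartbeat()             # a flaky host may still send heartbeats
clock.now = 1.2
print(monitor.peer_failed())    # False: last heartbeat was 0.7 s ago
clock.now = 2.0
print(monitor.peer_failed())    # True: silent for 1.5 s
```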

Re:Instantly? (Score:5, Informative)

by lightyear4 ( 852813 ) writes: on Friday May 22, 2009 @06:12PM (#28059959)

Instantly? Of course not. But the time required is equivalent to VMotion/live migration in bog-standard virtualization. How long? "That depends." To throw numbers at you, 30-100 ms, with the variance largely dependent on how quickly your network infrastructure can react to MACs changing locations, whether in-flight TCP streams are broken as a result, etc. To help switches cope, people usually send a gratuitous ARP to jump-start the process.
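For reference, a gratuitous ARP is an ordinary ARP packet (RFC 826 layout) in which a host announces its own IP-to-MAC mapping, with the target IP set equal to the sender IP, sent to the broadcast address so that switches see the MAC arrive on its new port. The Python sketch below only builds the 42-byte frame; the MAC and IP in the example are hypothetical, and actually transmitting it would need a raw socket (e.g. `AF_PACKET` on Linux) and root privileges. The comment does not say which component sends the packet or whether it is a request or a reply; a request is assumed here.

```python
import socket
import struct


def gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request announcing
    that `ip` now lives at `mac` (sender IP == target IP)."""
    mac_b = bytes.fromhex(mac.replace(":", ""))
    ip_b = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our MAC as source, EtherType 0x0806 (ARP)
    eth = struct.pack("!6s6sH", b"\xff" * 6, mac_b, 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,             # hardware type: Ethernet
        0x0800,        # protocol type: IPv4
        6, 4,          # hardware and protocol address lengths
        1,             # opcode 1 = request
        mac_b, ip_b,   # sender MAC and IP
        b"\x00" * 6,   # target MAC: unknown/ignored
        ip_b,          # target IP: same as sender IP, which makes it "gratuitous"
    )
    return eth + arp


frame = gratuitous_arp("00:50:56:00:00:01", "192.0.2.10")  # hypothetical VM MAC/IP
print(len(frame), frame.hex())
```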

Don't ask, just look. (Score:5, Informative)

by RulerOf ( 975607 ) writes: on Friday May 22, 2009 @06:13PM (#28059971)

One of the statistics measured by VirtualCenter is the lag you're asking about.

The first hit on Google Images [ntpro.nl] should give you a good idea.

In practice, I don't know... I imagine that the secondary instance will still receive network traffic bound for the cluster, so it'd probably be perceived as a hiccup when the primary one goes down, which is good enough for most services.

It works as advertized (Score:5, Informative)

by RobiOne ( 226066 ) writes: on Friday May 22, 2009 @06:16PM (#28060003) Homepage Journal

Like everyone else pointed out, it's a VM in lockstep with a 'shadow' VM. This is not just 'continuous VMotion'.
If something happens to the VM, the shadow VM goes live instantly (you don't notice a thing if you're doing something on the VM).
Right after that, the system starts bringing up another shadow VM on another host to regain full FT protection.
This can be network intensive, depending on the VM load, and currently only works with 1 vCPU per VM. Think 1-2 FT VMs per ESX host + shadow VMs.
You'll need recent CPUs that support FT and a VMware HA/DRS cluster set up.
So if you've got it, use it wisely. It's very cool.
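The failover-then-reprotect sequence described above can be modeled as a tiny state machine. This is a hypothetical Python sketch, not a vSphere API: `FTPair`, the host names, and the "first spare host wins" placement rule are all assumptions; a real cluster would choose the host for the new shadow by its own policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FTPair:
    """Hypothetical model of one FT-protected VM: a live primary plus a shadow."""
    primary: str
    secondary: Optional[str]
    spare_hosts: List[str] = field(default_factory=list)

    @property
    def protected(self) -> bool:
        return self.secondary is not None

    def host_failed(self, host: str) -> None:
        if host == self.primary:
            if self.secondary is None:
                raise RuntimeError("primary lost with no shadow: VM is down")
            # The shadow goes live immediately; protection is lost for now.
            self.primary, self.secondary = self.secondary, None
        elif host == self.secondary:
            self.secondary = None
        if host in self.spare_hosts:
            self.spare_hosts.remove(host)
        self.reprotect()

    def reprotect(self) -> None:
        # Start a fresh shadow on a surviving host to regain full FT protection.
        if self.secondary is None and self.spare_hosts:
            self.secondary = self.spare_hosts.pop(0)


pair = FTPair(primary="esx1", secondary="esx2", spare_hosts=["esx3"])
pair.host_failed("esx1")
print(pair.primary, pair.secondary, pair.protected)  # esx2 esx3 True
```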

Re:Xen did it first (Score:1, Informative)

by Anonymous Coward writes: on Friday May 22, 2009 @06:17PM (#28060007)

VMware FT is not based on continuous memory snapshotting; it uses deterministic record/replay, with the primary recording while the secondary simultaneously replays. You can find an overview of this technology at http://www.vmware.com/products/fault-tolerance [vmware.com]
VMware also demonstrated a working prototype as early as 2007:
http://www.vmworld.com/community/conferences/2007/agenda/ [vmworld.com]
With respect to Xen, doing a proof of concept is one thing, but implementing and supporting it at production quality with sufficient performance is another.
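The principle behind deterministic record/replay fits in a few lines: if a computation is deterministic apart from its inputs, logging every nondeterministic input on the primary and feeding the same log to the secondary keeps the two in lockstep. The toy below is a hypothetical Python illustration of that idea only. VMware FT applies it to a whole x86 virtual machine, where the nondeterministic events include things like interrupts and device input, and this sketch does not attempt to model any of that.

```python
import random
from typing import List


def step(state: int, value: int) -> int:
    """Deterministic 'CPU': the next state depends only on the state and input."""
    return (state * 31 + value) % 1_000_003


def run_primary(inputs: random.Random, n: int, log: List[int]) -> int:
    """Run n steps, recording every nondeterministic input to the log."""
    state = 0
    for _ in range(n):
        value = inputs.randint(0, 255)  # stands in for a packet, timer read, etc.
        log.append(value)               # record before the primary consumes it
        state = step(state, value)
    return state


def run_secondary(log: List[int]) -> int:
    """Replay: consume logged inputs instead of generating new ones."""
    state = 0
    for value in log:
        state = step(state, value)
    return state


log: List[int] = []
primary_state = run_primary(random.Random(), 1000, log)
print(primary_state == run_secondary(log))  # True
```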

Re:Xen did it first (Score:3, Informative)

by ACMENEWSLLC ( 940904 ) writes: on Friday May 22, 2009 @06:19PM (#28060027) Homepage

We have both vMotion and Xen.
vMotion is very noticeable. Some things fail when it happens; ZENworks 6.5 is an example.
With Xen, we set up a VNC mirror, i.e. the guest was VNC-viewing itself. We were moving a window around and then moved the guest from Xen server 1 to 2 (we have iSCSI, BTW). There was a noticeable effect that lasted for less than a second, but then we were on Xen #2.
It's nice to see VMWare getting this feature right with vSphere.

Re:FT (Score:2, Informative)

by asdf7890 ( 1518587 ) writes: on Friday May 22, 2009 @06:24PM (#28060069)

> ...but a dual-vCPU VM will have to wait until the hypervisor can give it timeslices on two CPUs at the same time.
> Then their hypervisor is broken. It should be possible for a dual-vCPU machine to have vCPU1 and vCPU2 be two timeslices on the same real CPU if need be.
Which would kill any benefit of running SMP in the VM anyway, if it were possible.
My understanding, which may be out of date, is that this is not considered a good idea: if threads on the two vCPUs are scheduled one after the other on the same core, timing issues between them could potentially cause race conditions. Even if it's not that serious, the threads on the vCPU that gets the first slice of the real core could be paused waiting for locks to be released by threads the guest OS has lined up to run on the other vCPU. This is an explanation I have seen given for why VMware would not allow virtual SMP on a single-core, single-CPU host machine (i.e. emulating SMP in the guest by giving two or more vCPUs alternating time on the only physical core).
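The lock-waiting problem can be illustrated with a toy timeline. This Python sketch is a hypothetical model, not a description of VMware's scheduler: two vCPUs share one physical core in alternating slices, vCPU "B" holds a spinlock and needs `cs_remaining` more ticks of work before releasing it, and vCPU "A" spins on that lock whenever it runs. Every slice "A" gets before "B" finishes is pure waste, because the lock holder cannot make progress while the waiter occupies the only core.

```python
from itertools import cycle


def spin_waste(first: str, slice_len: int, cs_remaining: int) -> int:
    """Ticks vCPU 'A' burns spinning on a lock held by vCPU 'B' when the two
    alternate on one physical core in slices of `slice_len` ticks.
    `first` ("A" or "B") says which vCPU gets the core first."""
    if slice_len <= 0 or first not in ("A", "B"):
        raise ValueError("bad arguments")
    wasted = 0
    order = cycle([first, "B" if first == "A" else "A"])
    for vcpu in order:
        if vcpu == "A":
            wasted += slice_len           # spins all slice: the holder isn't running
        elif cs_remaining <= slice_len:
            return wasted                 # B finishes its critical section, releases
        else:
            cs_remaining -= slice_len     # B makes partial progress, then is preempted
    raise AssertionError("unreachable")  # cycle() never ends


for first in ("A", "B"):
    print(first, "first:", spin_waste(first, slice_len=10, cs_remaining=25))
# A first: 30
# B first: 20
```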

Re:Xen did it first (Score:2, Informative)

by qnetter ( 312322 ) writes: on Friday May 22, 2009 @07:58PM (#28060975)

Marathon has had it working on Xen for quite a while.

Re:Instantly? (Score:3, Informative)

by Mista2 ( 1093071 ) writes: on Saturday May 23, 2009 @05:55AM (#28065087)

It keeps a running copy on the failover host, reading from the same storage as the active host. It's as if the server were about to complete a VMotion but hadn't yet done the final step. Outage time is a small hiccough, less than a second. Currently running sessions just carry on; if it's uploading a file to someone, it just carries on. The outage is well within the tolerance of typical TCP sessions.
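To put "within the tolerance of typical TCP sessions" in numbers: a segment lost during a sub-second pause is simply retransmitted when the sender's retransmission timeout (RTO) fires; the connection is not torn down. The sketch below computes the RTO with the RFC 6298 estimator (alpha = 1/8, beta = 1/4, K = 4, rounded up to a 1-second minimum). The RTT samples are hypothetical LAN values, and real stacks differ; Linux, for example, uses a lower minimum RTO than RFC 6298 recommends.

```python
from typing import Iterable, Optional

ALPHA, BETA, K = 1 / 8, 1 / 4, 4
MIN_RTO = 1.0  # seconds; RFC 6298 says the RTO SHOULD be rounded up to 1 s


def rto_after(samples: Iterable[float], granularity: float = 0.001) -> float:
    """RFC 6298 retransmission timeout (seconds) after the given RTT samples."""
    srtt: Optional[float] = None
    rttvar = 0.0
    for r in samples:
        if srtt is None:                   # first measurement
            srtt, rttvar = r, r / 2
        else:                              # RTTVAR uses the *old* SRTT
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
            srtt = (1 - ALPHA) * srtt + ALPHA * r
    if srtt is None:
        return MIN_RTO                     # no samples yet: initial RTO is 1 s
    return max(MIN_RTO, srtt + max(granularity, K * rttvar))


rto = rto_after([0.050, 0.060, 0.050])   # hypothetical ~50 ms round trips
outage = 0.1                             # upper end of the 30-100 ms figure earlier in the thread
print(rto, outage < rto)                 # 1.0 True
```

Since the pause is shorter than a single RTO, the worst case for an in-flight segment is one retransmission after about a second, not a dropped connection.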

Re:Instantly? (Score:3, Informative)

by Thumper_SVX ( 239525 ) writes: on Saturday May 23, 2009 @04:54PM (#28069505) Homepage

It's close enough. I played with this feature at VMworld last year, and when running SQL transactions along with a ping, we dropped one packet but the SQL transactions didn't miss a beat.
It's impressive enough... the two systems are working in lockstep, such that even memory is duplicated between the two systems. It's an extension of the existing VMotion function in VMware today. However, bear in mind it has some limitations: only one CPU is possible at the moment, and you still have the overhead of really two VMs running at once instead of just one. So it's not a solution for ALL of your environment, just part of it.
I'm sure the limitations will be eased over time as they tune the technology... but as a first attempt it IS awesome. Thing is, in my environment the stuff that is so critical that it can't tolerate a hardware failure usually has more than one CPU, so this isn't a solution... but I guess if you have some relatively low-load but high-criticality servers you could find a use for it (web servers seem like a good place to do this).
And the answer is no, I don't think the end user will ever notice so long as your network infrastructure is good enough. Certainly, my users never notice a VMotion event.
