Remus Project Brings Transparent High Availability To Xen
An anonymous reader writes "The Remus project has just been incorporated into the Xen hypervisor. Developed at the University of British Columbia, Remus provides a thin layer that continuously replicates a running virtual machine onto a second physical host. Remus requires no modifications to the OS or applications within the protected VM: on failure, Remus activates the replica on the second host, and the VM simply picks up where the original system died. Open TCP connections remain intact, and applications continue to run unaware of the failure. It's pretty fun to yank the plug out on your web server and see everything continue to tick along. This sort of HA has traditionally required either really expensive hardware, or very complex and invasive modifications to applications and OSes."
Re:Himalaya (Score:3, Informative)
I was just thinking that...
Tandems may still have other advantages, though; back in the day, we built a database on Himalayas/NSK because, availability aside, it outperformed Sybase, Oracle, and other solutions. (They implemented SQL down at the drive controller level; it was ridiculously efficient.) No idea if that's still the case.
But Tandem required you to build their availability hooks into your app; it wasn't transparent. OTOH, Stratus's approach is; a Stratus server is like having RAID-1 for every component of your server. I gotta think this will cut into their business.
Answer (Score:5, Informative)
I've worked with Remus, so I can answer your question.
It's not "constantly going" into live migration. The backup image is kept in a "paused" state, and it doesn't come out of that state until communication with the original is broken.
Until the backup goes live, its shadow memory pages are updated via checkpoints. The checkpointing interval is somewhat variable, but it's actually hardcoded into the Xen software at present (this will change), regardless of what the user-level utility tells you.
As it stands, sub-second checkpointing doesn't work too well, but intervals of about 1-2 seconds work great. Sub-second checkpointing can be done (I've done it), but it needs more code than Remus currently provides.
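To make the checkpoint cycle described above concrete, here's a minimal sketch in Python with in-memory stand-ins for the guest, the backup, and the outbound network buffer. The class and method names are my own illustration, not Remus's actual code: each checkpoint snapshots the pages dirtied since the last one, ships them to the (paused) backup, and only releases buffered network output once the backup acknowledges.

```python
class Backup:
    """Backup host: accumulates checkpoints, stays 'paused' until promoted."""
    def __init__(self):
        self.state = {}       # last acknowledged view of guest memory
        self.live = False
    def receive(self, checkpoint):
        self.state.update(checkpoint)  # apply the dirty pages
        return True                    # acknowledge the checkpoint
    def activate(self):
        self.live = True               # promoted when the primary dies

class Primary:
    def __init__(self, backup):
        self.memory = {}      # guest memory (page -> contents)
        self.dirty = set()    # pages touched since the last checkpoint
        self.out_buffer = []  # outbound packets held back until ack
        self.backup = backup
    def write(self, page, value):
        self.memory[page] = value
        self.dirty.add(page)
    def send_packet(self, pkt):
        self.out_buffer.append(pkt)    # buffered, not yet on the wire
    def checkpoint(self):
        # Pause, snapshot the dirty pages, resume speculatively,
        # and release network output only after the backup acks.
        snapshot = {p: self.memory[p] for p in self.dirty}
        self.dirty.clear()
        if self.backup.receive(snapshot):
            released, self.out_buffer = self.out_buffer, []
            return released            # packets safe to put on the wire
        return []
```

The key property this illustrates is why open TCP connections survive failover: nothing leaves the primary's network buffer until the backup already holds the state that produced it, so the backup can never be "behind" anything the outside world has seen.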
Similar comments apply to the storage replication. This works absolutely superbly if you're using something like DRBD.
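For the curious, a DRBD setup for a protected guest's disk looks roughly like the following. This is a hedged sketch in DRBD 8-style syntax with hypothetical host names, device paths, and addresses; consult the DRBD docs for a real deployment.

```shell
# Sketch only: hypothetical hosts, volumes, and addresses.
cat > /etc/drbd.d/vmdisk.res <<'EOF'
resource vmdisk {
  protocol C;                     # fully synchronous replication
  on primary-host {
    device    /dev/drbd0;
    disk      /dev/vg0/vmdisk;
    address   192.168.0.1:7788;
    meta-disk internal;
  }
  on backup-host {
    device    /dev/drbd0;
    disk      /dev/vg0/vmdisk;
    address   192.168.0.2:7788;
    meta-disk internal;
  }
}
EOF
drbdadm create-md vmdisk   # initialise metadata (run on both hosts)
drbdadm up vmdisk          # bring the resource online (both hosts)
drbdadm primary vmdisk     # on the active host only
```

The guest's Xen disk config then points at /dev/drbd0 instead of the raw volume, so every write the protected VM makes is mirrored to the backup host before it completes.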
Remus is pretty cool technology, and it serves as a very solid foundation for taking things to the next level.
The folks at UBC have done a superb job here, and should be well congratulated.
Re:It's pretty fun (Score:5, Informative)
Uuum... session management? Transaction management? The server dying in the process of something that costs money?
Even if it's something as simple as losing the contents of your shopping cart just before you wanted to buy, and then becoming angry at the stupid ass retarded admins and developers of that site.
Or losing the server connection in your flash game, right before saving the highscore of the year.
Webservers are far less stateless than you might think. Nowadays they are practically app servers. (Disclosure: I've been doing web applications since 2000, so I know a bit about the subject.)
When 5 minutes of downtime means over a hundred complaints in your inbox and tens of thousands of dropped connections, which your boss does not find funny at all, you don't make that mistake again.
Re:Himalaya (Score:1, Informative)
The IO bottleneck in this case is the interconnect between the two machines, not disk, so the SAN isn't relevant. VMware FT needs at least a dedicated GbE NIC for replay/lockstep traffic, I think the recommendation is 10Gb, and is still limited to using a single vCPU in the VM.
Re:Intact? (Score:1, Informative)
In fact, you're right!
Re:Himalaya (Score:3, Informative)
Actually, after reading the paper, this is no threat to Stratus or other players in the space like Marathon or VMware FT. The performance impact is pretty significant: by their own benchmarks there was a 50% perf hit in a kernel compile test, and 75% in a web server benchmark.
It's an interesting approach, and it seems to handle multiple vCPUs in the VM, which I haven't seen done by software approaches like Marathon and VMware FT. But I think it will mainly be used in applications that would never have been considered for a more expensive solution anyway.