Remus Project Brings Transparent High Availability To Xen 137
An anonymous reader writes "The Remus project has just been incorporated into the Xen hypervisor. Developed at the University of British Columbia, Remus provides a thin layer that continuously replicates a running virtual machine onto a second physical host. Remus requires no modifications to the OS or applications within the protected VM: on failure, Remus activates the replica on the second host, and the VM simply picks up where the original system died. Open TCP connections remain intact, and applications continue to run unaware of the failure. It's pretty fun to yank the plug out on your web server and see everything continue to tick along. This sort of HA has traditionally required either really expensive hardware, or very complex and invasive modifications to applications and OSes."
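Remus's approach can be pictured as a checkpoint loop: the primary VM runs speculatively, its outbound packets are buffered, and each epoch the pages dirtied since the last checkpoint are shipped to the backup; only once the backup acknowledges does the buffered output get released. A toy Python sketch of that loop (class and method names are hypothetical, for illustration only):

```python
class ToyBackup:
    """Toy stand-in for the backup host's checkpoint receiver."""
    def __init__(self):
        self.memory = {}

    def receive_checkpoint(self, pages):
        self.memory.update(pages)
        return True  # acknowledge the checkpoint


class ToyPrimary:
    """Toy model of Remus-style speculative execution with output buffering."""
    def __init__(self, backup):
        self.backup = backup
        self.dirty_pages = {}    # pages modified since the last checkpoint
        self.output_buffer = []  # outbound packets held until the epoch commits

    def write_page(self, addr, data):
        self.dirty_pages[addr] = data

    def send_packet(self, pkt):
        # Outbound traffic is NOT released immediately; it is buffered so the
        # outside world never observes state the backup doesn't yet have.
        self.output_buffer.append(pkt)

    def checkpoint(self):
        # Ship only the pages dirtied this epoch, wait for the backup's ack,
        # then release the buffered packets to the network.
        if self.backup.receive_checkpoint(dict(self.dirty_pages)):
            released, self.output_buffer = self.output_buffer, []
            self.dirty_pages.clear()
            return released
        return []
```

On failure, the backup resumes from the last acknowledged checkpoint; because no packet escaped ahead of that state, external clients see at worst a brief stall rather than inconsistent state.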
Already done by VMware (Score:5, Interesting)
Himalaya (Score:3, Interesting)
How does this compare to a "big iron" solution like Tandem/Himalaya/NonStop/whatever it's called nowadays?
Re:Himalaya (Score:5, Interesting)
By the time you get all the components that provide the processing and I/O throughput of those high-end boxes, the x86/64 commodity hardware cost advantage has evaporated.
How does it deal with replication latency? (Score:3, Interesting)
I'm pretty sure that if I just yank the cable, not everything will be replicated. :-)
Re:state transfer (Score:4, Interesting)
If your primary and secondary systems are physically located next to each other, then they aren't in the category of highly available. Furthermore, with storage replication and regular snapshotting, you can have your virtual infrastructure at your DR site on the cheap while gaining enterprise availability and, most importantly, business continuity.
I'll agree with being skeptical about transparency, although how many people already have this? I went with XenServer and Citrix Essentials for it; I already have this failover, and I can tell you that it works. I physically pulled a blade out of the chassis and, sure enough, by the time I got back to my desk the servers were functioning, having dropped only a single packet. Further tweaking of the underlying network infrastructure resulted in keeping even that packet, with just a momentary rise in latency.
Enterprise availability is fast coming to the little guys.
Re:Already done by VMware (Score:4, Interesting)
Re:How does it deal with replication latency? (Score:3, Interesting)
No it won't.
VMware claims the same crap, and it's simply not true.
You have a 50ms window between checkpoints that can be lost in your example. The only way to ensure no loss is to ensure that every change, every instruction, every microcode operation executed in the CPU on machine A is duplicated on B before A continues to the next one. You simply can't do that without specialized hardware, since you don't even have access to the microcode as it's executed on standard hardware.
50ms on my hardware/software can mean thousands of transactions lost. That can wreak havoc on certain network protocols and cause database operations to fail completely as you replay portions of transactions that the database has already seen.
I can come up with situations all day long as to how this isn't as seamless as you make it out to be. Sure, xclock transitions to the other machine in what appears to be a perfect, no-loss transition, or Solitaire on a Windows machine, but that's not exactly useful.
Remus has plenty of uses, but it also has plenty of pitfalls, and regardless of the claims it does require consideration when developing systems, unless you're willing to introduce latency that, to me, would be completely unacceptable and would require applications to be aware of it. Hell, that's 6.25MB of data that can be transmitted over a gigabit pipe between checkpoints. That can kill performance.
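The bandwidth arithmetic in the parent checks out: at one checkpoint every 50 ms, a saturated gigabit link can carry 6.25 MB between checkpoints, all of which is in flight and at risk if the primary dies mid-epoch:

```python
link_bps = 1_000_000_000  # 1 Gbit/s link
window_s = 0.050          # 50 ms between checkpoints

# Bytes that can cross the wire within one checkpoint window.
bytes_at_risk = link_bps / 8 * window_s
print(bytes_at_risk / 1_000_000)  # → 6.25 (MB)
```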
I know what you're saying, I know what you mean, and I just don't think you realize how much that latency can affect certain classes of applications.
Re:Wrong place to put a failsafe? (Score:5, Interesting)
This is something that the much simpler Linux-HA environment deals with by using something they call STONITH, which basically means Shoot The Other Node In The Head. STONITH peripherals are devices that can completely shut down a server physically, e.g. a power strip that can be controlled via a serial port. If you wind up with a partitioned cluster, which they more colorfully call a 'split brain' condition, where each node thinks the other one is dead, each of them uses the STONITH device to make sure, if it is able. One of them will activate the STONITH device before the other; the one that wins keeps on running, while the one that loses really kicks the bucket if it wasn't fully dead already. I imagine that Remus must have similar mechanisms to guard against split-brain conditions as well. I've had several Linux-HA clusters go split-brain on me, and I can tell you it's never pretty. The best case is that they both try to grab the same IP address and get an address conflict; in the worst case, they both try to mount and write to the same Fibre Channel disk at the same time and bollix the file system. If a Remus-based cluster split-brains, I can imagine you'll get mayhem just as awful unless you have a STONITH-like system to prevent it.
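The STONITH race described above can be sketched in a few lines. This is a toy model, not any real fencing agent's API: the power strip and its methods are hypothetical, but the logic shows why at most one node survives a split brain, since a node that has already been powered off can no longer fence anyone:

```python
class SerialPowerStrip:
    """Hypothetical STONITH device: a power strip with per-outlet control."""
    def __init__(self):
        self.on = {1: True, 2: True}  # both nodes start powered up

    def power_off(self, outlet):
        self.on[outlet] = False


def resolve_split_brain(strip, my_outlet, peer_outlet):
    """Called by a node that has lost contact with its peer.

    A node that has already been fenced is powerless, so whichever node
    reaches the STONITH device first wins; the loser really is dead.
    """
    if not strip.on[my_outlet]:
        return False              # we lost the race; we're already shut down
    strip.power_off(peer_outlet)  # shoot the other node in the head
    return True                   # safe to take over the shared resources
```

Whichever order the two nodes fire in, exactly one survivor remains, so they can never both grab the IP address or both write to the shared disk.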
Re:state transfer (Score:3, Interesting)
"If your primary and secondary systems are physically located next to each other then they aren't in the category of highly available."
High availability covers more than just distributed data centers. Load balancing, failover, clustering, mirroring, redundant switches, routers, and other hardware: all are high-availability solutions aimed at eliminating single points of failure.
Re:Himalaya (Score:3, Interesting)