Software Technology

Remus Project Brings Transparent High Availability To Xen 137

An anonymous reader writes "The Remus project has just been incorporated into the Xen hypervisor. Developed at the University of British Columbia, Remus provides a thin layer that continuously replicates a running virtual machine onto a second physical host. Remus requires no modifications to the OS or applications within the protected VM: on failure, Remus activates the replica on the second host, and the VM simply picks up where the original system died. Open TCP connections remain intact, and applications continue to run unaware of the failure. It's pretty fun to yank the plug out on your web server and see everything continue to tick along. This sort of HA has traditionally required either really expensive hardware, or very complex and invasive modifications to applications and OSes."
This discussion has been archived. No new comments can be posted.

  • Intact? (Score:5, Informative)

    by Glock27 ( 446276 ) on Wednesday November 11, 2009 @07:00PM (#30067078)
    Intact is one word, O ye editors...
  • Re:Himalaya (Score:3, Informative)

    by Jay L ( 74152 ) * <jay+slash&jay,fm> on Wednesday November 11, 2009 @07:13PM (#30067194) Homepage

    I was just thinking that...

    Tandems may still have other advantages, though; back in the day, we built a database on Himalayas/NSK because, availability aside, it outperformed Sybase, Oracle, and other solutions. (They implemented SQL down at the drive controller level; it was ridiculously efficient.) No idea if that's still the case.

    But Tandem required you to build their availability hooks into your app; it wasn't transparent. OTOH, Stratus's approach is: a Stratus server is like having RAID-1 for every component of your server. I gotta think this will cut into their business.

  • Answer (Score:5, Informative)

    by Anonymous Coward on Wednesday November 11, 2009 @07:26PM (#30067310)

    I've worked with Remus, so I can answer your question.

    It's not "constantly going" into live migration. The backup image is constantly kept in a "paused" state. It doesn't come out of the paused state until communication with the original is broken.

    Until the backup goes live, the shadow memory pages are updated via checkpoints. The checkpointing interval is somewhat variable, but at present it's hardcoded into the Xen software (this will change), regardless of what the user-level utility tells you.

    As it stands, sub-second checkpointing doesn't work too well, but intervals of about 1-2 seconds work great. Sub-second checkpointing can be done (I've done it), but it needs more code than Remus currently provides.
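    The epoch loop being described can be sketched roughly like this (a minimal illustration only, with hypothetical `FakeVM`/`FakeBackup` stand-ins; this is not the actual Xen/Remus code): each epoch briefly pauses the VM, captures the pages dirtied since the last checkpoint, lets the VM run on speculatively, ships the checkpoint to the paused replica, and only then releases buffered output.

    ```python
    import time

    class FakeVM:
        """Stand-in for a protected domain (illustration only)."""
        def __init__(self):
            self.paused = False
            self.dirty = {0x1000: b"a"}   # page number -> contents
            self.outbuf = []              # packets held until commit
        def pause(self):
            self.paused = True
        def resume(self):
            self.paused = False
        def capture_dirty_pages(self):
            d, self.dirty = self.dirty, {}
            return d
        def release_buffered_output(self):
            sent, self.outbuf = self.outbuf, []
            return sent

    class FakeBackup:
        """The replica: kept paused, just accumulates checkpoints."""
        def __init__(self):
            self.pages = {}
        def apply(self, dirty):
            self.pages.update(dirty)

    def run_epochs(vm, backup, interval_s=1.0, epochs=3):
        """Fixed-interval checkpoint loop, as the comment describes."""
        for _ in range(epochs):
            deadline = time.monotonic() + interval_s
            vm.pause()
            dirty = vm.capture_dirty_pages()  # copied while the VM is stopped
            vm.resume()                       # VM runs on speculatively
            backup.apply(dirty)               # checkpoint shipped to replica
            vm.release_buffered_output()      # network released after commit
            time.sleep(max(0.0, deadline - time.monotonic()))
    ```

    The interval passed to `run_epochs` plays the role of the checkpointing interval the comment says is currently hardcoded in Xen.
    
    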

    Similar comments are applicable to the storage updating. This works absolutely superbly if you're using something like DRBD for the storage replication.

    Remus is pretty cool technology, and it serves as a very solid foundation for taking things to the next level.

    The folks at UBC have done a superb job here, and should be well congratulated.

  • Re:It's pretty fun (Score:5, Informative)

    by Hurricane78 ( 562437 ) <deleted @ s l a s h dot.org> on Wednesday November 11, 2009 @07:28PM (#30067328)

    Uuum... session management? Transaction management? The server dying in the process of something that costs money?
    Even if it's something as simple as losing the contents of your shopping cart just before you wanted to buy, and then becoming angry at the stupid ass retarded admins and developers of that site.
    Or losing the server connection in your flash game, right before saving the highscore of the year.

    Web servers are far less stateless than you might think. Nowadays they are practically app servers. (Disclosure: I've been doing web applications since 2000, so I know a bit about the subject.)

    When 5 minutes of downtime means over a hundred complaints in your inbox and tens of thousands of dropped connections, which your boss does not find funny at all, you don't make that mistake again.

  • by bcully ( 1676724 ) on Wednesday November 11, 2009 @07:41PM (#30067480)
    Hello slashdot, I'm the guy that wrote Remus. It's my first time being slashdotted, and it's pretty exciting! To answer your question, Remus buffers outbound network packets until the backup has been synchronized up to the point in time where those packets were generated. So if you checkpoint every 50ms, you'll see an average additional latency of 25ms on the line, but the backup _will_ always be up to date from the point of view of the outside world.
  • Re:state transfer (Score:3, Informative)

    by bcully ( 1676724 ) on Wednesday November 11, 2009 @07:48PM (#30067516)
    FWIW, we have an ongoing project to extend this to disaster recovery. We're running the primary at UBC and a backup a few hundred KM away, and the additional latency is not terribly noticeable. Failover requires a few BGP tricks, which makes it a bit less transparent, but still probably practical for something like a hosting provider or smallish company.
  • Re:Himalaya (Score:1, Informative)

    by Anonymous Coward on Wednesday November 11, 2009 @07:50PM (#30067532)

    The IO bottleneck in this case is the interconnect between the two machines, not disk, so the SAN isn't relevant. VMware FT needs at least a dedicated GbE NIC for replay/lockstep traffic, I think the recommendation is 10Gb, and is still limited to using a single vCPU in the VM.

  • by bcully ( 1676724 ) on Wednesday November 11, 2009 @07:56PM (#30067582)
    The buffering I mentioned above means that packet X will not escape the machine until the checkpoint that produced X has been committed to the backup. So when it recovers on the backup, X will already be in the OS send buffer. There's no possibility for misprediction. If the buffer is lost, TCP will handle recovering the packet.
  • Re:Intact? (Score:1, Informative)

    by Anonymous Coward on Wednesday November 11, 2009 @08:03PM (#30067664)

    Infact, you're right!

  • Re:Himalaya (Score:3, Informative)

    by Cheaty ( 873688 ) on Wednesday November 11, 2009 @08:04PM (#30067672)

    Actually, after reading the paper, this is no threat to Stratus or other players in the space like Marathon or VMWare's FT. The performance impact is pretty significant - by their own benchmarks there was a 50% perf hit in a kernel compile test, and 75% in a web server benchmark.

    This is an interesting approach, and it seems to handle multiple vCPUs in the VM, which I haven't seen done by software approaches like Marathon and VMware FT. But I think it will mainly be used in applications that would never have been considered for a more expensive solution anyway.

  • Re:state transfer (Score:5, Informative)

    by bcully ( 1676724 ) on Wednesday November 11, 2009 @08:23PM (#30067844)
    It depends pretty heavily on your workload. Basically, the amount of bandwidth you need is proportional to the number of different memory addresses your application wrote to since the last checkpoint. Reads are free -- only changed memory needs to be copied. Also, if you keep writing to the same address over and over, you only have to send the last write before a checkpoint, so you can actually write to memory at a rate which is much higher than the amount of bandwidth required. We have some nice graphs in the paper, but for example, IIRC, a kernel compilation checkpointed every 100ms burned somewhere between 50 and 100 megabits. By the way, there's plenty of room to shrink this through compression and other fairly straightforward techniques, which we're prototyping.
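    The coalescing effect described here can be modeled in a few lines (an illustrative sketch, not Remus code; the 4 KiB page size is an assumption): bandwidth tracks the number of *distinct* pages dirtied per epoch, so a thousand writes to one address cost the same as one write.

    ```python
    PAGE = 4096  # bytes; typical x86 page size assumed for the estimate

    def epoch_bytes(write_addresses, page_size=PAGE):
        """Bytes to ship at checkpoint time for one epoch, given the raw
        trace of byte addresses written. Repeated writes to the same page
        coalesce into a single page transfer; reads cost nothing."""
        dirty_pages = {addr // page_size for addr in write_addresses}
        return len(dirty_pages) * page_size
    ```

    This is why the application can write to memory much faster than the replication link can carry: only the final state of each dirty page crosses the wire at each checkpoint.
    
    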
  • by bcully ( 1676724 ) on Wednesday November 11, 2009 @09:26PM (#30068270)
    Split brain is a possibility, if the link between the primary and backup dies. Remus replicates the disks rather than requiring shared storage, which provides some protection over the data. But there are already a number of protocols for managing which replica is active (e.g., "shoot-the-other-node-in-the-head") -- we're worried about maintaining the replica, but happy to use something like linux-HA to control the actual failover.
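    The division of labor described here (Remus maintains the replica; an external agent such as Linux-HA decides who runs) can be sketched as a simple fencing policy. This is a hypothetical illustration of the "shoot-the-other-node-in-the-head" idea, not any real Linux-HA API:

    ```python
    def should_activate_backup(primary_alive, fence_peer):
        """Activate the replica only when the primary is unreachable AND
        the peer has been successfully fenced, so that both sides can
        never be live at once (avoiding split brain)."""
        if primary_alive:
            return False
        # STONITH: fence_peer() returns True only if fencing succeeded.
        return fence_peer()
    ```

    If fencing fails (for example, the management link is also down), the safe choice is to stay paused rather than risk two divergent copies of the disk.
    
    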
