Clustering vs. Fault-Tolerant Servers

mstansberry writes "According to SearchDataCenter.com, fault-tolerant server vendors say the majority of hardware and software makers have pushed clustering as a high-availability option because it sells more hardware and software licenses. Fault-tolerant servers pack redundant components such as power supplies and storage into a single box, while clustering involves networking multiple standard servers used as failover machines." Perhaps some readers on the front lines can shed a bit more light on the debate, based on both proprietary and Linux-based approaches.
  • by ResQuad ( 243184 ) * <(moc.ketelosnok) (ta) (todhsals)> on Monday October 03, 2005 @02:02PM (#13706488) Homepage
Personally I opt for clustering over fault tolerance - but that's my personal choice. It really depends on what the machine(s) will be doing. If you have a database server - fault tolerance (because I have yet to meet a clustering DB solution that didn't suck). But if you're building a web server - cluster.

Also, the one thing the article mentions is that clustering is just as expensive as fault tolerance due to software licensing. Last I checked, whether it's one copy of Debian + Apache + MySQL + Perl or 200 copies - it's going to cost me the same price (free). And Windows doesn't support clustering yet - in any decent way, shape or form - so I don't see the problem here.
    • by Anonymous Coward
Heh. In order to do it completely right, you'd make a cluster out of fault-tolerant nodes :-P
    • What is this then:

http://www.microsoft.com/windowsserver2003/technologies/clustering/default.mspx [microsoft.com]

      Clustering (NOT performance clustering mind you, which is NOT the topic at hand anyway) has been around in Windows NT as far back as I can remember. With NT4, you needed to have Enterprise Edition, but it was there.
      • by crimethinker ( 721591 ) on Monday October 03, 2005 @02:14PM (#13706617)
        Quoth the GP: "in any decent way shape or form"

        Yes, Windows has supported clustering since NT4 (Wolfpack), and per the GP, it SUCKED BOLLOCKS. I had to deal with that shite every damn day for almost 3 years (1997-2000). We used active-active failover, and the joke around the company was that MS were halfway there: the "fail" worked just fine.

        -paul

      • I worked in Dell server support from summer of '98 to summer 2000. I supported NT 4 HA clustering and I have to tell you, it was an unqualified nightmare.

        Since I was in support I didn't see a cross-section, I only saw the failures. That said, there were a LOT of installations out there that would have had better availability with a beige box, and MUCH better availability with a single fault-tolerant server.

        It didn't help that sales constantly sold invalid configurations and set unreasonable expectations.

        B
    • by Tenareth ( 17013 ) on Monday October 03, 2005 @02:09PM (#13706564) Homepage
A Web farm is the simplest form of clustering; some would argue it isn't even a cluster because the nodes are not aware of each other. However, it gets more confusing when you add a Java layer that load balances...

      Anyway, I do agree that I've seen more trouble caused by DB Clustering solutions than it helps...

A cluster adds complexity to the environment, and complexity == cost, even without the expensive software.

• Not worth doing. The cluster components should be dumb. There isn't a valid reason to have them know about each other. Your round robin or whatever balancing you want should come from outside. F5 makes a nice box for that, as do others; if you're really a cheapskate you could duplicate them yourself. If you need to have anything know about who is on what machine, let the system tell that to the back-end DB machine. It should be a channel architecture, not a crazy tangle. The more you break the functions down
    • In fact, Microsoft Windows has supported clustering for quite some time. At least the better part of seven years as it was available on Windows NT Server 4.0.

If you want to see the latest Microsoft offering on clustering services, check out this site: http://www.microsoft.com/windowsserver2003/technologies/clustering/default.mspx [microsoft.com]
• A few people I have talked to who have actually worked with this tell me that it is a nightmare and that they would switch to something like NCR's servers next time. Apparently they found that running MS clusters was expensive and difficult and did not work well.

Interestingly, one of them also runs a Linux cluster and an HP cluster, says they were much easier, and is moving their code base to Linux only.
    • by CSHARP123 ( 904951 ) on Monday October 03, 2005 @02:13PM (#13706606)
And Windows doesn't support clustering yet
      Windows Server 2003 actually supports two different types of clustering. One is called network load balancing, which enables up to 32 clustered servers to run a high-demand application to prevent a single server from being bogged down. If one of the servers in the cluster fails, then the other servers instantly pick up the slack.

      Network load balancing has been most often used with Web servers, which tend to use fairly static code and require little data replication. If a clustered web site needs more performance than what the cluster is currently providing, additional servers can be instantaneously added to the cluster. Once the cluster reaches the 32-server limit, you can further expand the cluster by creating a second cluster and then using round-robin DNS to divide traffic between the two clusters.

      The other type of clustering that Windows Server 2003 supports by default is often referred to simply as clustering. The idea behind this type of clustering is that two or more servers share a common hard disk. All of the servers in the cluster run the same application and reference the same data on the same disk. Only one of the servers actually does the work. The other servers constantly check to make sure that the primary server is online. If the primary server does not respond, then the secondary server takes over.

      This type of clustering doesn't really give you any kind of performance gain. Instead, it gives you fault tolerance and enables you to perform rolling upgrades. (A server can be taken offline for upgrade without disrupting users.) In Windows 2000 Advanced Server, only two servers could be clustered together in this way (four servers in Windows 2000 Datacenter Edition). In Windows Server 2003, though, the limit has been raised to eight servers. Microsoft offers this as a solution to long-distance fault tolerance when used in conjunction with the iSCSI protocol (SCSI over IP).
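
      The fail-over scheme in that last paragraph is easy to sketch. A minimal Python illustration - not Microsoft's actual implementation; the address, timings, and the take_over() hook are all assumptions:

        import socket
        import time

        PRIMARY_ADDR = ("10.0.0.1", 3343)   # hypothetical heartbeat endpoint on the primary
        POLL_INTERVAL = 1.0                 # seconds between liveness checks
        MAX_MISSES = 3                      # consecutive failures before takeover

        def primary_alive(addr, timeout=1.0):
            """True if the primary still answers a TCP connect on its heartbeat port."""
            try:
                with socket.create_connection(addr, timeout=timeout):
                    return True
            except OSError:
                return False

        def standby_loop(take_over):
            """Run on a secondary: watch the primary, take over when it goes quiet."""
            misses = 0
            while True:
                misses = 0 if primary_alive(PRIMARY_ADDR) else misses + 1
                if misses >= MAX_MISSES:
                    take_over()   # e.g. mount the shared disk and start the application
                    return
                time.sleep(POLL_INTERVAL)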

    • Ignoramus (Score:5, Informative)

      by Donny Smith ( 567043 ) on Monday October 03, 2005 @02:21PM (#13706682)
      What you wrote is really ignorant (which, modded on /., translates to Insightful).

1. (because I have yet to meet a clustering DB solution that didn't suck).

Where do you live? In Rwanda?
      Perhaps you have heard of Oracle RAC. And there are other very good clustering solutions for DBMS.

      2. one copy of Debian + Apache + MySQL + Perl or 200 copies

MySQL isn't enterprise-reliable even in a stand-alone configuration, let alone clustered. I can't believe this...

3. And Windows doesn't support clustering yet - in any decent way, shape or form - so I don't see the problem here.

      Hah, hah! Enough said.
      And also - what's it to you? If Microsoft (in your view) had a good clustering solution, you'd lose sleep over that?
      When you're biased like that, no wonder you can't have a quality, unbiased opinion on this topic.

    • by Jim Hall ( 2985 ) on Monday October 03, 2005 @02:26PM (#13706714) Homepage

      Let me preface this by saying I'm the Enterprise IT Manager for a large, Big-10 University. "Enterprise" means I am responsible for all servers that run the University, not just a small department. My userbase is 70,000+ students, and somewhere between 15,000-20,000 faculty and staff.

      We run a variety of hardware platforms, including a large Linux deployment. Yes, it really does depend on what you want to do with that server, before you can decide to go with a bunch of servers behind a load balancer v. a larger, fault-tolerant server.

      For our production web servers (PeopleSoft, web registration, etc.) we run a bunch of cheap servers running Red Hat Enterprise Linux, and we distribute them across two data centers (for redundancy.) We run a load balancer in front of them, so that users access one URL, and the load balancer automagically distributes traffic to the servers on both data centers. For a lightly-used application, we may only run 2 web servers. For heavily-used applications (web registration) we run 5 web servers. Those are IBM x-series now, but we are in the process of moving to IBM BladeCenters.

With multiple servers in production, I can lose any single web server and not experience downtime on the application. We usually only have a single PSU in each server, because there's no point in the extra expense when we have redundancy at the server level. And because we've split our web servers across two data centers, I can actually lose an entire data center and only experience slow response time on the application. (Note to the paranoid: while the data centers are only 1.4 miles apart, they are on separate power grids, etc. The other back-end infrastructure is also split between data centers.) We run a lot of sites behind load balancers, so we can afford to have a separate load balancer pair at each site (which can provide backup to each other).

      However, for large applications we may use a single fault-tolerant Linux server. For example, we used to do this with a database server. Multiple power supplies, multiple network connections, RAID storage, etc. To be honest, though, we tend to run databases on "big iron" hardware such as Sun SPARC (E25000, V890, etc.) and IBM p-series. We don't have any Linux database servers left, but that's not because Linux wasn't up to the task (our DBAs preferred to have the same platform for all databases, to make debugging and knowledge-sharing easier.)

      In a few cases, we have a third tier. If the application is low-priority (i.e. a development server) and/or low-volume (i.e. a web site that doesn't get much traffic), we run a single server for that. The server is a cheap IBM x-series box running Red Hat Enterprise Linux, usually with no built-in redundancy.

      Yes, for us Linux has been able to play along quite nicely with the "big iron" UNIX systems. We've run Linux at the Enterprise level since 1998 or 1999, and Linux is definitely considered part of our Enterprise solution.

    • by Marillion ( 33728 ) <ericbardes@NOsPaM.gmail.com> on Monday October 03, 2005 @02:38PM (#13706815)
      That makes lots of sense. Software costs do multiply in clustering. Zero times 100 is still zero. But, clustering has other headaches beyond money.

The usual clustering I've seen is "hot spare" clustering. The primary runs until it goes kaput, then the secondary takes over. For database clustering, the two boxes usually share the same disks. I think I've seen more outages from false takeovers by the secondary than from real failures of the primary.

The other problem with clustering is that all of your software applications have to be cluster tolerant. If the user app keeps a database connection open and a rollover occurs, the connection state doesn't and can't roll over with it. To a client system, a cluster failover looks like a server reboot. Don't underestimate the difficulty of this problem. A new application has to be designed with that in mind. Retrofitting it in later is hard - and costly, even with free platforms.
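
      To make that concrete, here is roughly the client-side machinery a cluster-tolerant application needs, sketched in Python around a hypothetical driver-supplied connect_fn (a real wrapper would catch the driver's specific "connection lost" error, not every exception):

        import time

        class ReconnectingConnection:
            """Wrap a DB connection so a failover - which looks like a server
            reboot and drops the TCP session - is retried instead of crashing."""

            def __init__(self, connect_fn, retries=5, delay=2.0):
                self.connect_fn = connect_fn   # hypothetical: returns a DB-API connection
                self.retries = retries
                self.delay = delay
                self.conn = connect_fn()

            def execute(self, sql, params=()):
                for _ in range(self.retries):
                    try:
                        cur = self.conn.cursor()
                        cur.execute(sql, params)
                        return cur
                    except Exception:           # assume: connection lost in a failover
                        time.sleep(self.delay)  # wait out the failover window
                        self.conn = self.connect_fn()
                raise RuntimeError("server did not come back after failover")

      Note what the wrapper cannot do: any transaction open at the moment of failover is simply gone, which is exactly why retrofitting this into an existing application is hard.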

Another issue that can't be solved with clustering is application failure or application limits. You may recall the airline system failure last Christmas? Some 80% of Slashdot readers asked where the backup was (there was one) and whether they should have used Unix (they were). The box (an RS/6000) and operating system (AIX) kept running just fine. A hundred-computer cluster couldn't have solved the real problem: the application couldn't handle the volume of information it was required to hold, and they were at the mercy of a proprietary source code vendor.

    • by Anonymous Coward
      Clustering protects against many more types of failures than servers with internally redundant hardware.
      • Clustering allows easy zero-downtime upgrades (update half the cluster, and then the second half)
      • Clustering allows easy zero-downtime moves from one data center to another (move half the servers, and then the second half)
      • Clustering protects against more types of user errors than internally redundant servers (oops, I turned off the wrong machine)
• The choice between fault-tolerant systems and clusters is decided by how long an outage your company can sustain. A cluster can take 1-2 minutes to move applications from a dead node to a working one. If your applications require sustained 100% connectivity, you need to go fault tolerant. Usually that's for real-time monitoring software, like the computers used to monitor telephone exchanges. For databases and NFS services, clusters work better, as you can take a 1-2 minute hit in response time when a node fails. Software
    • Most clusters are equivalent to DECsafe (you can even get the source code on Freshmeat), which is mainly a group of machines joined together via a SCSI interconnect or a Storage Area Network and a common LAN. All interconnects should be redundant, and that includes the network. The only cluster that is different is the Tru64 cluster, which has a clustered file system. I think Red Hat clustering uses NFS (can anyone advise on this?), but you need a very fast network if you want disk performance.

      Fault tolerant is the
    • This is not a case of "which is better", but of "what is right for what I want to do".
      There are "Best Practices" for doing this sort of thing that take the religion out of server-farm design.

      First thing to work out:
      (1) How many minutes of APPLICATION downtime are acceptable?
      (2) How much money will I lose for each minute the application is down?

      Multiply (1) by (2), and you have a rough idea of your budget. Ideally, this should be the last thing - you work out your needs and then pay for them - and that was tru
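
      With made-up numbers, that multiplication looks like:

        # Hypothetical figures, purely to illustrate steps (1) and (2) above.
        acceptable_downtime_minutes = 30   # (1) tolerable application downtime per year
        loss_per_minute = 500              # (2) dollars lost per minute of outage

        rough_ha_budget = acceptable_downtime_minutes * loss_per_minute
        print(rough_ha_budget)             # 15000 -> roughly what avoiding the outage is worth
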
  • I just use Geocities, it's free and easy!
  • by Anonymous Coward
    It's slashdotted already.
  • by Anonymous Coward
    ...and i am just waiting on the call from our vendor recommending we upgrade to a cluster of fault-tolerant servers.
  • Software vendors (Score:5, Insightful)

    by PCM2 ( 4486 ) on Monday October 03, 2005 @02:05PM (#13706515) Homepage
    So if you ask a software vendor whether it's better to buy expensive hardware or to save money on hardware and install more copies of software, what's he going to say? Even if you had a site license he'd still say that, because guess what ... he's a software vendor. He's not in the business of solving your problems with hardware.
  • by Anonymous Coward
Hardware fails... it's as simple as that. You should plan that, for one reason or another, you will have to shut down and replace hardware. If it can be done with minimal or no disruption to the services, then that's all the better. Open source makes licensing no longer a problem.
    • Unfortunately, for many reasons, Open Source does not end the cost of licensing for many organizations. Most of the good clustering solutions that I have seen recently involve breaking every application and service into a 'package' that can run on many different physical servers. Each package has a virtual IP address associated with it.

      When hardware fails, you bring up the required packages on a different physical host, and other applications access it using the virtual IP. Going this route allows you to do
  • by Anonymous Coward on Monday October 03, 2005 @02:05PM (#13706520)
    tolerating a lot of faults in one girlfriend or get a cluster of them and deal only with the good points?
  • Not the same. (Score:5, Informative)

    by tekn0lust ( 725750 ) * on Monday October 03, 2005 @02:06PM (#13706524) Homepage
Clustering provides you with fault-tolerant OS/applications. A single server with tons of redundant bits doesn't help you if the OS or applications that it serves get borked.
    • Absolutely right (Score:4, Informative)

      by TTK Ciar ( 698795 ) on Monday October 03, 2005 @03:01PM (#13706992) Homepage Journal

Clustering provides you with fault-tolerant OS/applications. A single server with tons of redundant bits doesn't help you if the OS or applications that it serves get borked.

This is dead-on correct. For example, if a CGI hits a problematic state where it eats a lot of memory, putting the server into a state where it's swapping, then it takes longer to service each http transaction, which means more httpd transactions queue up, which means more memory gets allocated, which means more swapping... rendering the machine useless for a little while (until a sysadmin or a bot notices the state and either restarts the httpd or kills a few select processes). If we were running this on one mammoth server with lots of redundant bits, then 100% of our web service capacity would be down in the interim. But since we run a pool of ten http servers under keepalived/IPVS, we only lose 10% of our capacity during that time.
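
      The "a bot notices the state" part can be as small as this sketch (the /proc/meminfo parsing is standard Linux; the 50% threshold and apachectl graceful as the recovery action are assumptions):

        import subprocess
        import time

        SWAP_USED_LIMIT = 0.5   # assumed threshold: act when over half of swap is in use

        def swap_fraction_used():
            """Parse Linux /proc/meminfo for SwapTotal/SwapFree (values in kB)."""
            info = {}
            with open("/proc/meminfo") as f:
                for line in f:
                    key, rest = line.split(":", 1)
                    info[key] = int(rest.split()[0])
            total = info.get("SwapTotal", 0)
            return 0.0 if total == 0 else 1.0 - info["SwapFree"] / total

        def watchdog():
            """Restart httpd before the swap death-spiral makes the box useless."""
            while True:
                if swap_fraction_used() > SWAP_USED_LIMIT:
                    subprocess.run(["apachectl", "graceful"])
                time.sleep(30)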

Other reasons I've traditionally preferred clustering: it's easy to incrementally scale up infrastructure (no big buy-in at the beginning to get a server which can be expanded), resources are fully parallel (an independent memory bus, an independent IO bus, two independent CPUs, an independent network card, and a few independent disks for each server, as opposed to a mammoth shared bus on a leviathan crossbar, which will inevitably run into contention), and there is more flexibility in how resources are divided amongst mutually exclusive tasks.

One of those reasons is getting less relevant -- point-to-point bus technologies like HyperTransport and PCI-Express are inexpensively replacing the "one big shared bus" with a lot of independent busses, transforming the server into a little cluster-in-a-box. It is a positive change IMO, and shifts the optimal setup away from the huge cluster of relatively small machines and towards a more moderately-sized cluster of medium-sized cluster-in-a-box machines.

The price of licenses is, IME, rarely an issue (in my admittedly limited career -- I don't doubt that it's relevant to many companies) because the places I've worked for have tended to use primarily free-as-in-beer (and often free-as-in-speech) open source solutions. What is more of an issue, IME, is the necessity of staffing yourself with cluster-savvy sysadmins and software engineers. Those of that ilk tend to be a bit rare and expensive, and difficult to keep track of. It takes a distributed systems professional to look at a distributed system and understand what is being seen, and this makes it easy to bend the spec or juggle the schedule on the sly, or run skunkworks projects outright. By contrast, the insanely redundant, mondo-expensive uberserver was created and programmed by very smart hardware and software specialists so that your IT staff doesn't need to be so specialized. This makes useful talent easier to acquire, and brings understanding the system closer to the reach of mere mortals.

      Just my two cents
      -- TTK

  • Shouldn't we be encouraging server failures which enable their freedom from magnetic imprisonment? Kinda like PETA freeing lab animals...
  • by Sv-Manowar ( 772313 ) on Monday October 03, 2005 @02:06PM (#13706526) Homepage Journal
Because of the open source stack behind a lot of server platforms these days, I'm dubious that this decision boils down simply to a software cost issue. One major benefit of clustering is that many white-box, non-specialized machines can be used, which are easier & cheaper to replace or obtain components for. Complex and specialized hardware with built-in redundancy is often expensive and can require vendor support contracts for effective maintenance.
    • A gov't contractor I worked for was getting a contract to consolidate multiple servers and apps into a single pair of servers (web and db) for a small gov't agency.

      The agency bought a pair of dual proc Dells with lots of RAM and a full software stack (Windows Server, SQLServer, and ColdFusion Server). Total cost: ~$57,000.

      That's right, nearly 60k.

Now, I've read that Google buys their white boxes at $1k each for their server farm. And I couldn't help but think what they (or I) would do with 57 box

  • Clustering (Score:3, Insightful)

    by FnH ( 137981 ) on Monday October 03, 2005 @02:06PM (#13706528)
Clustering provides a backup for software failures, which fault-tolerant servers don't. Also, upgrades without downtime are more easily done with a load-balanced cluster.
  • by Steven_M_Campbell ( 409802 ) on Monday October 03, 2005 @02:06PM (#13706529)
If you are just talking about fault tolerance (FT), then spill a drink on the FT server, then spill a drink on a clustered server, and see the difference :) If we are not limited to fault tolerance, then try load balancing an FT server with.. um..er... itself. This is really apples and oranges. BTW, I like FT servers in a cluster!
  • by darkmeridian ( 119044 ) <william.chuang@NOSPaM.gmail.com> on Monday October 03, 2005 @02:06PM (#13706531) Homepage
    The article seems to make the choice one-sided. Fault tolerant servers have higher uptimes because the backup takes over immediately. Clusters have a single point of failure in the middleware. They argue that the clusters can run different operating systems, but that means more patches and updates to keep track of. Clusters are expensive because they need more OS and software licenses and require a lot of maintenance, though that might drop if they are running Linux or FreeBSD.

    Anyone make a case for clusters for high-uptime situations?
• Catastrophic loss (Score:3, Insightful)

      by lilmouse ( 310335 )
      Anyone make a case for clusters for high-uptime situations?
Well, if your whole rackspace burns to the ground, that's a bit much for a "fault tolerant" server to handle. Multiple sites mean a single nuclear weapon (plane hitting the WTC, fire, hurricane, earthquake, you get the idea) can't take you down.

      --LWM
  • by Barondude ( 245739 ) on Monday October 03, 2005 @02:08PM (#13706545)
    If HA is what you are really after, you should use both. You want a fault tolerant server so you never have to go down unexpectedly and you want a fail over node so if the unexpected occurs, you'll be back up in a jiffy.
  • Clustering is safer (Score:2, Interesting)

    by arcadum ( 528303 )
    If you buy one machine, you still may need to power it off to open the case, or replace a part.
    • you still may need to power it off to open the case, or replace a part.
       
      If you're willing to lay out the cash, you can get a server that will let you swap out bad cards, memory, and even CPUs while the thing is running without missing a beat.

• I don't know, I've replaced almost every component on my production servers without taking them down... Of course, that's a $1 million server...

    • if you buy one machine, you still may need to power it off to open the case, or replace a part.

      I think you don't quite understand the concept of "fault tolerant servers".

      The entire point of a fault-tolerant server is that you don't have to power it off to open the case or replace a part.
  • It all depends (Score:2, Insightful)

    by Anonymous Coward
    Fault tolerant systems are all in one physical location.
Clusters can be in different server racks, buildings, cities, even countries.

    It depends what the goal is. Fault tolerance, scalability, disaster recovery, etc.

    They both have their uses, let's not discount one or the other, just use them properly.

    **Typically, the goal is a mix of the ones I enumerated, hence I typically choose clusters. However, I always re-evaluate every time a new requirement comes in.
  • ... the better technology IF space isn't an issue.

    If you've got the space for the extra servers clusters are great, if you don't have that kind of excess space then fault tolerance is top of the mark.
  • ...but my users and my bosses don't care much what searchdatacenter.com has to say about the situation, in the event hardware failure takes down a critical application.

    If the people that pay me are willing to invest in the extra HW and SW to make a critical app available, then we do it.
  • Not either/or (Score:5, Interesting)

    by Declarent ( 628681 ) on Monday October 03, 2005 @02:13PM (#13706609)
    I build AIX HACMP clusters for a living, and I'll tell you that you should *never* use an either/or approach, as TFA suggests. Nobody in their right mind is wondering if they should get a cluster OR FT hardware. They get a cluster of FT servers.

Maybe if they want to write an article, they should spend some time in the real world and see how the HA industry works, instead of making up some arbitrary demarcation line to hang a preconception on.

  • A large and fully redundant fault-tolerant server is more flexible. Use virtualization and have many reliable servers running many different operating systems in one unit, as opposed to a highly specialized cluster.

    For certain tasks, clustering will certainly offer a performance advantage from a scalability standpoint. Yet a fully fault-tolerant hardware system, like those from Stratus [stratus.com], offers just a touch more reliability than a fault-tolerant software system.
  • by flinxmeister ( 601654 ) on Monday October 03, 2005 @02:15PM (#13706630) Homepage
The Good: using cheap components in a cluster to create scalability at a good value.
    The Bad: using a cluster to cover up coding issues, architectural crap, or instabilities in the system.
    The Ugly: "the bad" gets so bad that it crashes the whole freakin' cluster. Why did we do this again?
  • by bradm ( 27075 ) on Monday October 03, 2005 @02:16PM (#13706632)
    Fault tolerance gets you a machine that keeps running in the face of hardware failures and maintenance. The switchover time is arguably negligible.

    Clustering gets you a set of services that keep running in the face of hardware failures and maintenance. The switchover time can range from negligible to huge depending on the application involved.

    However, clustering also helps you to solve other problems, including scaling, software failures, software upgrades, A-B testing (running different versions side by side), major hardware upgrades, and even data center relocations.

    Clustering tends to require a lot more local knowledge to get right.

So if you narrow the problem definition to hardware only, they solve the same class of problems. But when you broaden it to the full range of what clustering offers, you find a greater opportunity for cost savings - because one technique is covering multiple needs.
  • If you are going to go so far as to pay for redundant everything hardware you probably want to buy at least a pair of them and put them in a cluster. I know very few places where the demands are such that they would buy a single super expensive server and NOT have a cluster to allow for things like software upgrades.
  • SneakerNet * (Score:5, Interesting)

    by dada21 ( 163177 ) * <adam.dada@gmail.com> on Monday October 03, 2005 @02:16PM (#13706645) Homepage Journal
    In my 15 years of IT consulting, no network has provided data safety transparency cheaply or consistently enough. Clusters and fault tolerance both cost more than downtime in my experience.

    We desperately need a better way to access data in a corporate network.

My favorite customers are those architects and engineers who avoid networking except for the Net. Seriously, sneakernet and peer-to-peer have shown the least downtime I've seen.

    I think p2p networks will see a comeback if a torrent-like protocol can grow to be speedy. My customers are not banks, but they need 100% uptime as every day is a beat-the-deadline day.

    If someone can extend and combine an internal torrent system with a decent file cataloging and searching system, they'll see huge money. I have some 150 user CAD networks just waiting for it.

    What would a hive network need?

    * Serverless
    * Files hived to 3+ workstations
    * Database object hiving
* File modification ability (save new file in hive, rename previous file as old version, delete really old versions after a user-configurable number of changes; see the sketch at the end of this comment)
    * "Wayback Machine" feature from old versions
    * PCs disconnected from hive will self correct upon reconnection

It is very complex right now, but my bet is that the P2P network will trump client-server in the short run. "The client is the server" vs. "the server is the client"?
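
    As promised in the wish list, here is a single-machine sketch of the file-modification item. Only the local version-rotation logic is shown; replicating each save to 3+ workstations is the hard part and is left out:

      import os

      MAX_OLD_VERSIONS = 5   # the "user configurable" retention from the wish list

      def hive_save(path, data):
          """Save the new file, rename the previous one as an old version, and
          let the oldest version drop off the end (os.replace overwrites it)."""
          if os.path.exists(path):
              for n in range(MAX_OLD_VERSIONS - 1, 0, -1):   # file.v4 -> file.v5, ...
                  old = "%s.v%d" % (path, n)
                  if os.path.exists(old):
                      os.replace(old, "%s.v%d" % (path, n + 1))
              os.replace(path, path + ".v1")                 # current file becomes .v1
          with open(path, "wb") as f:
              f.write(data)
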
    • Re:SneakerNet * (Score:3, Insightful)

      by Ramses0 ( 63476 )
What about iFolder [novell.com]? Looking at the specs, I think it's missing serverless operation/hiving (which could be provided by any of the normal p2p people) and file history... I'm not understanding your database object comment.

      Speaking of which, what about freenet [sf.net]? The only thing it's missing is "guaranteed availability of critical business data", eh? And I hear it might have some performance problems. ;^)

      --Robert
      • iFolder is so-1990's to me, heh. Freenet seems doomed!

        The war is on:

        A. huge megaservers online serving thin/dumb terminals over high speed network connections (renting processors and storage and even apps all on demand with backups)

        B. P2P with cheap clients and cheap shared in-client storage

        I don't know which way is better. High bandwidth will get cheaper and more available every day.

        For now, I'm betting on DumbClient/MonsterServer being the cheapest both initially and in the long run when 10Mb connection
    • You might check out OpenAFS [openafs.org]. I'm not sure it meets all your requirements, though.
• What about AFS [cmu.edu], which stands for Andrew File System? It was developed at CMU and allows dynamic backup of data (it automagically copies your data to different physical volumes). I've never even heard of data being lost on an AFS system, and it supports very high security too. Then just build your code on top of the UNIX commands or the AFS file API. But then again, it might be a bit much for your requirements. I don't know of a Windows client version but one might exist. And the wayback part you might hav
• There is a commercial implementation of Andrew called DFS (Distributed File System), sold by IBM. It is mostly used by banks and universities AFAIK, due to the aforementioned strong integrity and security features.

        It IS possible to chuff things up, mainly by making administrative errors.
      • I've played with it. It seems more of a backup bandaid than a realtime data hive like I'm thinking.

        I may try to torrent a corporate network if I can find a good file "explorer" or file access subsystem that integrates into Windows.
    • You're forgetting about:

      * backups
      * authentication/permissions
      * simultaneous use of the same file
      etc...

      These are problems that have already been addressed in most corporate LANs. Fault tolerance is an issue, yes, but if I had to trade the few items above for the extra tolerance that a P2P network gives me, I'd stay with the regular 'ol client-server model.

      I'm not saying that P2P isn't a potential solution for the future, but for this application, it's not ready yet. In my experience, the problem i

      • Backups are integrated in the hive. I think a backup node in the hive could stream backups constantly.

Authentication/permissions can be realized by using a registry-like Address/Key/Source structure. The address of a chunk in the hive designates what data it is, the key can be 0 for public or an encryption key known to client apps permitted to access the chunk. Source is the data (encrypted or otherwise).

        Since the client node is responsible for reassimilating chunks it hived out, the encryption is twofo
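
        Reading between the lines, the Address/Key/Source record could be as small as this (field meanings taken from the comment above; everything else is guesswork):

          from dataclasses import dataclass

          @dataclass
          class HiveChunk:
              address: str    # designates what data this chunk is
              key: bytes      # empty for public, else the key client apps need
              source: bytes   # the payload, encrypted when a key is set
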
• Been there, done that. AFS http://www.faqs.org/faqs/afs-faq/ [faqs.org] works wonders. Pretty much it's a nice fault-tolerant file sharing system that supports disconnected ops, meaning you can work with everything in disk cache and check out / check in things as needed.
    • You know, what you are describing looks an awful lot like a distributed version control system with BitTorrent as a transport. Of course, with BitTorrent you need a server to act as a tracker, though.

      Why couldn't something like SVK work for this?
  • Clustering costs more for the software. Fault-tolerance generally costs more for the hardware, especially if you cluster using commodity equipment. When the software is free, clustering is the obvious option.
  • But in the end, I opted for a "both" approach. If I'm going to do a cluster, I usually do it for applications, so I'll build it out in an N+1 style so I can easily add more resources to the cluster. If uptime is the concern and not horse-power, I'll simply make things as redundant as possible with drives, power supplies, RAID, etc.
  • by TheMohel ( 143568 ) on Monday October 03, 2005 @02:21PM (#13706676) Homepage

    Having built both true high-reliability fault-tolerant devices and clustered systems, I don't see any fundamental theoretical difference. In both cases, you have redundant hardware capacity in place, theoretically to allow you to tolerate the failure of a certain amount of your hardware (and, sometimes, your software) for a certain amount of time. Neither option guards you against failures outside of the cluster or FT system box. Neither one is a panacea. Both are sold as snake-oil insurance against "badness".

In a single fault-tolerant box, you generally have environmental monitoring, careful attention to error detection, and automatic failover. You also have customer-replaceable units for failure-prone components, utilities for managing all of the redundancy, and a fancy nameplate. In exchange for that, you have more complexity, more cost, serious custom hardware and software modifications, and often (but not always) performance constraints.

    In a clustered system, you treat each individual server as a failure unit. Good fault detection is a challenge, especially for damaging but non-catastrophic failure, but it's much easier to configure a given level of redundancy and it's easier to take care of environmental problems like building power (or water in the second floor) -- you just configure part of the cluster a longer distance away.

Where clustering is inadequate is when you have a single mission-critical system where any failure is a disaster (like flight-control avionics or nuclear power plant monitoring). There are applications where there's no substitute for redundant design, locked-clock processors and "voting" hardware, and all of the other low-level safeguards you can use.

For Web applications, however, where a certain sloppiness is tolerable, and where you get the advantages of load balancing, off-the-shelf hardware and software, and system administration that doesn't require an EE with obsessive-compulsive disorder, clusters are the natural solution.

    The fact that you get to sell more licenses for the software is just gravy.

  • Speaking of the Windows universe, here. I've found actual for-real clustering (say, of SQL Server) to be workable, but to be a serious (and expensive) pain in the ass. Obviously it depends on the app, but log-shipping and other mechanisms are frequently good enough to prepare for fail-over to another machine, and decent fault-tolerant hardware is good enough insurance for a lot of circumstances.

    On the web side of things, clustering (actual clustering) sure hasn't come up much in my world. But I use nativ
• The main problem is that building a fault-tolerant server is an arduous task. It takes a lot of engineering and testing. This slows you down and your product cycles get long. When you bring your new machine to market, it will look old and slow compared to 'standard' competitors. In addition, your database will be a specialized, proprietary version which does not work with any standard tool, and the admin staff needs special education to manage and operate it.

    Clusters are different. Just take your latest and greatest

• We run volumes of Dell 2850s with RAID arrays, redundant power, etc., powering high-volume websites... I can say firsthand that internal fault tolerance in these systems can only get you so far: a component such as the management device in charge of the two power supplies can itself fail, rendering both power supplies useless. Or a RAID card can go out of commission, leaving drives with mangled and unrecoverable data. As with most solutions, a mixture of both fault toleranc
• Clustering has a MAJOR problem going with it. Clustering requires applications to be written specifically to support clustering. All sorts of libraries have been written to "make this process easier", but one thing's for sure: it will require a recompile, and software not designed by people who know what ACID means for databases will break. It is very hard to keep a hand-written app in a consistent state on all machines, knowing that any one of them might fail completely (we only support complete failures, di
• Clustering has a MAJOR problem going with it. Clustering requires applications to be written specifically to support clustering. All sorts of libraries have been written to "make this process easier", but one thing's for sure: it will require a recompile

      This is not true at all for many of the most-common cluster applications. Framework software exists which "gangs together" a pool of servers, each of which can run ordinary, non-cluster-aware software. No need to write code, no need for a recompile.
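
      A toy version of that frontend idea in Python; the backends run stock, non-cluster-aware software, and only the frontend knows the pool exists (addresses are made up):

        import itertools
        import socket

        BACKENDS = [("10.0.0.11", 80), ("10.0.0.12", 80), ("10.0.0.13", 80)]
        _rotation = itertools.cycle(BACKENDS)

        def pick_backend():
            """Round-robin over the pool, silently skipping nodes that refuse to
            talk. Failover lives in the frontend, not in the ordinary backends."""
            for _ in range(len(BACKENDS)):
                host, port = next(_rotation)
                try:
                    return socket.create_connection((host, port), timeout=1.0)
                except OSError:
                    continue
            raise RuntimeError("no backends available")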

  • Clustering: Several systems that do parallel computing.

    Fault Tolerant Servers: Several systems with failover load balancers in front.

    I get frustrated when people use the latter and call it the former. True, you could have fault-tolerant servers in a single box, but why? In fact, I'm rolling out infrastructure of the latter in large doses.

    This is how google dunnit. Very well in fact. ;) It doesn't have to be expensive either. So far the most expensive part seems to be a soft switch for the SAN so I can use Open

    This is how google dunnit. Very well in fact. ;) It doesn't have to be expensive either. So far the most expensive part seems to be a soft switch for SAN so I can use Open
    • Clustering: Several systems that do parallel computing.
      Fault Tolerant Servers: Several systems with failover load balancers in front.
      I get frustrated when people use the latter and call it the former.


      Not necessarily:
      High Availability Clusters are what we're talking about here.

      You're talking about High Performance Clusters, which is NOT what we're talking about...
  • by SquadBoy ( 167263 ) on Monday October 03, 2005 @02:25PM (#13706710) Homepage Journal
There is nothing like OpenBSD running pf and CARP. Dead easy to set up, works like a charm, and secure by default. One wonders why the editors seem to think OSS == Linux.

    http://www.openbsd.org/faq/pf/index.html [openbsd.org]
    http://www.openbsd.org/faq/faq6.html#CARP [openbsd.org]
  • I have always clustered fault-tolerant servers. For important business applications there is no choice but clustering. However, I want to fail over to the standby node on my own terms... not because of a hardware failure. This solution gives you great availability along with the chance to make firmware/driver/hardware updates to the fail-to node during business hours. You can then fail over in a maintenance window and update the other server during business hours.

    BTW, SQL Server does not require that you buy li
  • by JustASlashDotGuy ( 905444 ) on Monday October 03, 2005 @02:28PM (#13706729)
    It all comes down to Availability (Clustering) vs. Reliability (Fault Tolerant). They are NOT the same thing.

Fault-tolerant servers are nice; even the simplest true server should offer some fault tolerance (e.g., RAID drives). This is handy but may not help your availability if you have an SLA promising xx% uptime and then find yourself needing to take the server down to apply service packs or other patches.

Clustered servers allow you to increase the availability of your machines, because when you need to take one down for some updates, you can simply fail over all your traffic to the other server in the cluster accordingly. Clustering may increase the availability of the services those machines are offering, but it does not help the reliability of the machines themselves.

    Therefore, I personally choose to start with fault tolerant machines initially (RAID and dual power supplies at a minimum). It makes for a good base. If the services on that machine are 'mission critical', then cluster that machine with other fault tolerant machines.

• It's harder to increase the capacity of a fault-tolerant system - at some point you reach a limit on how many CPUs and how much memory you can add, and to a lesser extent, the amount of disk (assuming you use a storage area network).

    With a cluster, you simply add another machine to the cluster when you need more computing power. You can also take a single machine off the cluster for upgrades, hardware troubleshooting, or to reallocate the single machine to do something else.

    As other posters have said, a l

  • High Availability is all about cost/benefit. RAID and a redundant power-supply are both reasonably cheap for smaller systems, and increase system management complexity only a bit. They are also fairly limited in what they can protect against: certain disc or power supply failures.

    A cluster can, if properly designed, protect against all sorts of failures: disc, power supply, controller, motherboard, CPU, backplane, cable, network, some designs can even deal with physical disaster like a fire in one of your
• I've read a lot of misinformation in this thread. Properly designed, a fault-tolerant machine should NOT require downtime to replace a failed component, as all components (including CPU modules) should be hot-pluggable. In general, a fault-tolerant system should be able to shut down a single failed component and keep going without any noticeable impact on processing. A cluster may take some time to switch over, depending on whether it is a fail-over system, or may need to restore / restart / migrate a checkpoin
  • Google as an example (Score:3, Interesting)

    by Guspaz ( 556486 ) on Monday October 03, 2005 @02:36PM (#13706801)
    Google proved that clustering could be fault tolerant, while costing less than true fault-tolerant hardware.

Google built massive clusters of thousands of machines out of very cheap, unreliable hardware. They have tons of hardware failures due to the extremely cheap components (and the sheer number of machines), but everything is redundant (and fully fault tolerant).

    They did this, again, using dirt cheap hardware.
  • I've worked with both fairly extensively and I'd have to say that although NetWare clusters seem to be more stable than Windows clusters, neither is a great solution for anything...

    In my experience, the Windows Cluster Nodes will fail into some sort of "undead" state, in which the dead node isn't quite dead yet and the live node never quite picks up the slack, so you end up having to reboot both of them...

    The NetWare Cluster Nodes have such a hair-trigger with the default settings that they seem to fail-ove
  • When discussing fault-tolerant vs. clustered systems, it's extremely important to discuss the need for scalability. Clustered systems are inherently scalable, while fault-tolerant systems (in general) are not.

    For my business needs I usually see clustered systems as a much better solution than fault tolerance. When dealing with systems that require fault tolerance, you mostly are concerned with keeping the data they store available (database servers, file systems, etc). When dealing with systems where high avali
    • I also wanted to mention cost.

      Usually a clustered server solution is very comparable in cost to a fault-tolerant solution. In general your clustered boxes are pretty cheap off-the-shelf deals, while fault-tolerant machines are not. When a critical error occurs with a fault-tolerant system, the cost to repair is much greater than in a clustered solution, and downtime can be exponentially higher.

      Clustered solutions are designed to maintain uptime even when there is failure.. FT solutions are designed not
  • How much load is your site going to need to handle? If it's high, clustering is a darn good idea, because the separate machines will share the load on top of giving you redundancy. If the load expected is low, a single fault-tolerant machine will be easier to maintain.

    This especially goes for multiple services, and you may want to mix-and-match. For a CGI+SQL combo, you may prefer to split the web load over a cluster, but you may want to forego the complexity of a clustered database and put your SQL serv
  • One word. (Score:3, Insightful)

    by Wdomburg ( 141264 ) on Monday October 03, 2005 @03:19PM (#13707145)
    Both.
• Big single-image Symmetric Multi-Processing systems (fault tolerant or not) are certainly NOT cheaper, on a $/CPU basis, than a cluster with the same number of CPUs. Vendors not only make lower margins on clustered systems, the absolute cost is lower. Clustering with small-way (typically 2-way) commodity systems will always be cheaper than big SMP - whether you get real (or, in the case of some clusters, effective virtual) fault tolerance or not. The sensible rule of thumb today is that if your app
  • Well, let's see (Score:3, Informative)

    by Cyno ( 85911 ) on Monday October 03, 2005 @04:20PM (#13707616) Journal
    Sun:
http://store.sun.com/CMTemplate/CEServlet?process=SunStore&cmdViewProduct_CP&catid=83174 [sun.com]

    For around $20,000 you could build a PC cluster that includes:
    20+ x Intel P4 D820 at ~$500 ea.
    20+ x AMD64 X2 3800+ at ~$750 ea.

You could almost get a cluster of 40 Intel PCs, each with a dual-core chip running at 2.8 GHz. Or almost 30 AMD64 PCs, each with a dual-core chip running at 1.8 GHz. If you shop smart you can get gigabit ethernet on the motherboard and have a fault-tolerant / redundant system with over 10 times the performance of the Sun system.

    I don't know about you, but I would take the cluster of AMD X2s. The Intels might beat 'em on price/performance, but the X2s might be a lil bit nicer to work on.
  • by swordgeek ( 112599 ) on Monday October 03, 2005 @05:44PM (#13708148) Journal
    Others have said it, I'll say it again: you don't use clustering in place of FT hardware, or vice versa. You use them together!

    Take a server: Hot-swappable mirrored OS disks, N+1 power supplies, dual NICs (which support failover), dual cards initiating separate paths to your storage (through independent switches, if fibre-attached), ECC RAM with on-system logic to take out a failing DIMM. Oh yeah, and multiple CPUs, again with logic to remove one from active use if need be. (chipkill sort of stuff.)

    Now take another identical server (or two) and cluster them. By cluster, I mean add the heartbeat interconnects and software layer to monitor all of the mandated hardware and application resources, and fail over as necessary, or take other appropriate actions. Gluing a pile of machines together in a semi-aware grid is NOT a cluster, and does not properly address the same problem!

    Now once you've got this environment in place, add the most crucial aspect: Highly competent sysadmins, and a strict change control system. The former will cost you a fair sum of money in salary, and the latter will likely necessitate duplicating your entire cluster for dev/test purposes, before rolling out changes.

    That's the beginning of an HA environment. Still up for it?
