Supercomputing

Best Solution For HA and Network Load Balancing?

supaneko writes "I am working with a non-profit that will eventually host a massive online self-help archive and community (using FTP and HTTP services). We are expecting 1,000+ unique visitors / day. I know that having only one server to serve this number of people is not a great idea, so I began to look into clusters. After a bit of reading I determined that I am looking for high availability, in case of hardware fault, and network load balancing, which will allow the load to be shared among the two to six servers that we hope to purchase. What I have not been able to determine is the 'perfect' solution that would offer efficiency, ease-of-use, simple maintenance, enjoyable performance, and a notably better experience when compared to other setups. Reading about Windows 2003 Clustering makes the whole process sound easy, while Linux and FreeBSD just seem overly complicated. But is this truly the case? What have you all done for clustering solutions that worked out well? What key features should I be aware of for successful cluster setup (hubs, wiring, hardware, software, same servers across the board, etc.)?"
This discussion has been archived. No new comments can be posted.

  • by onion2k ( 203094 ) * on Monday March 02, 2009 @04:54AM (#27038167) Homepage

    1000+ unique visitors is nothing. Even if they all hit the site at lunchtime (1 hour window), and look at 30 pages each (very high estimate for a normal site) that's only 8 requests a second. That isn't a lot. A single server could cope easily, especially if it's mostly static content. As an example, a forum I run gets a sustained 1000+ users an hour and runs fine on one server.
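
    The back-of-envelope estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic; the function name is made up for illustration, and the inputs are the figures from the comment (1000 visitors, 30 pages each, a one-hour window).

```python
def peak_requests_per_second(visitors: int, pages_per_visitor: int, window_seconds: int) -> float:
    """Average request rate if every visitor arrives within the same window."""
    return visitors * pages_per_visitor / window_seconds


# Figures taken from the comment: 1000 visitors, 30 pages each, all within one hour.
rate = peak_requests_per_second(visitors=1000, pages_per_visitor=30, window_seconds=3600)
print(f"{rate:.1f} requests/second")  # prints "8.3 requests/second"
```

    Note that this counts page views only; images, stylesheets and other assets would multiply the raw request count.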

    As for "high availability", that depends on your definition of "high". If the site being down for a morning is a big problem then you'll need a redundant failover server. If it being down for 15 minutes is a problem then you'll need a couple of them. You won't need a load balancer for that because the redundant servers will be sitting there doing nothing most of the time (hopefully). You'll need something that detects the primary server is offline and switches to the backup automatically. You might also want to have a separate database server that mirrors the primary DB if you're storing a lot of user content, plus a backup for it (though the backup DB server could always be the same physical machine as one of the backup webservers).
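
    As a rough illustration of "something that detects the primary server is offline and switches to the backup", here is a toy decision rule. It is not how any particular HA package works; the class name, the threshold of three failed probes and the "primary"/"backup" labels are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks health-probe results and decides which server should be active."""
    failure_threshold: int = 3      # consecutive failed probes before switching (assumed value)
    consecutive_failures: int = 0
    active: str = "primary"

    def record_probe(self, primary_ok: bool) -> str:
        """Feed in one probe result; return the server that should serve traffic."""
        if primary_ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # Switch over. There is deliberately no automatic fail-back here.
                self.active = "backup"
        return self.active


monitor = FailoverMonitor()
probes = [True, False, False, True, False, False, False, True]
results = [monitor.record_probe(ok) for ok in probes]
print(results)
```

    Requiring several consecutive failures avoids flipping over on a single dropped probe. Leaving the backup active after the primary recovers is a design choice, so that fail-back happens only when an operator decides it should.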

    Whoever told you that you'll need as many as 6 servers is just plain wrong. That would be a waste of money. Either that or you're just seeing this as an opportunity to buy lots of servers to play with, in which case buy whatever your budget will allow! :)

    • by MosesJones ( 55544 ) on Monday March 02, 2009 @05:20AM (#27038259) Homepage

      Let's be more blunt. Depending on what you are doing and whether you want to worry about failover, 1000 a day is bugger all. A simple setup of Apache and Tomcat (if using Java) running round-robin load-balancing will give you pretty much what you need.

      If, however, you really are worried about scaling up and down, then have a look at Amazon Web Services, as that will probably be more cost-effective for coping with a peak IF it occurs than buying 6 servers to do bugger all most of the time.

      2 boxes for hardware failover will do you fine. If you are worried about HA, then it's the COST of downtime that you are worried about (i.e. down for an hour exceeds $1000 in lost revenue) which will justify the solution. Don't just drive availability to five nines because you feel it's cool; do it because the business requires it.

      • by rufus t firefly ( 35399 ) on Monday March 02, 2009 @08:15AM (#27039043) Homepage
        There are a number of nice load balancers out there which are opensource. I'm partial to HAproxy, but you could try:

        HAproxy (which is the one I use) has the ability to define "backup" servers which can be used in the event of a complete failure of all servers in the pool, even if there is only one server in the main pool. If you're trying to do this on the cheap, that may help. It also has embedded builds for things like the NSLU2, so it may be easy to run on an embedded device you already have.
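
        The "backup server" rule described above can be modelled in a few lines. This is a toy model of the selection rule only, not HAProxy's code or configuration syntax; the function name and the tuple layout are invented for the example, and it does not try to reproduce how HAProxy chooses among several backups.

```python
def eligible_servers(servers):
    """Return the names of servers that may receive traffic.

    `servers` is a list of (name, is_up, is_backup) tuples. Backup servers
    are used only when no regular (non-backup) server is up.
    """
    regular = [name for name, is_up, is_backup in servers if is_up and not is_backup]
    if regular:
        return regular
    return [name for name, is_up, is_backup in servers if is_up and is_backup]


# One server in the main pool plus one cheap spare marked as backup.
normal = eligible_servers([("web1", True, False), ("spare", True, True)])
degraded = eligible_servers([("web1", False, False), ("spare", True, True)])
print(normal, degraded)  # prints "['web1'] ['spare']"
```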

      • by mcrbids ( 148650 ) on Monday March 02, 2009 @04:13PM (#27044569) Journal

        2 boxes for hardware failover will do you fine. If you are worried about HA, then it's the COST of downtime that you are worried about (i.e. down for an hour exceeds $1000 in lost revenue) which will justify the solution. Don't just drive availability to five nines because you feel it's cool; do it because the business requires it.

        This is something that is rampant: techies tend to overestimate the value of uptime.

        Sure, it's sexy to have high availability this and redundant that, but unless your company is pulling down at least $1,000,000 per year or more in gross revenues, it's hard to beat the 3 to 4 nines or so uptime delivered by a good quality, whitebox server running Linux. Something like this unit [aberdeeninc.com] would deliver excellent performance and excellent reliability at a very low cost.

        How much does an hour of downtime actually cost your company? Be honest. If you had to tell your customers: "we were down for 2 hours because a software update caused us to have to ..." what would it actually cost your company? Especially if it only happened every year or so? In my experience, even in fairly stiff production environments, there has been no cost at all. We've maintained about 99.95% uptime for the past 3 years, with 1 "incident" every year or so, with no cost at all. In fact, our company has a good reputation for availability and support!

        So don't spend money on sexy hardware with lots of blinkie lights and cross-connects, which often decrease your reliability by introducing unnecessary complexity.

        Instead, spend money on your hosting. Don't *ever* host it in-house. Ever. Get a first-tier hosting facility, with redundant network feeds, power, and staff who give a damn. Don't be afraid to pay for it, because it will probably save you money, anyway. You'd be amazed at how price-competitive top-notch hosting farms can be!

        Make sure to get to know the on-site techies on a first-name basis, give 'em a six-pack of their favorite beverage, and thank them profusely when they do anything for you. The goodwill these types of things can bring will work wonders for you down the road.

        And remember:

        2 nines is 3.65 days of downtime per year.
        3 nines is .365 days of downtime per year (~ 8.8 hours)
        4 nines is .0365 days of downtime per year (~ 53 minutes)
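
        The same figures can be derived for any number of nines. A small sketch, assuming a 365-day year; the function name is made up for illustration.

```python
def downtime_hours_per_year(nines: int) -> float:
    """Allowed downtime, in hours per 365-day year, at an availability of N nines."""
    unavailability = 10 ** -nines        # e.g. 3 nines -> 0.001
    return unavailability * 365 * 24


for n in (2, 3, 4, 5):
    hours = downtime_hours_per_year(n)
    print(f"{n} nines: {hours / 24:.4f} days = {hours:.2f} hours = {hours * 60:.1f} minutes")
```

        Four nines works out to roughly 52.6 minutes a year, and five nines to a little over 5 minutes.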

        It's a very, very rare case indeed where 3-4 nines of uptime isn't completely sufficient.

        And 1,000 unique visits per day? Pssht. Unless you are doing some pretty ferocious database stuff, (EG: joins across 12 tables with combined inner/outer/composite joins) the aforementioned server should do the job just wonderfully.

        DON'T FORGET BACKUPS! And backup your backups, because backups fail, too.

    • by drsmithy ( 35869 ) <drsmithy.gmail@com> on Monday March 02, 2009 @05:28AM (#27038279)

      You'll need something that detects the primary server is offline and switches to the backup automatically. You might also want to have a separate database server that mirrors the primary DB if you're storing a lot of user content, plus a backup for it (though the backup DB server could always be the same physical machine as one of the backup webservers).

      On this note, if you're comfortable (and your application is compatible) with Linux+Apache, then heartbeat [linux-ha.org] and DRBD [drbd.org] will do this and are relatively simple to get up and running. Just avoid trying to use the heartbeat v2-style config (for simplicity), make sure both the database and apache are controlled by heartbeat, and don't forget to put your DB on the DRBD-replicated disk (vastly simpler than trying to deal with DB-level replication, and more than adequate for such a low load).

      Oh, and don't forget to keep regular backups of your DB somewhere else other than those two machines.

      • by wisty ( 1335733 )

        Backup your db. Test your db backup. Get someone else to check your backup strategy. That's mission critical, and it merits repeating.

        1000 users a day? Windows can start about 10 Python processes a second (and handle a bit of processing within that process), which is probably the slowest way you could possibly do it. OSX or Linux can do 10 times as much.

      • Re: (Score:2, Informative)

        by verrol ( 43973 )

        I can attest to this. This is the same setup we used, which had VoIP, DB, and HTTP. We ran OpenVZ on CentOS on DRBD. Each OpenVZ virtual machine ran a service, sometimes several of the same services (DB and VoIP). Because of DRBD, redundancy was taken care of, and using heartbeat, well, high availability was also easy. It worked very well. The only thing I would say is that it takes some knowledge and much elbow grease to get this working, and plenty of testing. Whereas, some of the other solutions would b

    • by Mad Merlin ( 837387 ) on Monday March 02, 2009 @05:28AM (#27038281) Homepage

      I agree that 1000 unique visitors is peanuts, but as for how to do HA, it really depends a lot on your situation. For example, the primary server for Game! [wittyrpg.com] started acting up about 2 weeks ago, but it mattered little as I was able to flip over to the backup server and came out with barely any downtime and no data loss. In the meantime, I was able to diagnose and fix the primary server, then point the traffic back at it. In my case, all the dynamic data is in MySQL, which is replicated to the backup server, so when I switched over I simply swapped the slave and the master and redirected traffic at the backup server. You also have to consider the code, which you presumably make semi-frequent updates to. In my case, the code is stored in SVN and updated automagically on both the master and the slave simultaneously.

      Having said all that, there's more to consider than just your own hardware when it comes to HA. What happens if your network connection goes down? In most cases, there's nothing you can do about it except twiddle your thumbs while you wait on hold with customer service. Redundant Internet connections are expensive due to the fact that you basically need to be in a big (and expensive) colocation facility to get it.

      Also, how easy it is to have HA depends largely on how important writes are to your database (or filesystem). Does it matter if this comment doesn't make it to the live page for a couple seconds after I hit submit? No, not really. Does it matter if I change my equipment in Game! [wittyrpg.com] but don't see the changes immediately? Yes, definitely. Indeed, if your content is 100% static, you can just keep a dozen complete copies and put a load balancer in front that pulls dead machines out of the loop automagically and be done with it.
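
      For the fully static case, "a load balancer in front that pulls dead machines out of the loop" boils down to round-robin over whichever backends are currently healthy. A minimal sketch of that idea follows; the class and the backend names are hypothetical, and a real balancer would run the health checks itself rather than be told about them.

```python
class RoundRobinBalancer:
    """Rotates through backends, skipping any that are marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend in rotation, or raise if none is left."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["web1", "web2", "web3"])
lb.mark_down("web2")                      # pretend a health check on web2 failed
picks = [lb.pick() for _ in range(4)]
print(picks)  # prints "['web1', 'web3', 'web1', 'web3']"
```

      This only works this simply because every copy is identical and read-only; once writes matter, the replication questions raised above come back.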

      • Re: (Score:3, Informative)

        by Anonymous Coward

        To truly be HA you would need global load balancing. Your global load balancers are in essence the master name servers. You have 2 or more physical locations, and the load balancer answers DNS lookups with a very, very low TTL, pointing to a site that is up (and, if it's more advanced, to the site closest to the requester). If a DC blows up, your site stays up.

        For each site dual ethernet drops, dual firewalls, and dual loadbalancers with a separate connection from each LB to server. Each piece of network gear has connect

      • by multipartmixed ( 163409 ) on Monday March 02, 2009 @12:02PM (#27041483) Homepage

        Sounds like you do HA pretty much how I do it, only with different pieces (I live in Sun + Oracle land).

        One thing I try to get HA admins to stop thinking about is the "primary" and "backup" servers. I find deploying a sysadmin mental state which considers both (or all) servers equal helps make stuff more reliable in the long run. So if server "bob" bursts into flames, server "joe" takes over. Bob is then serviced and left alone until Joe bursts into flames (or we schedule some downtime and run re-certification tests).

        The biggest key, as you may have guessed, to getting admins to treat servers equally instead of primary/backup is to name them appropriately. Bob & Joe is better in this case than haMaster and haSlave, or serverA and serverB, etc.

        Same with IPMP and stuff - don't define a preference, whatever is, is.

        Unless you have asymmetric hardware, which I really, really, really discourage.

        • Re: (Score:3, Informative)

          by afidel ( 530433 )
          In Oracle land a hot standby server has to be fully licensed, a warm standby server does not. If your needs aren't for five 9's then it makes a LOT of sense to use a warm standby DR box.
    • Re: (Score:2, Informative)

      by Anonymous Coward

      Definitely. I had a site that was doing ~2000+ uniques per day and used considerable bandwidth (lots of images). However, everything was heavily cached (no on-demand dynamic pages). And it was all running on an old P4 with 512MB of RAM, with fantastic response times and zero issues.

    • by Zocalo ( 252965 )
      The poster doesn't make any indication of how much traffic each of those "1,000+ visitors a day" will generate, either in terms of the number of requests or the number of bytes. Nor is any indication given as to the nature of the service, required resilience or the method of information exchange provided. For a simple HTML form, back-end DB based system without high uptime requirements, then the required infrastructure is trivial, but if we're going to the opposite extreme and talking about five nines upt
      • Re: (Score:3, Insightful)

        by Bandman ( 86149 )

        I think it's sort of fortunate that the submitter was vague. This way, I get to read about all sorts of HA solutions, whereas if he really wanted 2 apache servers and a hot/cold mysql instance, I'd have been way more bored ;-)

    • by Xest ( 935314 ) on Monday March 02, 2009 @06:53AM (#27038649)

      I was thinking along the same lines.

      But to the person asking the question, if you want a full answer then you need to get your site built and make use of stress testing tools such as JMeter for Apache or Microsoft's WAS tool for IIS.

      It's not something anyone here can give you a definite answer for without knowing how well your site is implemented and what it actually does.

      Look into Transaction Cost Analysis; that's ultimately what you need here. A good start is this article:

      http://technet.microsoft.com/en-us/commerceserver/bb608757.aspx [microsoft.com]

      or this one:

      http://msdn.microsoft.com/en-us/library/cc261632.aspx [microsoft.com]

      Don't worry that these are MS articles on MS technologies; they both still cover ideas that are applicable elsewhere.

      Even though no one here can give you a full answer for the reasons mentioned above, we can at least give you our best guesses, and this is where I think the parent poster is spot on: 6 servers is absolute overkill for this kind of load. Indeed, unless your application does some pretty intensive processing, I see little reason why a single server couldn't do the trick, or a web/application server and a database server at most.

      For ensuring high availability you may indeed need more servers, of course. And since you mention a requirement for FTP, is bandwidth likely to be an issue?

      The fact you're only expecting 1000 a day suggests you're not running the biggest of operations, and although it's nice to do these things in house, it may just be worth using a hosting provider with an acceptable SLA. At the end of the day they have more experience, more hardware, more bandwidth and can probably even do things a fair bit cheaper than you can. Do you have a generator to allow continued provision of the service should your power fail for an extended period, for example? If you receive an unexpected spike in traffic or a DDoS, do you have the facility to cope with and resolve that like a big hosting company could?

      There are many things I wouldn't ever use an external hosting provider for, but this doesn't sound like one of them.

    • You have a good rule-of-thumb analysis there. I like it, and it should apply to most normal sites.

    • Unless your application is very resource intensive (or badly written) a single server can cope easily with 1000 visitors. So add another server or two for redundancy.

      Use RAID1 (RAID10 if you need better disk performance), and get backups. If you're on a tight budget you could use hotplug SATA drives for backups (if you don't have a habit of dropping your backup media on the floor, HDDs can be better than tapes). If you're on a really tight budget use those USB to PATA/SATA adapters ;).

      I suspect you will fin
  • by s7uar7 ( 746699 ) on Monday March 02, 2009 @04:56AM (#27038181) Homepage
    If the site goes down do you lose truckloads of money or does anyone die? Load balancing and HA sound a little overboard for a site with a thousand visitors a day. A hundred thousand and you can probably justify the expense. I would probably just be looking at a hosted dedicated server somewhere for now.
    • Re: (Score:3, Insightful)

      by cerberusss ( 660701 )

      Well a dedicated server requires maintenance. All my customers come to me saying that they will eventually get 100,000 visitors per day. I make the calculation for them for the monthly cost: $100 for a decent dedicated server, plus $250 for a sysadmin etc.

      Eventually they all settle for shared hosting except when privacy is an issue.

    • by Errtu76 ( 776778 ) on Monday March 02, 2009 @05:50AM (#27038369) Journal

      It's not overboard. And even with a hosting provider you're still exposed to hardware problems. To get what you want, you can:

      - buy 2 cheap servers with lots of RAM
      - set them up as Xen platforms
      - create 2 virtuals for the loadbalancers
      - setup LVS (heartbeat + ldirectord) on each virtual
      - create 4 webserver virtuals, 2 on each xen host
      - configure your loadbalancers to distribute load over all webserver virtuals

      And you're done. Oh, make sure to disable tcp_checksum_offloading on your webservers, else LVS won't work that well (read: not at all).

      • by drsmithy ( 35869 ) <drsmithy.gmail@com> on Monday March 02, 2009 @06:52AM (#27038641)

        And you're done. Oh, make sure to disable tcp_checksum_offloading on your webservers, else LVS won't work that well (read: not at all).

        Just a heads-up for those who (like me) read this and thought: "WTF ? LVS works fine with TOE", it is a problem specific to running LVS in Xen VMs where the directors and realservers share the same Xen host. Link. [austintek.com]

      • by alta ( 1263 ) on Monday March 02, 2009 @07:26AM (#27038829) Homepage Journal

        If I had mod points, I'd give them. This is the same thing we did, just with different software.
        -Get 2 ISPs; I suggest different transports. We have one as fiber, the other is a T1. There's no point in getting 2 T1s from different companies if a bulldozer cuts them together.
        -Two Dell 1950s
        -Set each up with VMware Server
        -Created 2 databases, replicating to each other
        -Created 2 web servers, each pointing at the database on the same machine
        -Installed two copies of Hercules load balancer, vrrp + pen
        -Set up failover DNS with 5 minute expiration.

        Now, you may say, why the load balancers if you're load balancing with DNS? Because if I have a hardware/power failure, that's one instance where the 5 minutes for DNS to expire will not incur downtime for my customers. It also gives me the ability to take servers offline one at a time for maintenance/upgrades, again with no downtime.

        I have a pretty redundant setup here and the only thing I've paid for is the software.

        Future plans are to move everything to Xenserver.

      • What's the use of running 2 virtual webservers on one piece of hardware?

        • Re: (Score:3, Informative)

          VMs are like a bullet-proof vest for your hardware.

          If a virtual machine takes it in the ass and crashes, the system can spawn a new one without missing a beat, whereas the same crash on the actual machine might cause it to crash.

          It's also a good strategy to provide for future growth...If your machines are already virtual, you can host them on any hardware that's appropriate, and you can run as many as you need.

  • budget? (Score:5, Insightful)

    by timmarhy ( 659436 ) on Monday March 02, 2009 @05:00AM (#27038189)
    you can go as crazy as you like with this kind of stuff, but given you're a non-profit i'm guessing money is the greatest factor here. my recommendation would be to purchase managed hosting and NOT try running it yourself. folks with a well established data centre that do this stuff all day long will do it much better, quicker, and cheaper than you will be able to.

    there are also more of them than you can poke a stick at, and prices are very reasonable. places like rackspace offer this kind of thing for $100/mo.

    the other advantage is you don't need to pony up for the hardware.

    • Re: (Score:3, Insightful)

      by malkavian ( 9512 )

      The problem being that you're paying $100 per month in perpetuity. Sometimes you get awarded capital to spend on things in a lump sum, whereas the ability to garner a revenue commitment could not necessarily be made.
      At the spend rates you mentioned, that's a basic server per year. Say the server is expected to last 5-8 years, that'll be an outlay of at least $6000-$9600+, with more to spend if you want to keep things running.
      That would cover the cost of a couple of generations worth of hardware, depending

      • Re: (Score:3, Insightful)

        by drsmithy ( 35869 )

        The problem being that you're paying $100 per month in perpetuity. Sometimes you get awarded capital to spend on things in a lump sum, whereas the ability to garner a revenue commitment could not necessarily be made.
        At the spend rates you mentioned, that's a basic server per year. Say the server is expected to last 5-8 years, that'll be an outlay of at least $6000-$9600+, with more to spend if you want to keep things running.
        That would cover the cost of a couple of generations worth of hardware, dependin

      • by TCM ( 130219 )

        Say the server is expected to last 5-8 years, that'll be an outlay of at least $6000-$9600+, with more to spend if you want to keep things running.

        If $10K over 8 YEARS is a problem, the project can't be important enough to justify HA.
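
        The hosting figures quoted in this subthread are easy to check. A trivial sketch, assuming the flat $100/month mentioned earlier in the thread and ignoring price changes; the function name is made up.

```python
def hosting_cost(monthly_fee: float, years: int) -> float:
    """Total spend on hosting at a flat monthly fee."""
    return monthly_fee * 12 * years


per_year = hosting_cost(100, 1)
five_years = hosting_cost(100, 5)
eight_years = hosting_cost(100, 8)
print(per_year, five_years, eight_years)  # prints "1200 6000 9600"
```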

    • I work for a 20,000+ private user network that has some pretty critical demands for High Availability. If you are working for a non-profit, chances are that you simply will not be able to afford "true high availability", which requires a plethora of support features that are prohibitively expensive (for example, Cisco Content Switches cost $10,000+).

      Until you hit over 1,000+ Unique visitors "per minute" your best bet may be to have your site split amongst several different hosting providers (serverbeach, ra

  • Pound (Score:4, Informative)

    by pdbaby ( 609052 ) on Monday March 02, 2009 @05:02AM (#27038203)

    At work we've had a pretty good experience with Pound - it's easy to set up & it load balances and will detect when one of your servers is down and stop sending traffic there. You can get hardware load balancing from people like F5 too.

    If you're just starting out you'll probably want to start with software and then, if the load demands it, move to hardware.

    Machine-wise, we use cheap & not overly powerful 250 GBP, 1u servers with a RAID1; they'll die after a few years (but servers will need to be refreshed anyway) and they provide us with lots of options. They're all plugged into 2 gigabit switches.

    • I am a graduate student who wants a little extra computing power for scientific analysis work.

      I have a small budget. 800 bucks.

      I have heard of this guy building a microwulf cluster, http://www.calvin.edu/~adams/research/microwulf [calvin.edu] that generated some good flops, at least at that time. Today I can build that very same cluster for about 800 dollars.

      My question: Is it better to go with a newer computer setup that falls within that budget, or go with the cluster. I will be doing image analysis work of funct

      • by drsmithy ( 35869 )

        My question: Is it better to go with a newer computer setup that falls within that budget, or go with the cluster. I will be doing image analysis work of functional MRI data. Thanks.

        While I'm not an expert on the topic by any means, I would expect for that sort of budget you'll get far better performance out of a single machine than any cluster you could build for the same cost.

        Even if your interest is in testing how "cluster friendly" your code is (eg: for scaling considerations), you'll almost certain

    • HaProxy (Score:5, Informative)

      by Nicolas MONNET ( 4727 ) <nicoaltiva@gmail. c o m> on Monday March 02, 2009 @05:38AM (#27038315) Journal

      Haproxy [1wt.eu] is better than Pound, IMO. It's lightweight, but handles immense load just as well as layer 4 load balancing (LVS), with the advantages of layer 7 proxying. It uses the latest Linux APIs (epoll, vmsplice) to reduce context switching and copying to a minimum. It has a nice, concise stats module. Its logs are terse yet complete. It redirects traffic to a working server if one is down / overloaded.

      • Re: (Score:3, Informative)

        I seem to recall slashdot operating behind pound systems. It was a good enough plug for me to go and fire it up, been happy with it ever since. Not to say haproxy is better or worse, I've never used it, just another person with great results from pound.

        We get upwards of 15,000 hits per hour and just use Carp and Pound to handle our redundancy (Carp captures servers down, pound handles TCP ports going missing) across two machines (both RAID5 with FA RAM). Last time I checked the load averages, the 2.2 G pro
    • Where do you get your 250GBP servers from? And do they have hot-swap drive bays? =)

  • by Manip ( 656104 ) on Monday March 02, 2009 @05:06AM (#27038215)

    Why are you purchasing six or so servers before you even have one online?

    You say that you expect "1,000+ a day" visitors which frankly is nothing. A single home PC with Apache would handle that.

    This entire post strikes me as either bad planning or no planning. You're flirting with vague "out of thin air" projections that are likely impossible to make at this stage.

    Have a plan in place for how you will scale your service *if* it becomes popular or as it becomes popular, but don't go wasting the charity's money just in case your load jumps from 0 to 30,000+ in 24 hours.

    • by fl!ptop ( 902193 )

      don't go wasting the charity's money

      not to nitpick, but not all non-profits are charities, and some non-profits have a lot of money to spend. case in point [mrs.org]

  • KISS (Score:3, Insightful)

    by MichaelSmith ( 789609 ) on Monday March 02, 2009 @05:23AM (#27038265) Homepage Journal
    Sit down for a bit and think about the most likely use cases for your software. To give the example of slashdot, that might be viewing the main page or viewing an entire article. Structure your code so that these things can be done by directly sending a single file to the client. With the kernel doing most of the work you should be okay.

    Sites which get slashdotted typically use a badly structured and resourced database to directly feed external queries. If you must use a database put some kind of simple proxy between it and the outside world. You could use squid for that or a simple directory of static html files.
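
    One way to read the "simple directory of static html files" suggestion is as a render-once cache sitting in front of the database. The sketch below is only an illustration of that idea: `cached_page` and `render_from_db` are hypothetical names, the renderer is a stand-in for a real database query, and cache keys are assumed to be safe file names.

```python
import tempfile
from pathlib import Path


def cached_page(cache_dir: Path, key: str, render) -> str:
    """Return the page for `key`, rendering it only if no cached file exists yet."""
    path = cache_dir / f"{key}.html"     # assumes `key` is a safe file name
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = render(key)                   # the expensive step, e.g. a database query
    path.write_text(html, encoding="utf-8")
    return html


calls = []


def render_from_db(key: str) -> str:
    # Hypothetical stand-in for a real database-backed renderer.
    calls.append(key)
    return f"<html><body>Article {key}</body></html>"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = cached_page(cache, "42", render_from_db)
    second = cached_page(cache, "42", render_from_db)

print(first == second, len(calls))  # prints "True 1"
```

    The second request never touches the renderer. A real setup would also need some way to invalidate or regenerate a file when the underlying article changes.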
  • by modir ( 66559 ) on Monday March 02, 2009 @05:25AM (#27038269) Homepage Journal

    I want to give you some more information. Based on your visitor estimates, I think you do not have a lot of knowledge about this, because for this number of visitors you do not really need a cluster.

    But now to the other stuff. Yes, Windows clustering is (up to Win Server 2003 [1]) a lot easier. But this is because it is not really a cluster. The only thing you can do is having the software running on one server, then you stop it and start it on the new server. This is what Windows Cluster is doing for you. But you can not have the software running on both servers at the same time.

    If you really want to have a cluster then you probably need some sort of shared storage (FibreChannel, iSCSI, etc.). Or you are going to use something like DRBD [2]. You will need something like this too if you want to have a real cluster on Windows.

    I recommend you read some more on the Linux HA website [3]. Then you will get a better idea of what components (shared storage, load balancer, etc.) you will need within your cluster.

    If you only want high availability and not load balancing, then I recommend you not use Windows Cluster. Better to set up two VMware servers with one virtual machine and then copy a snapshot of your virtual machine over to the second machine every few hours.

    [1] I don't know about Win Server 2008
    [2] http://www.drbd.org/ [drbd.org]
    [3] http://www.linux-ha.org/ [linux-ha.org]

    • Re: (Score:2, Informative)

      by blake1 ( 1148613 )

      The only thing you can do is having the software running on one server, then you stop it and start it on the new server. This is what Windows Cluster is doing for you.

      That's not true. For clustering of front-end services (ie, IIS) you use NLB which is fully configurable load balancing and fault tolerance.

      • Re: (Score:3, Informative)

        by modir ( 66559 )

        True, sorry I did not write it that clearly. I was only writing about the Cluster software included with Windows, not about other applications like NLB that are included with Windows too.

        I just wanted to make clear that Microsoft Cluster Server is a lot easier to set up (as the questioner correctly saw), but this is because you get a lot less. He would have to install and configure several other applications (like NLB) to get the same as he gets with Linux HA.

    • by turbine216 ( 458014 ) <turbine216@gmaCHICAGOil.com minus city> on Monday March 02, 2009 @08:22AM (#27039081)

      Windows clustering allows for Active/Active clusters, so you CAN run the same service on two cluster nodes at the same time (with the exception of Exchange).

      Setting up two servers to host VMware guests and copying is not a good idea either - the HA tools for VMware are expensive, and totally unnecessary for the proposed deployment. Without these HA tools, he would have to down his primary guest every time he wanted to make a snapshot.

      We're talking about a very simple deployment here - HTTP and FTP. You don't even need clustering or a dedicated load balancer - instead, try using round-robin DNS records to do some simple load balancing, and then use a shared storage area as your FTP root (could be a DFS share for Windows or an NFS mount in Linux). This would give you a solid two-server solution that works well for what you're trying to accomplish, and adding servers would be trivial (just deploy more nodes, and add DNS records to the list).

      If it grows much larger than 2 nodes, you might consider an inexpensive load-balancer; Barracuda sells one that works well and will detect a downed node.

      Clustering for this job is totally unnecessary though. You're wasting your time by looking into it.

    • by Bandman ( 86149 )

      I'm curious about DRBD. I've heard of it before, but not much, and never talked to someone using it.

      What happens in the event of a network disconnect, where the servers get out of sync?

  • Nginx (Score:2, Informative)

    by Tuqui ( 96668 )

    For load balancing and static-file HTTP serving, use Nginx; it's the fastest around. Use two or more Linux servers for your high-availability cluster, set a virtual IP for the load balancer, and use Heartbeat to switch the virtual IP in case of failure. Software cost including OS = zero.

  • Amazon EC2 (Score:2, Informative)

    by adamchou ( 993073 )

    Amazon's servers allow you to scale vertically and horizontally. They have images that are preconfigured to do load balancing and they have LAMP setups. Plus, the fact that it's a completely virtualized system means you never have to worry about hardware failures. With only 1k uniques per day, they have more than enough to accommodate what you need.

    as for ease of use, i've never done windows load balancing, but the linux load balancing isn't terribly difficult to get working. to optimize it is quite a bit

    • Amazon's servers allow you to scale vertically and horizontally. They have images that are preconfigured to do load balancing and they have LAMP setups.

      Amazon have too much hardware. If a bunch of suckers don't rent it from them at 1000x its value, they will sell some of it.

      You can set up a cluster of 8 refurbished home theater PCs for 5 grand, and there's enough redundancy in that budget that you can drive a hammer through every third machine and your application won't fail.

      Why the hell would you want t
  • by midom ( 535130 ) on Monday March 02, 2009 @05:55AM (#27038391) Homepage
    Hi! we run a non-profit website that gets 100 million visitors a day on ~350 servers. we don't even use any "clustering" technology, just replication for databases, and software (LVS) load balancer in front of both app (PHP) and squids at the edge. but oh well, you can always waste money on expensive hardware and clustering technology. and you can always check how we build things [dammit.lt]
    • by ledow ( 319597 )

      Heh, so assuming things scale linearly (which I would find surprising), you could run at least 1 million visitors per day on 3.5 servers. And this guy wants six servers for 1000/day (or a little over). And I don't think that his needs would run anywhere near as complex as the example posted. :-)

  • First, figure out what it means for your website to be available (do people need to be able to fetch a page, or do they also need to be able to log in, etc.). Select monitoring software and set it up correctly.

    As for the serving architecture, at this level of load, you're better off without clustering. You don't need it for the load and it's probably a net loss for reliability; most outages I've seen in two-node clusters are either infrastructural failures that take them both out (power distribution failures, for example

    • Actually 2-node active-passive can be a very good idea.

      Let's say you have two nodes behind a load balancer (only way to replicate functionality active-active... you could do the thing where one server is static though, like youtube does). You need a shared filesystem, so you need another node to act as a NAS. What if your app is database-backed? You can stick that on the NAS, probably. But then it's not redundant.

      It's really just simpler to have unidirectional replication, then script it to switch direc

  • and was handling like hundred thousands to a everyday, with off the shelf hardware spec 10 years ago. (Like 512M RAM and 1st era Pentium 4)

    There was no problem at all.

    We also used www.linuxvirtualserver.org to handle load balancing the web requests, and using yet another bigger Linux NFS for backend storage.

    The biggest problem for the HA is
    1. How you sync the data over, or whether you rely on another central storage, in which case there is a single point of failure again.
    2. If it involves Database, then it's a much

  • by Enleth ( 947766 ) <enleth@enleth.com> on Monday March 02, 2009 @06:03AM (#27038423) Homepage

    I'm sorry, but I have to say that. Don't be offended, please - sooner or later you will look at your submission and laugh really hard, but for now you need to realise that you said something very, very silly. A few people already politely pointed out that 1000 visitors a day is nothing - but seriously, it's such a great magnitude of nothingness that, if you make such a gross misinterpretation of your expected traffic, you need to reconsider if you really are the right person for the job *right now* and maybe gain some more experience before trying to spend other people's money on a ton of hardware that will just sit there, idle, and consume huge amounts of electricity (also paid for with other people's money).

    I'm serving a 6k/day website (scripting, database, some custom daemons etc.) from a Celeron 1.5GHz with 1GB RAM, and it's still doing almost nothing. If you really have to have some load balancing, get two of those for $100 each.

    • I'm serving 1 million PHP hits a day and up to 200 MySQL queries per second (5 min avg) on a 3 GHz Celeron with 1 GB of RAM. I could do 10,000 hits per day on a 486.

    • by Bandman ( 86149 )

      It's been said before, but HA isn't just about load. Sure, he mentioned load balancing, but the HA part may be the more important.

  • Pointless (Score:5, Informative)

    by ledow ( 319597 ) on Monday March 02, 2009 @06:06AM (#27038429) Homepage

    1000 users a day? So what? That's less than one user a minute. Even if you assume they stay on the website for 20 or so minutes each, you're never looking at more than about 20 users at a time browsing content (there will be peaks and troughs, obviously). Now picture a computer that can only send out, say, 20 x 20 pages a minute (assuming your visitors can visit a full page every 3 seconds) - we're talking "out of the Ark". Unless they are downloading about half a gig of video each, this is hardly a problem for a modern machine.
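    The arithmetic above can be written out as a short Python sketch. The function names are made up for illustration, and the model assumes visits are spread evenly over the day, which real traffic is not, so read the result as an average rather than a peak.

```python
def average_concurrent_users(visitors_per_day, minutes_per_visit):
    """Average number of visitors on the site at once (arrival rate x visit length)."""
    arrivals_per_minute = visitors_per_day / (24 * 60)
    return arrivals_per_minute * minutes_per_visit


def pages_per_minute(concurrent_users, seconds_per_page):
    """Pages the server must send per minute if every user loads a page that often."""
    return concurrent_users * (60 / seconds_per_page)


if __name__ == "__main__":
    users = average_concurrent_users(visitors_per_day=1000, minutes_per_visit=20)
    print(f"average concurrent users: {users:.1f}")
    print(f"pages per minute at 20 users: {pages_per_minute(20, 3):.0f}")
```

    With 1000 visitors a day and 20-minute visits the average comes out at roughly 14 users on the site at once, so "about 20" leaves some room for peaks, and 20 users loading a page every 3 seconds is the 20 x 20 = 400 pages a minute mentioned above.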

    I do the technical side for a large website which sees nearly ten times that (as far as you can trust web stats) and it runs off an ordinary shared host in an ordinary mom-n-pop webhosting facility and doesn't cost anywhere near the Earth to run. We often ask for more disk space, but we've never had to ask for more bandwidth, or more CPU, or got told off for killing their systems. Admittedly, we don't do a lot of dynamic or flashy content but this is an ordinary shared server which we pay for out of our own pockets (and it costs less than our ISP subscriptions for the year, and the Google ads make more than enough to cover that even at 0.3% clickthrough). We don't have any other servers helping us keep that site online (we have cold-spares at other hosting facilities should something go wrong, but that's because we're highly pedantic, not because we need them or that our users would miss us) - one shared server does the PHP, MySQL, serves dozens of Gigabytes per month of content for the entire site, generates the statistics etc. and doesn't even take a hit. I could probably serve that website off my old Linux router over ADSL and I doubt many people would notice except at peak times because of the bandwidth.

    Define "massive" too... this site I'm talking about does multiple dozens of Gigabytes of data transfer every month, and contains about 10GB of data on the disk (our backup is now *three* DVD-Rs... :-) ). That's *tiny* in terms of a lot of websites, but equally puts 99% of the websites out there to shame.

    Clustering is for when you have more than two or three servers already and primitive load-balancing (i.e. databases on one machine, video/images on another, or even just encoding half the URLs with "server2.domain.com" etc.) can't cope. In your case, I'd just have a hot-spare at a host somewhere, if I thought I needed it, with the data rsync'd every half-hour or so. For such a tiny thing, I probably wouldn't worry about the "switchover" between systems (because it would be rare and the users probably don't give a damn) and would just use DNS updates if it came to it. If I was being *really* pedantic, I might colo a server or two in a rack somewhere with the capability for one to steal the other's IP address if necessary, or have DNS with two A records, but I'd have to have a damn good reason for spending that amount of money regularly. If I was hosting in-house and the bandwidth was "free", I'd do the same.

    Seriously - this isn't cluster territory, unless you see those servers struggling heavily on their load. And if I saw that, I'd be more inclined to think the computers were just crap, the website was unnecessarily dynamic, or I had dozens-of-Gigabytes databases and tens or hundreds of thousands of daily visitors.

    You're in "basic hosting" territory. I doubt you'd hit 1GB/month traffic unless the data you're serving is large.

  • Buy two good quality machines and keep one as a hot spare and just backup every night.

    The current "uptime" of a couple of my systems is 255 days, and that's only because of a power failure and subsequent end of generator fuel at my colo which no amount of on-site redundancy would have helped.

    Good quality machines and software *will* run for a year or more with no issues.

    I've been setting up sites at data centers for about 10 years now, seriously, do the cost/benefit analysis, the base price is a couple mac

  • We will load test... (Score:2, Informative)

    by nicc777 ( 614519 )

    I see there is already a ton of good advice here, so when you have your kit set up, post a link so that we can load test your config :-)

    It's called the Slashdot effect and, if anything, you will at least know when things break and how your configuration handles these failover conditions.

    PS: This is cheaper than buying load testing kit and software :-)

  • I remember initially setting up our little site with 3 servers and a "managed" loadbalancer/failover solution from our hosting provider. Our domain name pointed to the IP address of the loadbalancer.

    I learned that "managed" is actually a hosting company euphemism for "shared" and performance was seriously degraded during "prime time" every day.

    We eventually overcame our network latency issues by ditching the provider's loadbalancer and using round-robin DNS to point our domain name at all three of the
  • Has any /.er implemented the following ultra-simple solution to provide HA for websites serving static content: having the website DNS name resolve to 2 IP addresses pointing to 2 different servers, and simply duplicating the static content on the 2 servers? How do browsers behave when one of the servers goes down? Will they automatically try to re-resolve the DNS name and attempt to contact the 2nd IP? Or is the well-known DNS pinning security feature preventing them from falling back on the 2nd IP?
    • How do browsers behave when 1 of the server goes down?

      Half the DNS lookups will still point at the failed server. Since most browsers cache the dns lookup, they will not re-request the IP address, and will just assume the site is down.

      If you rely on DNS round-robin records, you need to either ensure that they are always up (i.e. each one is an HA cluster) or that you can remove them quickly enough to cause your users as little pain as possible. (TTL should probably be 1 minute, and you'll want an automated

  • by sphealey ( 2855 ) on Monday March 02, 2009 @07:02AM (#27038693)

    First, I suggest you read and think deeply about Moens Nogood's essay "So Few Really Need Uptime" [blogspot.com].

    Key quote:

    ===Typically, it takes 8-9 months to truly test and stabilise a RAC system. As I've said somewhere else, some people elect to spend all of those nine months before going production whereas others split it so that some of the time is spent before and, indeed, some of it after going production.

    But that's not all: Even when the system has been stabilised and runs fine, it will a couple of times a year or more often go down and create problems that you never saw before.

    It's then time to call in external experts, but instead of just fixing the current cause of your IT crisis, I'd like to suggest that you instead consider the situation as one where you need to spend a good deal of resources in stabilising your system again - until the next IT crisis shows up.

    Your system will never be truly stable when it's complex. The amount of effort and money you'll need to spend on humans being able to react to problems, running the system day-to-day, and - very important - keep them on their toes by having realistic (terribly expensive) test systems, courses, drills on realistic gear, networks of people who can help right now, and so forth... is huge.

    The ironic thing is this: If you decide that you can live with downtime, and therefor with a much less complex system - your uptime will increase. Of course. ===

    And that corresponds pretty well to my experience: the more effort people make to duplicate hardware and build redundant failover environments the more failures and downtime they experience. Consider as well the concept of ETOPS and why the 777 has only two engines.

    sPh

  • Others have already covered the "1000 users isn't much" aspect. Benchmark, and verify what each server can handle of your anticipated load, but they're probably right.

    Option 1: Don't do it yourself. Look into renting servers from a hosting company. They will often provide HA and load balancing for free if you get a couple servers. Also, having rented servers makes it much easier to scale. If you find that you have 100,000 uniques per day, you can order up a bunch more servers and meet the load within

  • by amaura ( 1490179 ) on Monday March 02, 2009 @07:07AM (#27038711)
    If you're looking for a lightweight open source load balancer with a lot of features, go for HAProxy. In my company we work with F5 BIG-IPs, Alteon, and Cisco CSS, which are the leading load balancers in the industry. They are really expensive and, depending on the licence you buy, you won't have all the features (HTTP-level load balancing, cookie insertion/rewriting). We first used HAProxy for a POC and now we're installing it in production environments; it works like a charm on a Linux box (Debian and RHEL5) with around 600 users.
  • One more thing. (Score:5, Insightful)

    by OneSmartFellow ( 716217 ) on Monday March 02, 2009 @07:12AM (#27038745)
    There is no way to be fully redundant unless you have independent power sources, which usually requires your backup systems to be geographically separated. In my experience, loss of power is the single most common reason for a system failure in a well designed system (after human error that is).
  • === Reading about Windows 2003 Clustering makes the whole process sounds easy, while Linux and FreeBSD just seem overly complicated. ===

    Well, yes, that is how Microsoft makes its money: by releasing versions of complex technology that seem easy compared to the archaic legacy technology. Key word there is "seem", of course; when the chips are really down you will find out if (a) the Microsoft system was as good as, or even the equivalent of, the "archaic" version (b) your deep understanding of the problem

  • CentOS/HA (Score:5, Informative)

    by digitalhermit ( 113459 ) on Monday March 02, 2009 @07:37AM (#27038867) Homepage

    It's fairly trivial to install RedHat/CentOS based clusters, especially for web serving purposes.

    There are a few components involved:
    1) A heartbeat to let each node know if the other goes out.

    2) Some form of shared storage if you need to write to the filesystem.

    3) Some method of bringing up services when it fails over.

    A web server with a backend database is one of the canonical examples. You'd install the heartbeat service on both nodes. Next, install DRBD (distributed replicated block device). Finally, configure the services to bring up during a failure. The whole process takes about an hour following instructions on places like HOWTOFORGE.
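    To make the heartbeat idea in component 1 concrete, here is a minimal Python sketch of the decision the standby node makes. It is not how the Linux-HA heartbeat service is implemented; `HeartbeatMonitor`, `standby_step` and the `start_services` callback are hypothetical names, and the times are plain numbers of seconds so the logic can be followed without a network.

```python
class HeartbeatMonitor:
    """Tracks when the peer node was last heard from (times in seconds)."""

    def __init__(self, started_at, timeout_seconds):
        self.last_seen = started_at
        self.timeout_seconds = timeout_seconds

    def beat(self, now):
        """Record a heartbeat received from the peer."""
        self.last_seen = now

    def peer_is_alive(self, now):
        """The peer counts as alive until it has been silent longer than the timeout."""
        return (now - self.last_seen) <= self.timeout_seconds


def standby_step(monitor, now, start_services):
    """One pass of the standby node's loop: take over if the peer has gone silent."""
    if monitor.peer_is_alive(now):
        return "standby"
    # Hypothetical takeover hook: claim the virtual IP, mount storage, start httpd.
    start_services()
    return "active"
```

    A sketch like this says nothing about split-brain, where the link between the nodes fails and each one believes the other is dead. That case is where most of the real configuration effort goes.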

    But 1000 visitors a day is not much. It's small enough that you could consider virtualizing the nodes and just using virtualization failover.

  • by Anonymous Coward

    There are way too many questions that need to be answered before a competent technical architect can help design the "just right" solution for you.

    Most of the people here are experts on some small part of the solution and will spout "all you need is X" - and that's fine for free. I've worked on telecom - can never go down - systems for over 10 years as a technical architect leading project teams from 1 to over 300 software developers and 20 others on the hardware side.
    On the surface, FTP and web pages don't sou

  • With 1000 users if you want SQL Server you need to purchase a processor license: 5k$/CPU for Standard Edition, 25k$/CPU for Enterprise. (You only license physical CPU, not cores or hyperthreading). Add the Windows license (6k$). And you have no hardware yet.

    The "good news" is that with failover clustering (which is all you need cause 1000 users does not require load-balancing), Microsoft requires licenses only for the active node. And the failover node can be cheaper hardware, as it will run only under abno

  • Linux over complicated...ha ha

    I will sell him a system fully capable of handling ten times that traffic with hot standby failover for 50 bucks a month with ds3 bandwidth available to it.

  • Use CARP (Score:3, Informative)

    by chrysalis ( 50680 ) on Monday March 02, 2009 @08:17AM (#27039051) Homepage

    CARP is a protocol that does automatic load balancing and IP failover.

    Install your application on 2 (or more) servers, give them the same virtual IP address using CARP, et voilà. Nothing more to buy, and no need to install any load balancer.

    CARP's reference implementation is on OpenBSD, and it's shipped by default. DragonflyBSD, NetBSD and FreeBSD ship with an older version.

  • Google (Score:4, Insightful)

    by Danathar ( 267989 ) on Monday March 02, 2009 @08:39AM (#27039211) Journal

    Use Google. Why spend all that money buying up equipment for a non-profit that could be spent on your REAL mission.

    Do it in Google Sites and dump the data center. I even think Google offers Google Apps for free to non-profits.

  • by Zero__Kelvin ( 151819 ) on Monday March 02, 2009 @08:53AM (#27039339) Homepage
    Everybody keeps saying that 1000 unique visitors is peanuts and starts talking about Apache, etc. The OP mentions FTP as well, and didn't say if those 1000 users will all be regularly FTP'ing megabyte files or if they will be almost exclusively using HTTP with the occasional FTP download. If the former is the case, without analyzing it too much, it seems like this would be too much traffic for a single server to handle, no?
    • Re: (Score:3, Insightful)

      by TheSunborn ( 68004 )

      No, not really. Any new server should be able to handle at least 300Mbit/s.
      (And most likely also handle a full 1Gbit/s, but that might require a dual-CPU system with a fast disk subsystem.)

      The only way that 1000 users/day can require more than one server to handle the load is if each user request requires multiple complicated database queries to reply to.

      (Or if the design/implementation should be featured on "The Daily WTF").

    • by socsoc ( 1116769 )
      Yeah, I'm a little confused as to why a self-help archive needs to run an ftp.
    • For a small site, your FTP is going to be limited by your bandwidth LONG before it's going to be limited by your hardware, so as your concurrent downloads increase, the load on your system will decrease as the available bandwidth gets eaten up.

      I've seen FTP sites that ran a thousand concurrent connections on repurposed desktops. FTP is very lightweight in terms of processing. Your limitation is always bandwidth.
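      The bandwidth argument in this sub-thread can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the function names are invented, units are decimal (1 GB = 1000 MB), and protocol overhead is ignored.

```python
def transfer_seconds(file_megabytes, link_megabits_per_second):
    """Time to send one file if the whole link is available to it."""
    return file_megabytes * 8 / link_megabits_per_second


def daily_capacity_gigabytes(link_megabits_per_second):
    """Upper bound on data a fully saturated link can move in 24 hours."""
    megabytes_per_second = link_megabits_per_second / 8
    return megabytes_per_second * 86_400 / 1000


if __name__ == "__main__":
    for link in (10, 300, 1000):  # Mbit/s; 10 is a hypothetical small-site uplink
        print(f"{link:>5} Mbit/s -> {daily_capacity_gigabytes(link):,.0f} GB/day, "
              f"1 GB file in {transfer_seconds(1000, link):,.0f} s")
```

      On these assumptions a 300Mbit/s server could move about 3,240 GB a day, while a 10 Mbit/s uplink tops out near 108 GB a day, which is why the link rather than the hardware tends to be the limit for a small FTP site.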

  • It's a good thing you didn't link to your existing site if you're worried about 1000 visitors a day...

  • I've set up Apache and mod_proxy_balancer [apache.org] for just this purpose. The sites don't have enough traffic for me to justify buying an F5 or Cisco CSS load balancer, so I use proxy balancer with a bunch of vhosts, it works great.

    Add Keepalived [keepalived.org] and you can have redundant (though not stateful failover) load balancers on the super cheap.

    For SSL it still works well. Give it a look, took all of an afternoon to set up a failover pair of servers. I don't know yet how much traffic it will take, but a single CPU
  • you may have heard "good fast cheap... pick 2" this is similar.

    if your content is dynamic, you have more to worry about. DB servers, storage other application specific issues...

    if your content is static or close to it, round robin DNS is plenty. rsync between 2 boxes, and set up the round robin. How far away the boxes are determines how long they take to sync and how much of a safety net it really is. next to each other in the same rack protects from HW fault. different datacenters protects from powe

  • Heartbeat [freshmeat.net] + HAProxy [freshmeat.net] + nginx [freshmeat.net]. We're using a combination of these to replace our aging BIG-IP setup. HAProxy does the actual HTTP load-balancing, whereas nginx is serving up all the static media (pics, etc).
  • F5 is the choice (Score:2, Informative)

    by russg ( 64596 )

    If you haven't looked at the F5 product line you should. The ability to use the TCL language to write "iRules" and the sheer performance of even the smallest device are amazing. The devcentral.f5.com site is also great and allows you to gain from others' experience. With an F5 in front the rest of the systems behind can be simple and cookie cutter with no complex setup. The F5 will handle persistence, load-balancing, and once you have your setup you can forget them for the most part.

    For the FTP server part, you j

  • We use OpenBSD [openbsd.org] with CARP and pfsync [openbsd.org] and relayd(8) [openbsd.org]. It works a treat load balancing our web and jabber servers. I highly recommend it and the documentation that comes with OpenBSD is second to none. It's also an extremely secure OS for firewalls and routers.

  • Grab a crappy old athlon tbird box with a gig of ram and set it up as a router/firewall running *LVS (Linux Virtual Server) to forward web requests to your back end web server. You can start out with one web server and gauge the load. If you want to scale the system, add more backend web servers and configure LVS with the new backend ip addresses.

    For redundancy on the athlon router, trunk a couple nics for network, and boot from cdrom (knoppix) if you are worried about system disk failure. You could also

  • as a noncomputer specialist /. reader, this whole conversation sounds really weird.
    Why can't i just call up a bunch of guys in the yellow pages, or whatever passes for yellow pages, and say, I got a 1000 users a day, yadayada, gimme a quote.

    all this arcane stuff - you have to know this program, that program, why should some small nonprofit even have to think about it

    to put this in perspective, it is as if the original poster was the maintenance guy, and he was asking for what type of capacitor to install in th

  • Buy an appliance (Score:3, Informative)

    by hitchhikerjim ( 152744 ) on Monday March 02, 2009 @10:32AM (#27040387)

    Your needs for 1000+ uniques are minimal. If I were to do it, I'd get a shared hosting account someplace and move on. Shared hosting can handle *way* more than that.

    But if high availability (limited downtime) is part of the requirements, I'd say go out and buy an F5 BigIP. You plug your internet in the front, your machines in the built-in switch, configure your domain names in it using the web interface, and you're done. Set it to do service-checks, and it'll automatically pull out of the pool any machine that fails or that you take down for maintenance. So you get full up-time so long as your power and network don't fail.
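    The service-check behaviour described above, where a failed machine is pulled out of the pool automatically, can be sketched in Python. This is not BIG-IP or HAProxy code; `BackendPool` and the `is_healthy` callback are hypothetical. Real load balancers run their checks on a timer rather than on every request, which this sketch skips for brevity.

```python
class BackendPool:
    """Round-robin over backends, skipping any that fail the health check."""

    def __init__(self, backends, is_healthy):
        self.backends = list(backends)
        self.is_healthy = is_healthy  # callable: backend name -> bool
        self._next = 0

    def pick(self):
        """Return the next healthy backend, trying each backend at most once."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next % len(self.backends)]
            self._next += 1
            if self.is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy backends in the pool")


if __name__ == "__main__":
    down = {"web2"}  # pretend web2 failed its service check
    pool = BackendPool(["web1", "web2", "web3"], lambda name: name not in down)
    print([pool.pick() for _ in range(4)])
```

    Taking a machine down for maintenance is then just a matter of making its check fail; requests keep flowing to the remaining machines.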

    Yes, you can get the same functionality using Linux HAProxy. But you sort of need to understand what you're doing. Reading the way your question is asked, I suspect you're learning this, and do you really want to make the mistakes on a real live project? Just go with the appliance until you have a solid understanding of what you're doing. Shoot -- I have a good solid understanding from years of experience, and I still use the BigIP when I have a budget (and HAProxy when I don't). It's just easier, and I can move on to more interesting problems with my time.

    Once you've got this setup, set up a cron job to rdist the site to all the machines so that all your data is always on each machine. If you've got a database, you have some choices. For completely static data, I like to have it replicated to each machine, and have each web server just query localhost. If it's dynamic, have a replicated pair. At your levels, that can exist on the web servers.

    I really dislike the cross-mounted disk architecture of traditional cluster solutions, because there are too many shared components. Each of those multiplies your possible points of failure for your whole setup. Better to keep everything completely separated, so if one component fails, that whole machine just drops out and the site keeps working because of the load balancer and because each machine can operate by itself.

  • by yamla ( 136560 ) <chris@@@hypocrite...org> on Monday March 02, 2009 @11:03AM (#27040807)

    The requirements to handle 1000 unique visitors/day will depend on what exactly you are serving. I ran a website that got well over 1000 uniques per day on a Pentium MMX 200 MHz with 64 megs of RAM and a 1.2 gigabyte hard drive. This was significantly overkill for the site. However, that was entirely static content. Oh, except it handled email, spam filtering, and a database for a POS system for a retail establishment with two stores.

    If you are serving mostly dynamic content, you'll want more processing power and more RAM. Almost certainly, you'll be fine with a bottom end computer, but you probably want something manufactured in the last five years or so. This will obviously depend on what your dynamic content actually is, though; more complexity will require more processing power.

    If you cannot afford any outages, you may want to look at redundant hardware, failover systems, etc. etc., but you first need to determine how much an outage will cost you. What if you have a 5 minute outage? An outage lasting an hour? Eight hours? A day? In any case, before you look at redundant hardware, you'll need a service level agreement from your ISP.
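    One way to frame the "how long an outage can you afford" questions above is as a yearly downtime budget. The Python sketch below converts between an availability percentage and hours of downtime per year. The function names are illustrative, and it assumes a 365-day year with no distinction between planned and unplanned downtime.

```python
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_percent):
    """Hours per year a site may be down and still meet the given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def achieved_availability_percent(outage_hours_per_year):
    """Availability actually delivered, given total outage hours in a year."""
    return 100 * (1 - outage_hours_per_year / HOURS_PER_YEAR)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {allowed_downtime_hours(target):.2f} h/year")
    print(f"one 8-hour outage -> {achieved_availability_percent(8):.2f}%")
```

    A single eight-hour outage in a year still works out to roughly 99.9% availability, which is worth keeping in mind before paying for redundant hardware.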

    And of course, if you are looking at something to stream 1 gigabyte of traffic to each of these thousand uniques, that's a whole different matter. Now you may want to look at content delivery networks, and possibly multiple servers just to handle the outbound network traffic.

    No matter what your requirements, though, you need to look at a good backup solution.

  • Keep it simple (Score:3, Informative)

    by Jim Hall ( 2985 ) on Monday March 02, 2009 @11:57AM (#27041433) Homepage

    "I am working with a non-profit that will eventually host a massive online self-help archive and community (using FTP and HTTP services). We are expecting 1,000+ unique visitors / day. [...]"

    Others have pointed this out to you, but 1,000 visitors is not much load at all. I work at a large university, and during registration we have 500 unique users (what you call "visitors") in each hour. On the first day of classes, we may get 1,000 unique users per hour as students look up their class schedules, and sign in to the registration system to drop that stupid class they were just in. We run a load balancer at the network level, so that traffic is balanced immediately at the switch, rather than at a host level before being sent to a back-end web host.

    But doing the same in your case will be very expensive. If you work at a non-profit, you probably don't have this in your budget.

    If you're just doing simple http and ftp (that is, not running a web application with a database back-end .. or an application that keeps "state" on the server, requiring users to always go back to the same server they first visited) then you might consider the simplest solution of all: DNS round-robin. Simply put, you enter the IP addresses for two web servers (or ftp servers) for a single www entry in DNS. At the expense of hitting your DNS more frequently, you could set the TTL to 1 hour for the round-robin so that if server #1 went down, you could push an update to DNS so "www" just points to server #2, and users are only inconvenienced for about an hour.
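    The round-robin behaviour described above can be modelled with a toy Python class. This is not a DNS server and uses no DNS library; `RoundRobinZone` is a hypothetical name, and the addresses come from the 192.0.2.0/24 documentation range.

```python
from collections import deque


class RoundRobinZone:
    """Toy model of one DNS name ("www") with several A records."""

    def __init__(self, addresses, ttl_seconds):
        self._records = deque(addresses)
        self.ttl_seconds = ttl_seconds  # how long clients may cache an answer

    def resolve(self):
        """Return the record list, then rotate so the next query starts elsewhere."""
        answer = list(self._records)
        self._records.rotate(-1)
        return answer

    def remove(self, address):
        """Operator action: drop a dead server from the record set."""
        self._records.remove(address)


if __name__ == "__main__":
    zone = RoundRobinZone(["192.0.2.10", "192.0.2.11"], ttl_seconds=3600)
    print(zone.resolve()[0], zone.resolve()[0])  # successive queries alternate
    zone.remove("192.0.2.10")                    # server #1 went down
    print(zone.resolve())
```

    Removing the dead address only helps clients that ask again. Anyone who cached the old answer may keep trying the failed server for up to `ttl_seconds`, which is the hour of inconvenience mentioned above.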

    But your best solution is probably just to outsource this, especially if you're only doing simple http and ftp. A good web hosting company already has this infrastructure available to you. No need to re-invent the wheel for just 1,000 users.
