Why Is Less Than 99.9% Uptime Acceptable?
Ian Lamont writes "Telcos, ISPs, mobile phone companies and other communication service providers are known for their complex pricing plans and creative attempts to give less for more. But Larry Borsato asks why we as customers are willing to put up with anything less than 99.999% uptime. That's the gold standard, and one that we are used to thanks to regulated telephone service. When it comes to mobile phone service, cable TV, and Internet access, service interruptions are the norm, and everyone seems willing to grin and bear it: 'We're so used to cable and satellite television reception problems that we don't even notice them anymore. We know that many of our emails never reach their destination. Mobile phone companies compare who has the fewest dropped calls (after decades of mobile phones, why do we even still have dropped calls?). And the ubiquitous BlackBerry, which is a mission-critical device for millions, has experienced mass outages several times this month. All of these services are unregulated, which means there are no demands on reliability, other than what the marketplace demands.' So here's the question for you: Why does the marketplace demand so little when it comes to these services?"
The cost (Score:5, Interesting)
Really so common? (Score:5, Interesting)
Re:The way it has always been (Score:5, Interesting)
In truth, most consumers won't complain when they should, so there is no marketplace pressure on those businesses to aim for five nines uptime.
Re:because they've been conditioned (Score:5, Interesting)
Regulation has different economics (Score:-1, Interesting)
In the telecom industry, the result of deregulation is customer annoyance. In the aviation industry, on the other hand, deregulation produces greater danger to the flying public. In fact, air travelers are indicating that they have had more than they can stomach. http://strandedpassengers.blogspot.com/ [blogspot.com] We are seeing a grassroots movement that is forcing legislatures to enact legislation. As it stands, an airline can confine passengers forever in a plane on the tarmac. Even the cops can't do that legally (OK, I realize that they do it anyway). Airlines can also make a healthy profit selling your lost luggage for more than they paid you for losing it.
The crap we have to put up with from our airlines, telcos and ISPs will only stop when the customers rise up and force their congress critters to act. (This turned out to be a rant, didn't it?)
Re:Here's an easy one. (Score:5, Interesting)
Gas Prices? (Score:3, Interesting)
Re:Really so common? (Score:3, Interesting)
Re:because they've been conditioned (Score:3, Interesting)
Reality Check (Score:5, Interesting)
I was born in 1964. I have no recollection of POTS telephone service ever being unavailable.
Electricity was expected to drop out a few times every summer, and until someone figures out how to tell lightning where to go, I expect it will continue to happen. In my part of Canada, however, power is continuously available from October to April no matter what. Even if you don't pay your bill. The only winter power outage of note I can think of offhand was the great Ice Storm of 1998 [wikipedia.org], one of the most spectacular cases of force majeure I've witnessed in my life.
In my part of the world, at least, power and telephone were life-and-death services and legislation mandated their reliability.
Re:The cost (Score:5, Interesting)
We always want to compare service levels for newer tech with POTS and complain when they don't approach the same levels, but I'd expect that if we were to be still using the same equipment for ISP/Cellular service in a hundred years, it would be as stable and robust as the current (ok, previous generation) iteration of POTS. Problem is, we are constantly demanding better, faster, and cheaper: this has to be traded off for reliability, and for the most part people are happy with that tradeoff. Just like we're happy to buy crappy consumer goods from China at Wal*Mart because they're cheaper than domestic products.
WTF would I do with 99.999% uptime? (Score:5, Interesting)
My consumer grade equipment isn't 99.999% uptime (with luck, maybe I guess but there's no ECC, redundant power etc).
My software isn't 99.999% uptime (ok, so the kernel is stable. When X crashes, so does everything of importance on a desktop)
If there's something urgent, you CALL me anyway.
I'd rather take a line with 99.5% uptime (that's two days without internet per year) that's 10x faster and costs 10x less. Which doesn't include that I have Internet at work, or via my cellphone, or via a webcafe or any number of other easily available sources. The only real killer I can think of is if you only telecommute and can't go to work, but even then I figure the nearest Starbucks will let you occupy a corner with some purchases.
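To put numbers on that tradeoff, here is a minimal sketch that converts an uptime percentage into the downtime it allows per year. The function name and the 365-day year are my own assumptions for illustration; nothing here comes from an actual SLA.

```python
# Convert an uptime percentage into the downtime it permits per year.
# Assumes a 365-day year (leap years ignored) for simplicity.
HOURS_PER_YEAR = 365 * 24  # 8760


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Return the hours of downtime per year allowed at the given uptime."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


for pct in (99.5, 99.9, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime allows {hours:.2f} hours/year "
          f"({hours / 24:.2f} days, {hours * 60:.1f} minutes)")
```

At 99.5% that works out to a little under two days a year, which is where the "two days without internet" figure comes from; five nines leaves only a few minutes.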
Partly correct (Score:3, Interesting)
I don't think they measured squat. Just did their best. Only thing was that there was nobody who could properly design an O/S, and complexity, instead of simplicity, ruled the day.
What we are seeing is the very best they as group are able to produce.
They have never been great at marketing either. But they were really the first to push the GUI with success. Don't forget Apple became a very closed platform. They did not attract masses the way the open IBM PC did.
Right there history shows how important open standards are to success. Apple was considered this fantastic success story but in reality they cut it short and did not buy the masses the way the Johnny-come-lately IBM PC did. But we are slow when it comes to learning from history.
What they have been good at is market lock-in, vendor lock-in and many other types of lock-in. (The problem really is that they had never heard about duty and were only interested in money.) We all thought they would get it right sooner or later and deliver a good platform that would allow happy computing. The fact that they specialized in adopting good standards and then corrupting them so that you got locked in was a very calculated development.
At one point Gates himself said that Unix was the way to go. Then he decided to do it better but clearly never understood what made Unix so good (simplicity). Torvalds on the other hand was ONLY looking for simplicity. Which is why it fit so well into the general Unix design.
Look at Windows, it is filled with arbitrary complexities and is horribly inefficient. Never mind when upper management throws fits and yells at staff; I've never found that conducive to good programming, or business.
Gates cheated his way into O/S design, used people from VAX whose memory management problems were dragged over to Windows. Built a kernel in BASIC! Haha! And got away with it for years!
Someone who knew more about systems picked the Unix design and rewrote history based on technology, and was not motivated by money. Interesting to see how much we like to be able to just do what we need. Imagine if IBM had released Linux. With all the corporate support for let's say $100. Then opened it up with a GPL license.
Microsoft would not be sitting pretty at all. The OS/2 collaboration would not have happened and Gates would not have learned his lessons from that. For all their success I've never considered them much of a success where it really matters: integrity in product and care for customers. I have people send me brandy, fine wines and other tokens of their appreciation after sales. Because I believe in treating other people the way I like to be treated, and I really care about my clients.
Re:because they've been conditioned (Score:3, Interesting)
Frequent reboots haven't been required since win2k.
(snicker)
I've been running Windows for years, and this statement is just very funny to me. You must be running some entirely different magical version of Windows than any I've ever seen, because reboots are EXTREMELY common on 2000, XP, and Vista. The "just reboot" instinct I've seen from multiple different Windows guys is common, and DOES work. I was looking forward to Vista, which claimed it didn't require rebooting as often. That didn't really turn out to be the case. If you really think win2k and beyond doesn't require reboots, I think you either don't run it, or just have a very poor memory.
Re:What is good enough? (Score:2, Interesting)
Maybe that's the question the cable company would like to ask, but the one concerned consumers should be asking is, "how do you get someone to expect _more_ for the same price (or less) when they think that what they currently get is good enough?" Reading your piece of the discussion, I think this question could also follow, and it happens to be the original question...
Would I be willing to pay more for cell service that had fewer dead zones, dropped calls and "busy networks" than my current one has? No way. It's not as good as landline, but it's good enough for me. If, ten years from now, it worked the same as it does now, I would expect their competition to have passed them by and I'd switch. In the US we're in a free market system.
If I was tired of my cable internet dying on me occasionally, which competitor would I turn to? DSL, satellite and local wireless all have problems too. I settle for less than 5 nines because I have no choice, if I want service that is anywhere near the cost it is right now.
Re:Reality Check (Score:2, Interesting)
Or, what about the nifty Verizon cell outage that affected most of the south of the US for 8 hours 6 months or so ago? Or the network issues in the Middle East? Rolling brown/blackouts in CA and the NE of the US?
There's not a lot you can do when the entire area is covered in 6 or more inches of ice with heavy winds, or if everything goes under a few feet of water.
Re:because they've been conditioned (Score:2, Interesting)
Re:because they've been conditioned (Score:5, Interesting)
The origins of an OS really show through a lot of the time. Windows started out as a single user OS, so rebooting was OK because the only person you messed up was the guy sitting in front of the screen. It eventually evolved into a multi-user OS, but the "just reboot!" mentality persists to this day.
Windows NT (ie: contemporary Windows) has been a multiuser OS since its first release.
The reason the "just reboot" mentality persists is simply because 99% of the time it *is* used as a single-user OS, and no-one else is impacted. This has _zero_ to do with the architecture and everything to do with the user. Linux would be (and is) treated in the same way in similar situations.
Linux/Unix on the other hand started out life as a multi-user OS. Rebooting was a big no-no, because you'd affect countless people logged in, and you'd get yelled at for ruining someone's work.
UNIX actually started out as a single-user OS and the multiuser aspect was bolted on later. Linux didn't, of course, because by the time Linus banged together his UNIX rip-off, UNIX had been multiuser for quite a while.
However, again, the attitudes towards how their relevant users treat servers and workstations have about 10% to do with their architectures and 90% to do with their knowledge. DOS and OS/2 were single user, yet frequently had BBSes and similar running off them. You can be assured the people running those BBSes were far less likely to have the "just reboot" mentality.
Further, the other reason most people have that attitude is because to them a computer is just another appliance. When other appliances act up, pretty much the first thing _everybody_ does is turn it off and back on again. Why on Earth would you expect them to treat a computer any differently ?
Windows administrators categorically will try rebooting the damn thing first to fix any problem (and it usually works). Linux administrators will only try this as a last resort (and it almost never works).
No. Inexperienced admins will try rebooting first, regardless of platform. Experienced admins will not. Incidentally, there are numerous classes of problems on Linux (and UNIX in general) which are more quickly and easily "fixed" with a reboot.
Anyway, at Microsoft the idea that you can somehow tweak windows just right so rebooting isn't necessary is crazy.
I can't even remember the last time I had to reboot any of my Windows machines without a good reason (eg: patching).
Finally, there's nothing wrong with rebooting _anyway_. If your service uptime requirements are affected by a single machine rebooting, your architecture is broken. All the reboot does is demonstrate that it's broken without a real problem actually occurring.
Sysadmins comparing machine uptimes is like ricers comparing spoilers.
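As a rough illustration of why a single reboot shouldn't matter in a sane architecture, here is a back-of-the-envelope sketch of the standard parallel-availability formula. It assumes machine failures are independent and that any one surviving machine can carry the whole load, which is optimistic in practice; the function name and the 99% figure are made up for the example.

```python
# Availability of a service backed by several interchangeable machines.
# Assumptions (hypothetical, for illustration only):
#   - machines fail or reboot independently of each other
#   - the service stays up as long as at least one machine is up
def service_availability(machine_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` machines is up."""
    if replicas < 1:
        raise ValueError("need at least one machine")
    return 1 - (1 - machine_availability) ** replicas


for n in (1, 2, 3):
    print(f"{n} machine(s) at 99% each: {service_availability(0.99, n):.6f}")
```

Under those assumptions, each extra machine adds roughly two nines to the service, so the uptime of any individual box stops being the interesting number.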
Here's your citation about email (Score:3, Interesting)
[citation needed] I call bullshit on that one.
And I call BS on your BS. Clearly you're not familiar with the state-of-the-art as far as email goes. You've certainly not had to set up and run a private email server.
Here's one good reference [realfreewebsites.com]. It mostly mirrors my experience, except that it's been going on longer than the writer has observed.
The basic problem is that Yahoo, Hotmail, ATT and other large email providers, or ISPs, simply refuse to honor the standards which have been published (DKIM, et al.). Google is great. But it's gotten so bad with the others that I simply don't bother communicating with anyone who has a Hotmail, Yahoo, or ATT account. If they are someone important, I'll tell them once (via a different channel) about the situation, and let them know that unless they change their email provider, I won't be responding to any future email from them.
Usually I just refer them to gmail, because google seems to be the only large email provider with a technical clue.
The other interesting thing is that all of these large companies will treat unsigned email from an Exchange server as more verified than a DKIM email, but I digress.
Supposedly the excuse is that it's due to spam. I'm certain that is part of the problem. But the other part is that there's definite incentive for the big boys to eliminate the small independent websites and drive all of the business into their arms.
So, yes, the OP's statement about many email messages not reaching their destination is quite true. Most? No. But anything that doesn't use the technology offered by the big commercial joints (including Microsoft server technology) is shut off from communicating with a large part of the internet.
Blackberry is not a mission critical service. The people who use it as such are naive.
Heh. Well, many PHBs would disagree, but your point is valid.
For your amusement, the Blackberry email servers are provided by a company called Mirapoint (mirapoint.com), and they are Linux based. From what I've heard, they cut over about 2 years ago from BSD to Linux, for various reasons. I'm also told that the CEO is a complete airhead who has difficulty managing a secretary, let alone a company. But that the mid-level managers and engineers in the U.S. are first rate. I imagine that they could indeed improve the uptime of the email servers, but those servers are quite good already.
Multiple options and information overload. (Score:-1, Interesting)
The first is that people, like the Internet, have multiple options and can route around obstacles. Blackberry's out? use email or IM. Internet is down? Use the phone, etc.
And the second reason pertains to what happens when all of the options are out -- it's an excuse to take a break! Sufficient unreliability provides a built-in respite source; the network is down, take a moment to meet some co-workers, or go outside and bask in the radiation of that ongoing stable thermonuclear explosion that so many of us forget is there....
Re:Reality Check (Score:5, Interesting)
First responders are police, paramedics, firefighters, etc. There was an incident about a year ago where two cops were being assaulted (and losing the fight) in a basement. Their radios were not working, so they couldn't call for backup.
Luckily for them, a bystander called 911 on their cell phone.
Lucky for me, too, since I got called to the carpet for calling the reliability of the system into question. I probably would have been fired, but the above-mentioned incident was in the paper the morning of my "meeting".
The new radios are controlled by internet-connected computers. As the Farkism goes, "this should end well."
Re:Reality Check (Score:3, Interesting)
I have always had several telephone service failures per year, every year, for the last several decades, where I live here in Northern Arizona. First of all, when it rains, the telephone lines sometimes become wet and I lose my dial tone for a day or so. Then, when I call the telephone company, they usually say that if my telephone lines have not dried out and started working within 48 hours, they will send someone out then. Can't they figure out how to waterproof the phone lines and boxes and other stuff?
Nearby lightning strikes during thunderstorms also cause several brief power and telephone service failures every summer. The power and telephone service failures usually last anywhere from several minutes to an hour or so. In two instances, my telephone was destroyed and in one instance the twisted pair telephone line itself in the building was damaged. Fortunately, I had already unplugged my computer, in those instances.
Then of course, about once every other year or so, a backhoe causes a several hour loss of telephone service. Then about a year or two ago, several nearby telephone poles snapped during a wind storm. Then about once a year, telephone and/or power briefly fails for reasons that are not obvious.
I always keep several LED flashlights and a battery powered radio handy just in case, especially during the summer. My backup methods of communication are my cell phone and the 2-meter ham radio in my truck. By the way, we do not have tornadoes, hurricanes or ice storms here.
Re:because they've been conditioned (Score:3, Interesting)
[1] If so, let me forward you my resume for your consideration
Re:because they've been conditioned (Score:3, Interesting)
It's funny the attitude that comes from the users of each OS. Windows administrators categorically will try rebooting the damn thing first to fix any problem (and it usually works). Linux administrators will only try this as a last resort (and it almost never works).
It's even less than a last resort. I have, once or twice, had true problems that required a reboot of a Linux machine to fix. The one in most recent memory, it took three weeks before realizing that a reboot was (or at least, could be) the solution. That's three weeks of hard core debugging, tweaking, and hair pulling. The idea of a reboot to fix a user-level software issue is not something that even remotely crossed my mind, nor anyone else's. In fact, it was a Windows user from another location who ultimately made the suggestion "Have you tried rebooting it?"
Rebooting a computer to fix a problem should be viewed with the same suspicion as burning down your house to eradicate an infestation of insects.
Here's Why America Puts Up with It (Score:3, Interesting)
My landline is up 99.999% of the time, meaning my phone is available to me when I need it 19.9998% of the time.
I'm out and about (and coherent) 40% of the time.
My cell phone works 90% of the time meaning it is available to me when I need it 36% of the time.
Clear winner, cell phone.
Sometimes we lose sight of reality while studying statistics.
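The arithmetic above can be sketched as uptime multiplied by the fraction of time the phone is actually within reach. The 40% and 90% figures are from the post; the 20% at-home fraction is my assumption, inferred from the landline result, and the function name is just for illustration.

```python
# Effective availability = (fraction of time you are near the phone) * (its uptime).
def effective_availability(fraction_in_reach: float, uptime: float) -> float:
    """Fraction of all time the phone is both within reach and working."""
    return fraction_in_reach * uptime


# ASSUMPTION: at home roughly 20% of the time (not stated outright in the post).
landline = effective_availability(0.20, 0.99999)
# From the post: out and about 40% of the time, cell phone works 90% of the time.
cell = effective_availability(0.40, 0.90)

print(f"landline: {landline:.4%}")
print(f"cell:     {cell:.4%}")
print("winner:", "cell phone" if cell > landline else "landline")
```

The five nines barely register once they are multiplied by how rarely you are standing next to the landline.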
Re:The way it has always been (Score:4, Interesting)
Back in the day, I was an IRC Operator for a large Undernet server, and there came a time where the new thing for troublemakers was to use open proxies on cable connections to flood channels/servers. One cable provider had a particularly large number of clients whose setup was used to attack the network and generally cause trouble.
At first, being in the area of that provider, I called tech support and escalated the issue as much as I could. My point was that they were ultimately responsible for the abuse coming from their network. Long story short, for months I got nothing but "we'll look into it".
After a particularly nasty week, and after consulting with the server admins, we decided to ban the whole IP range of that provider from using our server (they could still use the rest of Undernet, but our server was popular for them). The ban kicked > 1000 clients from the server with a message like: Your provider does not respond to abuse complaints. Contact your provider's technical support to have this issue resolved.
10 minutes later, there was a 30 minute wait at the provider's tech line. On a Sunday afternoon. One hour later, I got an email saying they were blocking inbound port 1080 at their router to protect their clients' machines from being abused.
I guess the point is, when something generates enough backlash, preferably with a nice surprise effect, things can change. The hard thing is to organize people enough to harass the company about it.
Re:because they've been conditioned (Score:3, Interesting)
No, it should be viewed as fumigating your house. You all move out, wait a few days, then move back in. When you reboot you don't lose the computer, you don't lose the archived data, and all the users can return in a short amount of time.
Burning down your house loses all the contents and ensures you'll never return...
Re:The cost (Score:3, Interesting)
Re:Here's an easy one. (Score:2, Interesting)
Re:Reality Check (Score:3, Interesting)
Unfortunately nobody seems to realize just how much money goes into one radio tower.
In my county we had decent coverage even though we were the second largest county in the state, but had the fewest number of people.
We had three towers serving the entire area. Each one cost around $100,000 to put up. That covered the price of the equipment, the man-hours to install it, the equipment hours to fly it by chopper to the mountain top, and the price of refueling the propane tank so the sunlight-poor winter months wouldn't shut down the repeater.
Sure, it's easy to say we need 99.99999999999% reliability, but who wants to pay double or more for the redundancy?
Hell, just reprogramming all the radios in the county to support another repeater would cost $10,000. Not to mention if you wanted 99.99% uptime, you'd probably have to purchase a second set of radios for the responders because the place that handles reprogramming takes about a week.
The more reliability, the greater the cost.
At least until someone replaces the proprietary windows-only dispatch computers, applications, and processes with linux. Then you just pay for the hardware...
Re:because they've been conditioned (Score:3, Interesting)
I have to agree.
I've stated before 'Every 9 of reliability increases the cost 10 fold'. Now, this is only the vaguest estimate, with vast numbers of variables, unseen incidents, competency, etc...
Take a car that's 90% reliable. It'd be used, of course, and probably cost you only $100-500. You can get a car that's 99% reliable for $1-5k. 99.9% reliability would be getting into needing a new car(or newer used), costing $10-40k. This, of course, discounts getting a lemon.
Now, when it comes to phone service, its reliability comes from the fact that the stuff has been done for so long that the extra reliability doesn't actually cost 10X, plus the base '90%' is so cheap that upping it to 99.9% isn't very expensive.
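For what it's worth, the "every 9 costs 10 fold" rule of thumb can be sketched in a few lines. This only illustrates the vague estimate above, using the $100-500 price of the 90% car as the base; the function name is hypothetical and real costs will not follow a clean power of ten.

```python
# Rule-of-thumb cost model: each additional "nine" multiplies the cost by 10.
# base_cost is the price at one nine (90% reliability).
def estimated_cost(nines: int, base_cost: float) -> float:
    """Estimated cost at the given number of nines (1 = 90%, 2 = 99%, ...)."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return base_cost * 10 ** (nines - 1)


for nines, label in ((1, "90%"), (2, "99%"), (3, "99.9%")):
    low = estimated_cost(nines, 100)   # cheap end of the used-car range
    high = estimated_cost(nines, 500)  # expensive end of the used-car range
    print(f"{label:>6}: ${low:,.0f} to ${high:,.0f}")
```

The third row comes out a little above the $10-40k quoted for the car, which is about as close as a rule this rough can be expected to get.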
Re:Partly correct (Score:3, Interesting)
Re:because they've been conditioned (Score:3, Interesting)
and I can tell you that in the post-windows days... well, people have this concept of rebooting when things don't work. "it will auto-magically fix itself" (tm). cell-phones, managed switches, home routers... you name it, the first thing tech-support will do is ask you to "turn it off and on again". so much so that that is a standard gag in "the IT crowd".
i had this incident in our data center where this nincompoop kept futzing around with a managed switch. he hosed the config, caused some ripple effect on the servers and then panicked and wanted to reboot everything - including the servers. didn't know what the problem was but as he is indoctrinated to the ethos of rebooting to automagically fix problems, just wanted to reboot everything.
i had to step in, restart a few services and things were back to normal. no reboot required. a reboot would have taken us out for a good 15-20 minutes. restarting services, 10 seconds.
it's almost like people don't take pride in uptimes. who cares if it's down for 30 minutes... thanks largely to the microsoft OS culture. unix was bad enough - compared to mainframes and VMS - or so i'm told.
so, yeah, it's gone downhill. MS didn't help. telephones might have had outages but i sure don't recall having to reboot the big black rotary dial phones..
Re:Reality Check (Score:2, Interesting)
Cut the cable in 3 places; the Bell crews were camped in the trench with tents over it for 4 days, splicing the mess back together.
Strangely, the contracting company that did it was never seen again....