Google Blames Gmail Troubles On Maintenance Goof
Slatterz writes "Google has apologised for the two-and-a-half-hour Gmail outage on Tuesday morning, and admitted that the cause was down to data center maintenance. 'Lots of people around the world who rely on Gmail were disrupted during their waking and working hours, and we are very sorry. We did everything we could to restore access as soon as possible, and the issue is now resolved,' said Gmail site reliability manager Acacio Cruz in a blog post. Google had been testing new code designed to keep data geographically closer to its owner, which brought about disruption when maintenance in one data center caused another facility to be overloaded. This had a cascade effect, according to Google, and it took the company an hour to get it back under control."
Mail time happy time. (Score:2, Funny)
Re: (Score:2)
This was not a planned outage, so no one was told in advance. That sucks.
Re: (Score:2, Funny)
Problems with Jabber connections to GMail users (Score:4, Interesting)
Re:Problems with Jabber connections to GMail users (Score:5, Interesting)
Re: (Score:2)
Re: (Score:2)
Re:Problems with Jabber connections to GMail users (Score:5, Insightful)
Been this way since at least last Thursday (Feb 18) for me. I have several contacts ($grandboss, $director (who's out sick), and $wife among others) that insist on using GMail/GTalk, and all of them went "remote-server-not-found" last week, with no changes on my end. As a lark, I restarted my XMPP server, without it making a difference. If I had to guess, server federation was deactivated on the Google end, but that's just a WAG on my part.
Re:Problems with Jabber connections to GMail users (Score:5, Funny)
see what i did there? i mixed some code in appropriately (= instead of is saves me one character)
Which you promptly wasted by explaining what a jackass you are...
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
There have been some configuration issues in that area recently. If you still have that problem, take a look at the archive of the operators list at xmpp.org [jabber.org].
You might want to join that list if you're an XMPP server admin anyways.
So you're saying... (Score:4, Funny)
damn marketing bs...
Re: (Score:1, Offtopic)
Thanks a LOT Google... (Score:5, Funny)
Re: (Score:2, Funny)
it's ok, some poor lady got an iphone that she would never have been able to afford otherwise. cause she's poor.
"Maintenance goof?" (Score:5, Funny)
I mean, sure, if the janitor brought down the service, that's pretty bad, but it seems a bit harsh to start calling him a "maintenance goof" ...
(tip your bartenders and waiters)
Re: (Score:2)
It's almost like.. (Score:5, Funny)
.. Gmail is Beta or something.
This is why we're still beta. (Score:5, Insightful)
Re: (Score:1, Funny)
Re: (Score:2)
I pay for my (business) GMail account, which definitely does not have that label, but still I was down. Oh well, I guess they'll still make the promised uptime.
Fast-forward 100 years... (Score:5, Funny)
Re:Fast-forward 100 years... (Score:5, Insightful)
It's a FREE service. I don't have a problem with an outage when the service is free. It's when I pay for a premium service, they can't keep it stable, and finally raise my rate to cover their idiocy that p*sses me off.
Re:Fast-forward 100 years... (Score:5, Insightful)
Re:Fast-forward 100 years... (Score:4, Funny)
It was the eastern seaboard. Most east coasters think the Pacific ocean is just past Chicago, hence why they call it the "mid-west" instead of "Almost-Central"
Re: (Score:2)
Busy Cali
Re: (Score:1)
Re: (Score:3, Interesting)
How did you calculate 4 nine's for gMail? 4 9's is 52 minutes of downtime per year, while this outage was over 2 hours.
And this isn't their first outage. The last one I remember was April of 2008.
Is it even possible to measure 6 9's of uptime for an internet service? 6 9's is just 30 seconds of downtime per year -- less than 3 seconds per month -- 100 msec/day. Can you honestly say that you never experience 100 msec of additional latency once a day? Maybe once a month they have a hard disk timeout that m
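For anyone who wants to check the "nines" arithmetic in this comment, here is a minimal sketch. It assumes a 365-day year and treats availability as simple uptime over total time. The function name `downtime_budget_seconds` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: non-leap year


def downtime_budget_seconds(nines: int, period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Downtime allowed over a period at an availability of `nines` nines.

    Four nines means 99.99% availability, i.e. an unavailability of 10**-4.
    """
    unavailability = 10.0 ** (-nines)
    return period_seconds * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        per_year = downtime_budget_seconds(n)
        per_day = downtime_budget_seconds(n, SECONDS_PER_DAY)
        print(f"{n} nines: {per_year / 60:8.2f} min/year, {per_day * 1000:10.1f} ms/day")
```

Under these assumptions, four nines allows a bit under 53 minutes per year and six nines a bit under 32 seconds per year, which is in line with the rough figures quoted above.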
Re: (Score:2, Insightful)
And this isn't their first outage. The last one I remember was April of 2008.
Judging by forum threads around the internet discussing past gmail downtimes, the general trend is that only one or two people see an outage while everybody else can access gmail just fine. That makes me think the majority of their downtimes only affect a tiny fraction of their users. If you count all outages even though they affected maybe just 1% of users, then you are not giving correct availability figures. If 1% of the time there is an outage for 1% of the users, the availability isn't 0.99, it is 0.
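The user-weighted argument in this comment can be sketched as follows. This is only an illustration: it assumes outages never overlap in time and that every user counts equally, and `weighted_availability` is a hypothetical helper, not anything Google publishes.

```python
def weighted_availability(outages):
    """User-weighted availability.

    `outages` is an iterable of (fraction_of_time, fraction_of_users_affected)
    pairs. Assumption: the outages do not overlap in time, so the lost
    "user-time" can simply be summed.
    """
    lost = sum(time_fraction * user_fraction for time_fraction, user_fraction in outages)
    return 1.0 - lost


if __name__ == "__main__":
    # The comment's example: an outage 1% of the time, hitting 1% of users.
    partial = weighted_availability([(0.01, 0.01)])
    # The same outage counted as if it hit every user.
    total = weighted_availability([(0.01, 1.0)])
    print(f"user-weighted: {partial:.4f}, counted as total: {total:.4f}")
```

With the comment's numbers, the user-weighted figure is roughly 0.9999 rather than 0.99, which is the gap the commenter is pointing at.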
Re: (Score:2)
Re: (Score:2)
We're 5 9s. If you think GMail is 4 9s I have a bridge you might also be interested in.
Re: (Score:2)
I agree. I have another, pay, email account with netidentity, owned by tucows.com, which has had all kinds of reliability problems for the past several years. They've had several occasions where my email was inaccessible for several DAYS. Now I just forward that mail to my Gmail account, which I've never noticed a problem with.
Re: (Score:1)
That's all very well, EXCEPT people DO PAY for Gmail.
There are lots of corporate/paid accounts who WERE paying for it, and have an SLA (service level agreement) with Google, and they were just as affected as everyone else.
Re: (Score:1)
Beta = Test Environment (Score:4, Interesting)
Re:Beta = Test Environment (Score:5, Informative)
Re:Beta = Test Environment (Score:4, Insightful)
Re:Beta = Test Environment (Score:4, Insightful)
Re: (Score:2)
The fact that they have corporate accounts paying for access to the service should preclude the 'beta' label. I like a lot of what Google has done, but sometimes it seems like the whole beta thing is just a convenient excuse for failure, or as a free pass for iffy behavior like testing in production.
It's just a label. It doesn't mean anything other than "we're not finished with this yet". And when did they use it as an excuse?
Re: (Score:2)
Re: (Score:2, Insightful)
Be thankful that, at least, Google calls their testing versions "beta", not "Service Pack n" | n < 2.
Re:Beta = Test Environment (Score:4, Informative)
I don't think they are testing it on their corporate users. My domain is signed up for google apps which includes email, but not the pay for premium version. When I read on slashdot that gmail was finally adding an option for 'always use https connection', I looked in the options where people said it would be, and found nothing. Logging into the "official" gmail I was able to find it right away. It took some time before it showed up in my domain's gmail client.
My conclusion is they test all the code on the official gmail users to make sure it's stable enough before updating the corporate clients etc.
Re: (Score:2)
Re: (Score:3, Interesting)
It wasn't 2.5 hours for me - it was more like 14-15 hours.
It stopped working at night time, around ~9PM (this is when Gmail Notifier failed to login, and curious, I tried to login manually). It wasn't working yet at 2AM in the morning. I went to sleep, woke up, and it was still broken. It finally came back online some time after lunch.
This would be quite irritating if I were a business. As it was, I did have some important emails to send off, but waiting a day didn't kill me.
Re: (Score:2)
Re: (Score:2)
The test environment doesn't have to exactly mimic the production environment. It just has to serve as a model. Let's say that in their test environment they move mailboxes around, and they find that it takes X minutes and Y amount of bandwidth to move Z amount of data. They can then take those calculations and extrapolate what will happen when the numbers change. We obviously don't have the details, but the article mentions moving mailboxes and servers being overwhelmed by the amount of data moved.
A smaller parallel network (Score:2)
In the same datacentres as the production servers.
Or something like that.
Re: (Score:1)
My business depended on GMail, and yes it was down, but for a hell of a lot longer than 2.5 hours. It was more like from Monday night to Tuesday afternoon.
Yes, I do pay for their services. And no, we will not be depending upon it any longer. "Testing" code on a production environment is just bone-headed, and I am quite frankly getting tired of the constant "Some features have failed to load..." (... because we're testing new code that doesn't work) messages.
There are more reliable providers out there...
Re: (Score:3, Insightful)
Re: (Score:2)
You can test and test all you want outside of production, and any respectable shop will have every piece of code thoroughly unit tested and will test "significant" changes against simulated (for changes that load can affect) and limited users.
But, for an environment with huge infrastructure, it becomes literally impossible to test every scenario against real user loads with real user patterns ("random" requests is not real).
When your test scripts get timeouts, they gently retry after $TIMEOUT. People aren't
Re: (Score:1)
They should apologize (Score:1)
for their lame layout - should give people a way to avoid (or change) the styled buttons, not all of us can easily read them now.
Re: (Score:1)
What about basic HTML mode? Or does that constitute 'disabling Javascript'?
Re: (Score:1)
Doesn't it?
My bad. (Score:5, Funny)
Re: (Score:2, Funny)
Re: (Score:1)
You, Sir, have single-handedly made the phrase into a permanent meme on Slashdot!
And oh, you must be new here.
Stop complaining people, it wasn't that big a deal (Score:5, Insightful)
First off, it's free, it gives you 7 Gigs of mail storage, and it's accessible from anywhere, on any device with an Internet connection.
It searches through my 4 years of e-mail faster than Outlook (in cached Exchange mode) can search the last week.
They keep adding features, for free, and they have no annoying Flash ads; the ads they do have are off on the extremes of the page.
If you don't like it, stop using them - I promise you there won't be any pesky cancellation fees.
Hotmail and Yahoo await you and we'll miss you all - maybe.
SLA HA HA HA (Score:2)
It's a lot worse than you seem to know.
I've been having problems with gmail for 4 days now. My mail STILL isn't being delivered.
I have sent two emails a day (morning and evening) to my Yahoo account over the last 4 days.
None have been delivered. This still isn't fixed.
Re: (Score:2)
Sorry to hear that but it's not a universal problem - I've had no issues with Gmail and the only time that there was an outage that was long enough for me to notice was over 2 years ago.
All in all, their free service has been an order of magnitude better than the various Exchange environments I've been in (including BP (2 years ago), HP (currently) and several medium-sized ISPs) in terms of service speed, reliability, mailbox size, searching and spam filtering / virus scanning.
Of course, Google does
Re: (Score:1)
Except Gmail is NOT free if you are a PAID user and they were just as affected as everyone else.
Quityerbitchin! (Score:2)
I don't understand people who rely on Gmail, or any other free webmail, as their primary and business-critical point of contact. There is no SLA, no contractual obligation, no guarantee of anything. Anything can happen to your email and there's absolutely nothing you can do about it.
The logic is quite simple: if you can't live without something, then get a guarantee in writing, and pay the premium for that extra service. In Gmail's case, there is no premium service, so you'd better start looking elsewher
there's a reason they didn't say that (Score:3, Insightful)
They want people to use gmail, which is of course the reason they offer it in the first place. They make a significant profit off it, and would lose money if they drove away users.
Re: (Score:1)
Actually there is. Gmail is available as part of Google Apps for Your Domain [google.com]. Premier Edition costs $50 per user per year and offers a 3-nines uptime guarantee.
Re: (Score:2)
If you're paying for the premium version of Google Apps (which you should be if it's a business-critical domain), you get a 99.9% SLA, which by my calculations they are still well within.
Despite that, they are giving premium users 15 days of free service. There aren't many service providers I know of who would just go ahead and do that - most of the ones I've worked with would point blank refuse, and some of them will make you fight to get se
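Whether a single 2.5-hour outage is "well within" a 99.9% SLA depends on the window the SLA is measured over, which the thread does not state. A rough sketch, assuming the outage is the only downtime in the window and using two hypothetical windows (a 30-day month and a 365-day year):

```python
SLA_TARGET = 0.999   # the 99.9% figure quoted in the thread
OUTAGE_HOURS = 2.5   # outage length from the story summary


def achieved_availability(outage_hours: float, window_hours: float) -> float:
    """Availability over a window, assuming this outage is the only downtime."""
    return 1.0 - outage_hours / window_hours


# Assumption: the measurement window is not given in the thread, so both a
# monthly and a yearly window are shown for comparison.
WINDOWS_HOURS = {"30-day month": 30 * 24, "365-day year": 365 * 24}

if __name__ == "__main__":
    for name, hours in WINDOWS_HOURS.items():
        availability = achieved_availability(OUTAGE_HOURS, hours)
        verdict = "within" if availability >= SLA_TARGET else "outside"
        print(f"{name}: {availability:.5f} ({verdict} {SLA_TARGET:.1%})")
```

Measured over a year the outage fits inside the budget; measured over a single month it does not, so the conclusion hinges on the SLA's actual definition.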
Re: (Score:2)
I routinely give away full months of free service to my customers if they have a problem - even for tiny things like management interface issues that prevent them from doing something they should have. For major issues, the only real way to compensate a customer is to give them x amount of their money back. Google's "you may get 15 days" SLA is very, very weak. Purely my opinion; obviously many people think it's the most awesome generous thing a company could ever do. If you look around though, it's bottom
Re: (Score:2)
Maybe because it's more reliable than the non-free services?
I have a pay email account with netidentity, owned by tucows.com. They've had several outages in the past couple years that have gone on for several DAYS. What exactly can I do about that? Sue them? Good luck with that.
Your guarantee in writing is worthless when something actually does go wrong. Your only recourse is to sue, and if you've ever used the court system, you'd know that you'll never get any money back that way. Only the lawyers p
Actually another problem may exist... (Score:3, Informative)
Re: (Score:2)
Bastards! They labeled one of the biggest webmail providers around as a webmail client?
Next they'll be labelling myhotteenpussy.com as a porn site.
Re: (Score:2)
You know that you can whitelist domains with OpenDNS, right? Or just not block the "webmail providers" category?
Re: (Score:2)
Business continuity plan? (Score:2)
Yes it's damn annoying when email or some other part of your critical infrastructure goes out, but this really should have been planned for in advance. Not by google but by you.
Things happen. Things that are out of our control but we still have to deal with them. This outage was quite short for most people. A day at the most from what I hear, but what if the outage had been longer? A week? A month? How would you have dealt with it?
I always keep a few lists of things to do, people to call, things to write sh
Data redistribution? (Score:1)
Too bad they don't give more details. Their software can withstand lots of problems: network partitions, data center outages, failing routers, etc. This time, a new piece of the algorithm apparently did not do a very good job of redistributing data at the time of the data center failure. I'd like to know what it tried to do. Did it try to push too much data to one single location, causing that location to become unresponsive, in turn causing it to start redistributing data as well? I'm glad they didn't lose
Did you mean: apologized? (Score:1)
Re: (Score:2)
Depends which side of the pond you're on, guvnor.
+1 informative
Re: (Score:3, Funny)
It can pretty much kick your ass and fuck your mum.
Just like your mum, Gmail supports millions of users.