Massive Google Cloud Outage Takes Down YouTube, Gmail, and Snapchat In Parts of US (theverge.com)
An anonymous reader quotes the Verge:
YouTube, Snapchat, Gmail, Nest, Discord, and a number of other web services are suffering from outages in the U.S. today. The root cause appears to be problems with Google's Cloud service which powers apps other than just Google's own web services. Google has issued a status update on its Cloud dashboard, noting that issues began at around 3:25PM ET / 12:25PM PT.
The issues appear to be mostly affecting those on the East Coast of the US, but some YouTube and Gmail users across Europe are also reporting that they're unable to access the services. Discord and Snapchat users are experiencing issues logging into the apps, and these both use Google Cloud on the backend.
But is Pornhub still running? (Score:3, Funny)
Asking for friend.
Re: (Score:1)
And Canada. (Score:3, Informative)
And Canada apparently.
Good news is it isn't my isp dropping the ball for once.
Ooh the cloud goes down and everybody's fucked (Score:5, Insightful)
That's a surprise...
It's just stupid Youtube and Snapchat though. Nothing important. But just you wait till the data of the company you work for becomes inaccessible for any length of time, and then perhaps IT managers / bean counters / CEOs and other short-sighted "decision makers" will realize why the cloud is a Really Really Bad Idea [tm]
Re: (Score:3)
Re: (Score:3)
Re:Ooh the cloud goes down and everybody's fucked (Score:4, Informative)
It's all about uptime. Uptime of your local server room or your private rack in a datacenter is not necessarily higher.
High uptimes require redundancies. Redundancies which you can apply to the cloud much the same as you would to your own hardware.
Nothing stops snapchat from having a fail-over on a second cloud, like Azure or AWS.
One thing I do know: scaling that redundant cloud when it goes live is a lot easier and more cost effective than keeping redundant hardware up and running yourself.
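A minimal sketch of the fail-over idea in this comment, assuming a service that can run behind more than one provider. The endpoint URLs and the `fake_health_check` probe are hypothetical placeholders, not anything Snapchat or a cloud vendor actually exposes; a real setup would also have to keep data in sync between the two clouds, which this does not attempt.

```python
from typing import Callable, Optional, Sequence


def pick_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint whose health check passes, or None."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A crashing health check counts as "unhealthy", not as a fatal error.
            continue
    return None


# Hypothetical endpoints, listed in order of preference.
ENDPOINTS = [
    "https://api.primary-cloud.example",
    "https://api.secondary-cloud.example",
]


def fake_health_check(endpoint: str) -> bool:
    # Stand-in for a real probe (for example an HTTP request to a status URL).
    # Here the primary is simply pretended to be down.
    return "primary" not in endpoint


active = pick_endpoint(ENDPOINTS, fake_health_check)
print(active)
```

The health check is passed in as a function so the selection logic can be exercised without touching the network.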
Re: (Score:2)
Exactly. The amount of cloud haters on slashdot never ceases to amaze me. There are several workloads that aren't suitable for cloud, no doubt...but to make a blanket statement that cloud is a bad idea shows a complete lack of maturity in the field.
Re: Ooh the cloud goes down and everybody's fucked (Score:1)
The only data that is suitable for cloud:
1. The kind where it doesn't matter if someone else gets their hands on it.
2. The kind where it doesn't matter if you can't get your hands on it.
Re: (Score:2)
Or it shows someone who has actually looked at the value proposition and reliability of the cloud and found it wanting.
From my analysis, by the time you finish getting nickel-and-dimed to death it's a bit cheaper to maintain your own. Rental almost always costs more than owning unless the need is temporary.
That said, cloud as DR makes sense as long as you make sure you have your images tested and ready to go. DR is something you hope to never actually need, and when you do need it, it's a temporary situati
Re: Ooh the cloud goes down and everybody's fucked (Score:2)
So true. "The Cloud" is merely other people's servers. If your business relies on your site being available 24/7/365, build your own infrastructure that YOU own and control.
Also turn off the Fucking Automatic Updates. When you get yer shit dialed in, the last thing you need is someone fucking it all up by trying to "help".
Re: (Score:2, Insightful)
Everybody loves to manage their own email server.
Remember people chanting "lock her up!" over somebody having taken the important security step of running their own server?
Re: (Score:1)
If all they could scrape together was an incident when she was five years old and snitched a cookie from Grandma's cookie jar without asking, they would have still been yelling "lock her up!".
Re: Ooh the cloud goes down and everybody's fucked (Score:1)
It's just stupid Youtube and Snapchat though. Nothing important.
Nest is also down, considering people use it for security that's a bit more important.
Re: Ooh the cloud goes down and everybody's fucke (Score:1)
Not really. Teams of criminals are not waiting on standby to break into the houses of Nest subscribers if the cloud happens to evaporate for a while. Maybe it can serve as a wake-up call to Nest customers and they will move to real and robust security. It could end up being good for more people than are harmed.
Re: (Score:2)
Rosco P. Coltrane predicted:
But just you wait till the data of the company you work for becomes inaccessible for any length of time, and then perhaps IT managers / bean counters / CEOs and other short-sighted "decision makers" will realize why the cloud is a Really Really Bad Idea [tm]
No. No, they won't.
The kind of psychopathic personality that dominates the executive management ranks of the corporate world is incapable of admitting mistakes. That's why stacked ranking is still a thing. And the ever-increasing disparity between executive compensation and that of line-level IT employees simply reinforces that "smartest guys in the room" mindset ...
Re: Ooh the cloud goes down and everybody's fucked (Score:1)
So trivial shared spreadsheets compiled by min-max gamers are also in jeopardy. Tell us it isn't so!
Looks like /. doesn't use google cloud (Score:2)
since it seems to still be up...
Re:Looks like /. doesn't use google cloud (Score:5, Funny)
Well, Slashdot runs on one Andover.net-owned Pentium2 server with a time-travel ISDN internet link to today's world. It's very robust.
Re: (Score:2)
Re: (Score:2)
Of course it supports unicode U+1F600
Have a back up for communications guys! (Score:3)
When Google goes down, you want to jump into company chat to see if anything is giving us issues, but then Google Chat is also down. (We use G Suite for docs/drive/chat/email etc.) Good thing we're on AWS though. It's mostly not affecting us.
Re: (Score:3)
Until AWS goes down...
Infrastructure for rent is always cheapest possible in its class. It typically ends up costing a lot more.
Re: (Score:2)
Oh trust me, I always bring this up, but my boss always says "If AWS is down our customers have bigger problems". I mean, we use AWS DNS (Route53), and the HTTPS cert is generated by AWS and paired to the Load Balancer and can't even be downloaded by us for what AWS says is "security". Even if we had an offsite backup of the DB, I'd need at least a few hours to spin up a box on another host and wait for the nameservers on the domain to propagate. It's actually not a ton of work. About the same amount as on-boarding
Re:Have a back up for communications guys! (Score:5, Interesting)
Infrastructure for rent is always cheapest possible in its class. It typically ends up costing a lot more.
I worked at Amazon AWS and I can say that AWS takes resilience VERY seriously. A small company won't be able to match AWS's reliability, simply because a fully staffed 24/7 on-call rotation of sysadmins would cost you around a million USD per year.
AWS is basically becoming something akin to a utility. Sure, you can generate your own power for small-scale operations that don't care about reliability (a generator behind a food truck) or you can do a highly-reliable redundant power supply (emergency batteries and generators in a hospital). But that won't be cheaper than utility power.
Re:Have a back up for communications guys! (Score:5, Interesting)
On the one hand it's awesome.... I can stand up an RDS instance, get my domain on Route 53, use Elasticache for memcached, and use codedeploy to deploy my code. Paired with EC2 load balancers, I could quite literally set up an auto scaling solution in like a few hours.... and initially be on the free tier so my first bill might end up being $0.
I'm just paranoid still. I don't like how the internet is becoming so centralized.
Re:Have a back up for communications guys! (Score:5, Funny)
But can you combine best of breed agile practices into a synergistic mashup to leverage maximum the ROI on your data assets?
Re:Have a back up for communications guys! (Score:5, Informative)
I get you're trying to be funny but none of what I said are "buzzwords". Just names of products. So here's a translation
RDS = Database hosting (with managed backups, cloning, and clusters)
Route53 = DNS
Elasticache = Redis / Memcached hosting
Codedeploy = ...... yes deploy your code.
EC2 = Virtual Server Hosting (with managed backups, load balancing)
Re: (Score:2)
But he still gets his +5 funny :)
Re: (Score:2)
Most of the people on this site would do it for a pizza and a six-pack of Mt. Dew.
Re: Have a back up for communications guys! (Score:1)
Speak for yourself. Some of us are nerds, not IT cowboys.
Re: (Score:2)
The cultural reference was the soundtrack to the book Netslaves from the late 90s.
The track is Subversive Slogans. I'm not sure where a mirror is, if you don't have it in your nerd archive.
The quote is, "The difficult we do before lunch. The impossible will cost you a pizza and a six-pack of Mt. Dew."
Other quotes:
"It's not my life, just 80 hours out of my week."
"It's not just a job, it's how I feed my cat."
"Netslaves: without us, you'd still be sending away for porn."
Sadly, I can't even find it on TPB!
You c
Re: (Score:1)
I don't like how the internet is becoming so centralized.
I don't either, but I think we might as well piss into a firehose for all the difference we're going to make.
Re: (Score:2)
What a defeatist attitude. Remember, a bunch of goat-herders in a third-world shithole sent TWO superpowers packing with their tails between their legs with only small arms.
But remember, AWS has multiple locations - a catastrophic failure of one location shouldn't cause global outage. And Amazon has very strict deployment control policies - changes are very slowly rolled out over the multiple datacenters. So it's unlikely that a bad bug that can cause coordinated global outage will be easily deployed.
Oh, and nothing stops you from using AWS and Azure to avoid even that possibility.
Re: (Score:2)
But remember, AWS has multiple locations - a catastrophic failure of one location shouldn't cause global outage.
But it has in the past. In one case because of (humorously) a thunderstorm hitting one data center.
And Amazon has very strict deployment control policies - changes are very slowly rolled out over the multiple datacenters. So it's unlikely that a bad bug that can cause coordinated global outage will be easily deployed.
And yet, that has also happened before.
Re: (Score:2)
But it has in the past. In one case because of (humorously) a thunderstorm hitting one data center.
Yes, and I was on-call internally at AWS when this happened. It was fun! It was actually not a thunderstorm hitting a data center. A power line got shorted and a DRUPS fed back power into it, tripping all safeties and resulting in one AZ falling off the map.
However, the rest of the region (it has three AZs) still worked (mostly) fine. Other regions were not affected at all.
And yet, that has also happened before.
Not in the last 5 years. Amazon made a lot of effort to "regionalize" the infrastructure, so that each region is as independent as pos
Re: (Score:2)
Yes, and I was on-call internally at AWS when this happened. It was fun! It was actually not a thunderstorm hitting a data center. A power line got shorted and a DRUPS fed back power into it, tripping all safeties and resulting in one AZ falling off the map.
Not according to Wired [wired.com], Datacenterknowledge [datacenterknowledge.com], or The Register [theregister.co.uk]. A number of customers ended up hard down. They were hard down everywhere.
That other customers who happened to be in other datacenters might still be up was no consolation to the ones that were down.
Not in the last 5 years.
It happened two years [theverge.com] ago.
Re: (Score:2)
They were hard down everywhere.
Ah, that's another event from 2012. This was looooong ago, on a previous generation of AWS architecture. Back then the regions were not yet decoupled.
It happened two years [theverge.com] ago.
Nope. Not global. Only us-east-1 was affected, all other regions were doing fine. I know, I was there.
Re: (Score:2)
I'm sure that was a huge relief to the people who were hard down and really didn't know if their data was coming back or not.
Those that were down were down globally.
I'm not claiming that AWS has done a poor job by any means, everyone has unplanned downtime. I'm just saying that they haven't done better than a small operation can manage. To do that, they'll need cross region replication.
Re: (Score:2)
Those that were down were down globally
Customers who had infrastructure in more than one region did just fine. And it's not like it's complicated to do in AWS.
I'm just saying that they haven't done better than a small operation can manage. To do that, they'll need cross region replication.
AWS has it, for DynamoDB and RDS. You just need to use it.
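A rough sketch of the arithmetic behind the multi-region argument. It assumes regions fail independently, which is exactly what the other poster is disputing, and the 99.9% per-region figure is an illustrative assumption, not a number published by AWS or measured by anyone in this thread.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent deployments.

    The service counts as up if at least one copy is up, so it is down
    only when every copy is down at the same time.
    """
    return 1.0 - (1.0 - single) ** copies


# Assumed per-region availability, for illustration only.
SINGLE_REGION = 0.999

for regions in (1, 2, 3):
    value = combined_availability(SINGLE_REGION, regions)
    print(f"{regions} region(s): {value:.9f}")
```

If failures are correlated (a bad deployment pushed everywhere, a shared control plane), the real number sits somewhere between the single-region figure and this best case.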
Re: (Score:2)
It needs to be for the whole thing, the VMs, s3, databases, network routing, etc. THEN it will provide a level of reliability that a small company could not otherwise provide for itself at that price.
Re: Have a back up for communications guys! (Score:1)
You make that case tomorrow morning, as one of the entities wounded by todays outage. Not from some abstract FUD point of view.
Re: (Score:2)
Horseshit. The secret to having stable infrastructure is to keep it simple and not fuck with it all the time. Any cloud provider "improves" their backend pretty much non-stop, which involves a lot of fucking with some very complex infrastructure. Install a MySQL server locally, set up backups (to cloud, if need be), and you can forget about it for years. If it's not exposed to the public it doesn't really even need updates. But there's no such option in the cloud. Even if you install your stuff in your own
Re: (Score:3)
Any cloud provider "improves" their backend pretty much non-stop, which involves a lot of fucking with some very complex infrastructure. Install a MySQL server locally, set up backups (to cloud, if need be), and you can forget about it for years.
You plainly have not been responsible for anything mission-critical. How soon can you replace the hardware if it fails? Do you have an inventory of spare parts in the DC? How soon can an admin get there?
For perspective, Google's outage lasted less than 2 hours and was not universal.
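To put the "less than 2 hours" figure in uptime terms, here is a small sketch. It takes the 2-hour number at face value, assumes no other downtime during the year, and the 99.9% target in the second call is just a common example, not a quoted SLA.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which the service was up."""
    return 1.0 - downtime_hours / period_hours


def allowed_downtime_hours(target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget, in hours, implied by an availability target."""
    return (1.0 - target) * period_hours


# One 2-hour outage in a year, assuming nothing else went wrong.
print(f"availability: {availability(2) * 100:.3f}%")

# Yearly downtime budget for an example 99.9% target.
print(f"budget at 99.9%: {allowed_downtime_hours(0.999):.2f} hours")
```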
Re: (Score:2)
Those 2 hours are until the Google cloud is up again. Does not at all imply customer systems in there are functional again.
Re: (Score:2)
Re: (Score:2)
Of course. I am merely pointing out that the 2h figure is bogus.
Re: (Score:2)
I fully agree. But KISS has fallen out of fashion. No, there are no better alternatives, just more profitable ones.
Re: (Score:2)
Yeah, like that bank that told me they take my security VERY seriously and it took less than 5 minutes to hack their online banking app. No, it was not a small bank. You cannot make that claim without lying or being blind. Your business is selling stuff, not keeping customer stuff running. Sure, it is nice for you if you can do that too, because it makes the selling easier, but in the end, you could not care less. If, on the other hand, the admin staff is actually on the payroll of the customer, that adds a
Re: (Score:2)
Re: (Score:2)
And both provide services of limited quality in just the same fashion, because the economics of the cloud are not nearly as good as is usually claimed. They all understand that they need to keep doing it in this way to continue to have nice profits.
It is not a cartel if they coordinate on that without ever talking to each other about it.
Re: (Score:2)
Re: (Score:2)
For a small company, a full staff is one or two people. If you don't abuse the on-call thing, that may not cost you extra. You need those people anyway because the CEO sure as hell can't decipher the interface to the cloud servers.
AWS has potential to save money, but depending on the load, it also has the potential to be a worse deal than "rent to own" home furnishings.
Re: (Score:2)
Re: (Score:2)
You WILL care about paying more than it would cost to own your own server.
Re: (Score:2)
Re: (Score:2)
It looks like at the very low end now it might make sense for some people. But as your requirements go up, it starts making less sense, particularly when you remember that the office will need internet connectivity anyway.
What does make sense at that point is maintaining an account so that your lead tech can bring things up on Amazon as a DR measure.
Re: (Score:2)
If anything, large-scale computing benefits from AWS because you don't have to buy your own hardware that might be useful only for a fraction of time. So even large-scale scientific experiments often simply use AWS (or Azure)
Re: (Score:2)
If you're a company with one or two IT people then you also won't care about a one or two hour outage once every couple of years.
That depends very much on the kind of business you do, not on the size of the company.
Re: (Score:2)
Re: Have a back up for communications guys! (Score:1)
My company opens at 8 and locks the door at 5. Our server runs Windows 2000. It works. I have no control over it, it just seems to work. There are thousands of other tech companies like the one I work at out there. The world does not revolve around bleeding edge IT. Get a clue.
Re: (Score:2)
Until AWS goes down...
Infrastructure for rent is always cheapest possible in its class. It typically ends up costing a lot more.
Of course it is. Outsourcing to specialists in the area rather than attempting to roll your own is always the cheapest in class.
The thing is I'm not sure you know what "cheapest in class" actually means since you're using it in a negative context. You point to AWS being cheapest in class, but you ignore the fact that people *not* using it are in a significantly different, and lower class of service.
The cloud is someone else's computer. The thing people forget, is that other person is better at looking after
Re: (Score:2)
Your claim that people not using the cloud are automatically in a lower class is either clueless or a direct lie. Sure, they can be lower, but they can also be significantly higher. They may also care about a better customized solution, more redundancy, real network isolation, their own sysadmins that can not only keep stuff running reliably but also competently evaluate what is on offer, etc.
The cloud is a generic ElCheapo solution. It is suitable if you do not need much control and the mediocre availabili
Re: (Score:2)
I simply... googled it... and came up with:
https://www.google.com/appssta... [google.com]
Re: (Score:2)
When Google goes down, you want to jump into company chat to see if anything is giving us issues, but then Google Chat is also down. (We use G Suite for docs/drive/chat/email etc.) Good thing we're on AWS though. It's mostly not affecting us.
FWIW, Google's SREs (sysadmins, sort of) use IRC. Most internal Google communications are on GMail and Google Chat, obviously, but the SRE teams use IRC for a significant part of their comms, specifically so that in the event of a Chat outage, they can still communicate.
Re: (Score:2)
Re: (Score:2)
When Google goes down, you want to jump into company chat to see if anything is giving us issues, but then Google Chat is also down. (We use G Suite for docs/drive/chat/email etc.) Good thing we're on AWS though. It's mostly not affecting us.
No thanks. I'll use the opportunity to get some work done instead :-)
Hot Vapor! (Score:2)
Probably not where you want your critical servers located...
Re: (Score:2)
Clouds are condensed, liquid water. They aren't vapor.
Re: (Score:2)
I disagree. In an atmospheric cloud context, I would say that rain (condensation) is the condensed form of liquid water.
Those poor Nest subscribers... (Score:5, Funny)
Without being able to access the controls for their A/C, they may have to open* their windows!
*Unless they are Smart Windows and also affected, and assuming they can enter or exit their house without being able to manipulate their Smart Doorlocks.
captcha: functor
Re: (Score:2)
Without being able to access the controls for their A/C, they may have to open* their windows!
Or else, you know, walk over to the thermostat and adjust the settings there.
Re: Those poor Nest subscribers... (Score:1)
Why does it seem like this is the time to reread "'Repent, Harlequin!' Said the Ticktockman" [wikipedia.org]
Nest is down in the Bay area (Score:3, Informative)
Re: (Score:2)
Who cares about Florida?
But their ad servers were still running (Score:2)
App arrogance - "You are not connected..." (Score:2)
Yes, yes I am. You, on the other hand, are unable to reach your external server address and are blaming the entire internet connection rather than do an “Unable to connect to Nest server”-type error. The mere concept that Nest infrastructure itself could be the problem never enters this app's consciousness.
The Nest device is fin
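A sketch of the error handling this comment is asking for, under the assumption that the app can probe both its own backend and some independent reference host. The hostnames are hypothetical, and nothing here reflects how the Nest app is actually written.

```python
import socket
from typing import Callable


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def diagnose(service_reachable: bool, internet_reachable: bool) -> str:
    """Pick an error message that blames the right party."""
    if service_reachable:
        return "OK"
    if internet_reachable:
        return "Unable to connect to the service's servers. Your internet connection looks fine."
    return "You are not connected to the internet."


def check(
    service_host: str,
    reference_host: str,
    probe: Callable[[str], bool] = can_connect,
) -> str:
    """Probe the service and an independent reference host, then diagnose."""
    return diagnose(probe(service_host), probe(reference_host))


# Offline demo with a fake probe: the reference host answers, the service does not.
def fake_probe(host: str) -> bool:
    return host == "reference.example"


message = check("backend.example", "reference.example", probe=fake_probe)
print(message)
```

Only when the reference host is also unreachable does the app have grounds to blame the user's connection.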
Re: (Score:2)
That isn't arrogance, that's insecurity. They planned ahead to suck, and to try to deflect the blame. LOL
GMail not working on the West Coast. (Score:1)
Jay Sherman (Score:2)
And nothing of value was lost.
It's times like these... (Score:1)
that I'm so glad to have a hotmail account.
We don't have these issues... Do we?
Re: It's times like these... (Score:1)
Pay for a fastmail account, or any other robust non-free provider.
Jeez (Score:2)
I felt a great disturbance in the Internet, as if millions of millennials suddenly cried out in terror and were suddenly silenced.
How about UTC? (Score:1)
I live in Australia and I always know what my offset is from UTC. I assume most people around the world know theirs too.
What I never know is my offset from PT, CT or ET. Does anyone else have that problem too?
I guess if this problem was isolated to the U.S.A. then only using local time definitions is OK.
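For anyone who wants the summary's start time in UTC, a small sketch. Two assumptions are baked in and flagged in the comments: US daylight time was in effect (ET = UTC-4, PT = UTC-7), and the calendar date is a placeholder, since the summary only gives a time of day.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US daylight time was in effect, so ET = UTC-4 and PT = UTC-7.
EDT = timezone(timedelta(hours=-4), "EDT")
PDT = timezone(timedelta(hours=-7), "PDT")

# Assumption: placeholder date; the summary does not state one.
PLACEHOLDER_DATE = (2019, 6, 2)


def to_utc(hour: int, minute: int, tz: timezone) -> datetime:
    """Convert a local wall-clock time on the placeholder date to UTC."""
    local = datetime(*PLACEHOLDER_DATE, hour, minute, tzinfo=tz)
    return local.astimezone(timezone.utc)


start_from_et = to_utc(15, 25, EDT)  # 3:25PM ET
start_from_pt = to_utc(12, 25, PDT)  # 12:25PM PT

print(start_from_et.strftime("%H:%M UTC"))
print(start_from_et == start_from_pt)
```

Both local times map to the same UTC instant, so a reader anywhere only has to apply the one offset they already know.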
Re: (Score:2)
Oh how ironic... (Score:2)