AWS Outage Takes Thousands of Websites Offline for Three Hours (cnbc.com)
AWS experienced a three-hour outage early Monday morning that disrupted thousands of websites and applications across the globe. The cloud computing provider reported DNS resolution problems with DynamoDB in its US-EAST-1 region in northern Virginia, starting at 12:11 a.m. Pacific time. Over 4 million users reported issues, according to Downdetector. Snapchat reports fell from a peak of more than 22,000 to around 4,000 as systems recovered. Roblox dropped from over 12,600 complaints to fewer than 500. Reddit and the financial platform Chime remained affected longer. Perplexity, Coinbase and Robinhood attributed their platform disruptions directly to AWS.
Gaming platforms including Fortnite, Clash Royale and Clash of Clans went offline. Signal confirmed the messaging app was down. In Britain, Lloyds Bank, Bank of Scotland, Vodafone, BT, and the HMRC website faced problems. United Airlines reported disrupted access to its app and website overnight. Some internal systems were temporarily affected. Delta experienced a small number of minor flight delays. By 3:35 a.m. Pacific time, AWS said the issue had been fully mitigated. Most service operations were succeeding normally, though some requests faced throttling during final resolution. AWS holds roughly one-third of the cloud infrastructure market, ahead of Microsoft and Google.
AWS Outage (Score:5, Informative)
Oh, I know. Believe me.
Only select services were, and continue to be, affected, however. The root cause seems to be a failure in DNS resolution for DynamoDB affecting the entire us-east-1 region, which caused all kinds of internal AWS APIs that rely on that service to fail.
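When a name like the regional DynamoDB endpoint stops resolving, a Python client typically sees the failure as `socket.gaierror` before any request is even sent. A common client-side response is to retry with capped exponential backoff plus "full jitter," so that thousands of callers don't all retry in lockstep once resolution comes back. This is a minimal sketch under stated assumptions: `call_with_backoff`, its default parameters, and the `flaky_lookup` stand-in are hypothetical, not part of any AWS SDK.

```python
import random
import socket
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.1,
    cap: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn(), retrying only on DNS resolution errors (socket.gaierror).

    Hypothetical helper: before retry n (n = 1, 2, ...), sleep a uniformly
    random time in [0, min(cap, base * 2**(n - 1))] ("full jitter").
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    attempt = 0
    while True:
        try:
            return fn()
        except socket.gaierror:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: let the caller see the DNS failure
            sleep(rng.uniform(0, min(cap, base * 2 ** (attempt - 1))))


# Usage: a stand-in lookup that fails twice, then "resolves".
calls = {"n": 0}


def flaky_lookup() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise socket.gaierror("Name or service not known")
    return "10.0.0.1"


delays = []
print(call_with_backoff(flaky_lookup, sleep=delays.append, rng=random.Random(42)))
```

Injecting `sleep` and `rng` keeps the sketch testable; in real use the defaults apply and the caller actually waits between attempts.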
Yet pointy hairs... (Score:3)
... will still believe that the cloud is somehow magical and immune to outages, despite plenty of examples to the contrary, particularly from Azure in the last few years.
Re: (Score:3)
If anything, it's more vulnerable than on-prem, because on-prem very often uses "bad practices" that end up saving you.
For example, it's good practice to have DNS names for everything. Lazy sysadmins will just hardcode IPs instead.
But hey, you're immune to DNS failing you if you do this...
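The "lazy" approach described here amounts to keeping a static name-to-address table that doesn't depend on a resolver being up. A minimal sketch of that trade-off, assuming a hypothetical `PINNED` table and hostname: try normal resolution first and fall back to pinned addresses only when resolution fails. The catch, and the reason it's usually called bad practice, is that pinned addresses silently go stale when the real ones change.

```python
import socket
from typing import Callable, Dict, List

# Hypothetical pinned entries, playing the role of a hardcoded hosts file.
PINNED: Dict[str, List[str]] = {
    "db.internal.example": ["10.0.0.12"],
}


def system_lookup(host: str) -> List[str]:
    """Resolve host through the OS resolver (DNS and/or hosts file, per system config)."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def resolve_with_fallback(
    host: str,
    lookup: Callable[[str], List[str]] = system_lookup,
    pinned: Dict[str, List[str]] = PINNED,
) -> List[str]:
    """Prefer live resolution; use a pinned address only if resolution fails."""
    try:
        return lookup(host)
    except socket.gaierror:
        if host in pinned:
            return list(pinned[host])  # possibly stale, but reachable without DNS
        raise


# Usage: simulate a resolver outage.
def dns_down(host: str) -> List[str]:
    raise socket.gaierror("Temporary failure in name resolution")


print(resolve_with_fallback("db.internal.example", lookup=dns_down))
```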
Re: (Score:2, Interesting)
It's 2025 and I'm still deploying bare-metal standalone hosts for PAM systems in really, really critical infrastructure, and I still routinely hardcode hosts files to keep dependencies from breaking on DNS. All the customers who migrated to EC2 or SaaS (us-east-1, mind you) have had a bad day today.
Luckily for us (Score:5, Funny)
Re: (Score:2)
Slashdot is hosted on two squirrels and a dead badger running Linux.
Wow, +1 for the obscure Lucy Snyder reference!!
http://strangehorizons.com/wor... [strangehorizons.com]
It's always DNS... (Score:5, Funny)
It's not DNS!
It's not DNS!
I promise it's not DNS!
Dammit, it's DNS...
Re: (Score:2)
Re: (Score:2)
Hey, the missing DNS is just because the health check for DynamoDB failed: with no valid endpoints, you have no DNS.
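The point here is that a health-checked DNS name only hands out endpoints that pass their checks, so if every check fails, the name can come back with no usable records at all. A toy model of that behaviour, with hypothetical `Endpoint` and `answer` names and documentation-range addresses; it illustrates the commenter's description, not how AWS's internal DNS automation actually works. The `fail_open` flag shows the alternative policy of returning everything when nothing is healthy, which trades possibly sending traffic to a bad endpoint for never returning an empty answer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    ip: str
    healthy: bool  # result of the most recent health check


def answer(endpoints: List[Endpoint], fail_open: bool = False) -> List[str]:
    """Addresses a health-checked DNS name would return for a query.

    fail_open=False: no healthy endpoints -> empty answer (the name effectively vanishes).
    fail_open=True:  no healthy endpoints -> return all of them rather than nothing.
    """
    healthy = [e.ip for e in endpoints if e.healthy]
    if healthy or not fail_open:
        return healthy
    return [e.ip for e in endpoints]


pool = [Endpoint("192.0.2.10", False), Endpoint("192.0.2.11", False)]
print(answer(pool))                  # every check failing -> []
print(answer(pool, fail_open=True))  # ['192.0.2.10', '192.0.2.11']
```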
Re:It's always DNS... (Score:4, Funny)
My favorite haiku:
It's not DNS...
There's no way it can be DNS...
It was DNS.
Re: (Score:2)
> My favorite haiku:
> It's not DNS...
> There's no way it can be DNS...
> It was DNS.
Not quite, as you have 9 syllables in the middle line.
It's not DNS
There's no way it's DNS
It was DNS
Roblox? (Score:2, Troll)
Re:Roblox? (Score:4, Informative)
In France, not only was it daytime, but it's also a holiday week for much of the country's school kids.
Re: (Score:2)
Re: (Score:2)
Roblox is great for teaching programming (Score:4, Insightful)
Roblox is a game for adults to go and try to meet kids.
It should be investigated and closed. No parents with any sense should let their kids play it.
Roblox is a giant video game platform that provides benefits that greatly outweigh the risks. The fact that you see a giant platform of kids having fun and think "pedophilia" says a lot more about you than about the world. If you have such urges, I'd recommend you see a professional and get some help. Way to say you're an incel without directly saying it. First of all, every place children have ever been can be abused by sufficiently motivated bad actors.
If you ever convince someone to reproduce with you, you'll understand that children need all the activities they can get. You can't eliminate a great, positive platform just because of a theoretical concern. I have literally spent hundreds of hours in Roblox, not by choice, but because my kids love it when I join in and help. I've never seen anything resembling pedophilia. No one contacts me or asks for info. My kids have never reported that happening to them. It must happen, but it also happens at the park, the grocery store, etc. No actual parent has so much control that they can eliminate every suboptimal option just to get through the day and encourage their kids to learn or, at the very least, have fun.
As much as I hate the games on Roblox, I must concede it's a wonderful platform my kids use to hang out with their friends, including a few who moved away. It combines social media, gaming, and one of the very best programming environments I've ever seen. It's encouraged both of my children to begin programming and designing their own games... but hey, you see all that and think... ooh, is there kid touching involved??
Re: Roblox is great for teaching programming (Score:2)
You clearly haven't been on there this year. (Score:2, Interesting)
You're an awful parent. Your kids can learn programming safely without being exposed to the groomers on Roblox.
So you see children having fun playing lame video games with their friends and your mind thinks "groomer!" "pedophilia!!!" Seriously... you think you're insulting me by calling me an "awful parent," but you completely lack any credibility. There's clearly something wrong with you if your mind goes there. And you, not I, have clearly never been on the platform.
The most common complaint is not criminals, but the excessive AI moderation. My kids have gotten kicked off a few times for routine swearing...
Re: Roblox? (Score:3)
This is the Internet, not America.
Re: (Score:2)
Time zones are a thing, and have been since the railroads were built in the 19th century.
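Putting numbers on the time-zone point: the outage began at 12:11 a.m. Pacific and AWS called it mitigated at 3:35 a.m. Pacific on Monday, October 20, 2025. The sketch below converts those times using fixed UTC offsets, assuming daylight saving time was still in effect on that date in the US (PDT, UTC-7), the UK (BST, UTC+1) and France (CEST, UTC+2). `local_clock` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

# Fixed offsets assumed to be in effect on 2025-10-20 (DST still active on both sides).
PDT = timezone(timedelta(hours=-7), "PDT")
BST = timezone(timedelta(hours=1), "BST")    # UK
CEST = timezone(timedelta(hours=2), "CEST")  # France


def local_clock(t: datetime) -> Dict[str, str]:
    """Wall-clock time of an aware datetime in UTC, the UK and France."""
    return {
        tz.tzname(None): t.astimezone(tz).strftime("%H:%M")
        for tz in (timezone.utc, BST, CEST)
    }


start = datetime(2025, 10, 20, 0, 11, tzinfo=PDT)      # 12:11 a.m. Pacific
mitigated = datetime(2025, 10, 20, 3, 35, tzinfo=PDT)  # 3:35 a.m. Pacific
print(local_clock(start))      # {'UTC': '07:11', 'BST': '08:11', 'CEST': '09:11'}
print(local_clock(mitigated))  # {'UTC': '10:35', 'BST': '11:35', 'CEST': '12:35'}
print(mitigated - start)       # 3:24:00
```

In other words, the whole window fell in the European morning. Fixed offsets keep the sketch free of tz-database dependencies; for arbitrary dates, `zoneinfo.ZoneInfo` with IANA names would be the safer choice.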
Pathology in numbers (Score:2)
When you self-host, you are obviously responsible for managing outages. Part of what you pay for with AWS is outsourcing that blame.
AWS outages are treated like the weather; nobody's fault, It Just Happens. If your DC is down, you must be incompetent.
pretty much (Score:2, Insightful)
It used to be "Nobody ever got fired for buying IBM." I've seen this happen at work: some department head finds a vendor who will solve all the company's problems. Everyone in procurement, legal and risk management pushes back, saying there are yellow flags all over the place. Why don't we stick with who we have, or go with the industry leader? The department head says the industry leader is getting worse over time and has these specific problems. OK, fine, it's your decision. We sign up and the predicted calamity ha
Some things still broken... (Score:3, Interesting)
Re: Some things still broken... (Score:3)
It's true, we never had outages before the cloud. Yet another CSP innovation.
Re: (Score:1)
We didn't have such widespread outages due to a single provider when sites were thoroughly distributed across gazillions of providers.
Re: (Score:2)
No, instead you had much more frequent and distributed outages across every site in existence, because there wasn't an easy, well-understood paved road to scalable web infrastructure. Hence the decline and near-extinction of the "Slashdot Effect," where websites were DDoS'd into oblivion by legitimate traffic.
Re: (Score:2)
Re: (Score:3)
There were plenty of outages before the cloud, they were just isolated to specific companies.
In those days, if we wanted to upgrade a server, we just started the upgrade. People didn't *expect* 100% uptime like they do today.
Re: (Score:2)
Re: (Score:3)
Self-hosted Atlassian products seem to be just fine.
As for what "they said", they also said this cloud shit would be cheaper. It isn't.
L O N G E R !! (Score:2)
I noticed DNS slowness all Sunday. Not enough to break elinks, but uncharacteristic. My primary nameserver is in CO, nowhere near MAE-East.
Back to slow (Score:2)
I spoke too soon. DNS was fast during the NAm day, but just got worse (timeouts). .au woke up?
AWS said the issue had been fully mitigated (Score:2)
Re: (Score:3)
AWS isn't saying the issue is fully mitigated, at least not on this status page as of 11:15 am CT: https://health.aws.amazon.com/... [amazon.com]
Cloud hosted JIRA is down as well... (Score:3)
So I don't have to worry about getting any help desk tickets for these issues.
They also say that they are "moving mountains" to resolve my issue with Confluence being down. Funny... I don't remember AWS having issues with a mountain landing on their us-east-1 data centers. Why not just fess up and tell us what the real problem is? It's not like we don't already know.
And then ... (Score:5, Funny)
Re:And then ... (Score:4, Insightful)
Early impact: AWS's own ticket system (Score:2)
From their status page...
> Oct 20 1:26 AM PDT We can confirm significant error rates for requests made to the DynamoDB endpoint in the US-EAST-1 Region. This issue also affects other AWS Services in the US-EAST-1 Region as well. During this time, customers may be unable to create or update Support Cases. [...]
> Oct 20 12:51 AM PDT We can confirm increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region. This issue may also be affecting Case Creation through the AWS Support
Moved to cloud? Now pay the stupidity tax. (Score:3)
Which is what happens when you let MBAs make decisions instead of engineers.
MBAs hear the magic words "CAPEX vs OPEX" and their tiny myopic brains turn off, as do those of the compliance officers with wet dreams of shifting liability elsewhere.
It all works (with ever-rising costs), until it doesn't.
Re: Moved to cloud? Now pay the stupidity tax. (Score:2)
But what is the tax? These services are down and no one is blaming the companies; they just say it's an Amazon problem. No one is penalizing a company because its Internet services went down, just like most of its competitors'.
Re: (Score:2)
> But what is the tax?
Decrease in traffic, which reduces revenues. Was that supposed to be a trick question?
Re: (Score:2)
The question is to what extent revenue was reduced versus deferred. If 90% of their customers couldn't reach competitors either, was revenue lost or did it just happen later?
The thing is, it's terrible for internet users when all the outages are aligned, but for providers, the expectation that their outages will likely coincide with competitors' outages might be a pretty solid mitigation, so long as the outage doesn't exceed what they might incur themselves. Even a longer outage common with competitors may be
Re: (Score:3)
The tax, aside from lost revenue, is inevitable security breaches and uncontrollable outages. If a state actor decides to take down AWS or Azure, literally thousands of businesses would be down for an indefinite period.
This incident was nothing, a glitch. That doesn't mean the next one will be.
Total system crash caused by AWS:DNS crash (Score:5, Funny)
For want of the domain the hosted zone was lost;
For want of the hosted zone the VPC routing was lost;
For want of the VPC routing the outbound endpoint was lost;
For want of the outbound endpoint the DNS resolution was lost;
For want of the DNS resolution the application was lost;
And all for the want of a critical DNS record.
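The poem is a dependency chain read backwards, which makes it easy to model: invert the "depends on" edges and walk outward from the failed component to find everything that goes down with it. A minimal sketch using the poem's own links as a hypothetical graph (the component names are illustrative, not a real AWS topology), with breadth-first search for the propagation.

```python
from collections import deque
from typing import Dict, List, Set

# component -> what it depends on, read off the poem (hypothetical names).
DEPENDS_ON: Dict[str, List[str]] = {
    "application": ["dns_resolution"],
    "dns_resolution": ["outbound_endpoint"],
    "outbound_endpoint": ["vpc_routing"],
    "vpc_routing": ["hosted_zone"],
    "hosted_zone": ["domain"],
    "domain": ["dns_record"],
}


def lost_components(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Every component that transitively depends on `failed`, plus `failed` itself."""
    # Invert the edges: for each dependency, which components rely on it?
    dependents: Dict[str, List[str]] = {}
    for comp, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(comp)
    lost = {failed}
    queue = deque([failed])
    while queue:  # breadth-first walk over "who relies on me"
        current = queue.popleft()
        for comp in dependents.get(current, []):
            if comp not in lost:
                lost.add(comp)
                queue.append(comp)
    return lost


print(sorted(lost_components("dns_record", DEPENDS_ON)))
```

The model treats any failed dependency as fatal, which is the pessimistic assumption the poem makes; a component with redundant dependencies would need an "all of these must fail" rule instead.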
Consequences of Embracing Single Point of Failure (Score:2)
Base...All Your Base...Are Belong to Us (Score:2)
Base...Base.
Base.
All Your Base...Are Belong to Us.