AWS Outage Takes Thousands of Websites Offline for Three Hours (cnbc.com) 56

AWS experienced a three-hour outage early Monday morning that disrupted thousands of websites and applications across the globe. The cloud computing provider reported DNS problems with DynamoDB in its US-EAST-1 region in Northern Virginia starting at 12:11 a.m. Pacific time. Over 4 million users reported issues, according to Downdetector. Snapchat reports fell from more than 22,000 to around 4,000 as systems recovered. Roblox dropped from over 12,600 complaints to fewer than 500. Reddit and the financial platform Chime remained affected longer. Perplexity, Coinbase and Robinhood attributed their platform disruptions directly to AWS.

Gaming platforms including Fortnite, Clash Royale and Clash of Clans went offline. Signal confirmed the messaging app was down. In Britain, Lloyds Bank, Bank of Scotland, Vodafone, BT, and the HMRC website faced problems. United Airlines reported disrupted access to its app and website overnight, and some internal systems were temporarily affected. Delta experienced a small number of minor flight delays. By 3:35 a.m. Pacific time, AWS said the issue had been fully mitigated. Most service operations were succeeding normally, though some requests faced throttling during final resolution. AWS holds roughly one-third of the cloud infrastructure market, ahead of Microsoft and Google.
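
The failure mode described above is easy to picture from the client side: the regional DynamoDB endpoint name stopped resolving, so anything that needed a fresh lookup could not even open a connection to the service. Below is a minimal Python sketch of such a probe; it is an illustration only, not AWS tooling, and the hostname is assumed from AWS's dynamodb.<region>.amazonaws.com naming convention.

    import socket
    import sys

    # Regional endpoint, assuming the dynamodb.<region>.amazonaws.com naming scheme.
    ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

    try:
        # getaddrinfo performs the same resolver lookup an SDK client needs
        # before it can open a TCP connection to the service endpoint.
        addrs = socket.getaddrinfo(ENDPOINT, 443, proto=socket.IPPROTO_TCP)
        print(f"{ENDPOINT} resolved to {sorted({a[4][0] for a in addrs})}")
    except socket.gaierror as exc:
        # During the outage, affected clients saw failures of roughly this shape:
        # the endpoint name simply stopped resolving.
        print(f"DNS resolution failed for {ENDPOINT}: {exc}", file=sys.stderr)
        sys.exit(1)
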
  • AWS Outage (Score:5, Informative)

    by TwistedGreen ( 80055 ) on Monday October 20, 2025 @09:06AM (#65738050)

    Oh, I know. Believe me.

    Only select services were (and continue to be) affected, however. The root cause seems to be a failure in DNS resolution for DynamoDB affecting the entire us-east-1 region, which caused all sorts of internal AWS APIs that relied on that service to fail.

    • ... will still believe that Cloud is somehow magical and immune from outages despite plenty of examples to the contrary, particularly from Azure in the last few years.

      • by hjf ( 703092 )

        If anything, it's more vulnerable than on-prem, because on-prem very often uses "bad practices" that end up saving you.

        For example, it's good practice to have DNS names for everything. Lazy sysadmins will just hardcode IPs instead.

        But hey, you're immune to DNS failing you if you do this...

        • Re: (Score:2, Interesting)

          by Anonymous Coward

          It's 2025, I'm still deploying bare-metal standalone hosts for PAM systems for really, really critical infrastructure and still hardcode hosts files routinely to prevent dependencies breaking on DNS. All the customers who migrated to EC2 or SaaS (us-east-1 mind you) have had a bad day today.
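
The hosts-file / pinned-IP fallback described in this thread boils down to: try normal DNS first, and only fall back to an address you pinned ahead of time if resolution fails. A rough Python sketch of that idea follows; the hostname and address are placeholders, not real systems.

    import socket

    # Placeholder entries standing in for a locally maintained hosts file.
    PINNED = {
        "pam.internal.example": "10.0.12.34",
    }

    def resolve_with_fallback(host: str) -> str:
        """Resolve via DNS, falling back to a pinned address if the lookup fails."""
        try:
            return socket.gethostbyname(host)   # normal DNS path
        except socket.gaierror:
            if host in PINNED:
                return PINNED[host]             # the "bad practice" that saves the day
            raise

    if __name__ == "__main__":
        print(resolve_with_fallback("pam.internal.example"))

The tradeoff, of course, is that a pinned address silently goes stale when the service moves, which is exactly why it is considered bad practice in the first place.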

  • by rsilvergun ( 571051 ) on Monday October 20, 2025 @09:10AM (#65738058)
    Slashdot is hosted on two squirrels and a dead badger running Linux.
  • by andyring ( 100627 ) on Monday October 20, 2025 @09:11AM (#65738062) Homepage

    It's not DNS!

    It's not DNS!

    I promise it's not DNS!

    Dammit, it's DNS...

  • Roblox? (Score:2, Troll)

    by evil_aaronm ( 671521 )
    At midnight? Isn't Roblox a kids' game? What are kids - thousands, according to the fine summary - doing awake at midnight on a school night?
    • Re:Roblox? (Score:4, Informative)

      by PDXNerd ( 654900 ) on Monday October 20, 2025 @09:34AM (#65738140)

      In France, not only was it daytime, it was also a holiday week for much of the country's schoolkids.

    • by EvilSS ( 557649 )
      Are you somehow under the impression that all of the children in the world live in a single timezone?
    • This is the Internet, not America.

    • Time zones are a thing, and have been since the railroads were built in the 19th century.

  • There's a funny effect that happens with mass centralization of services like this - there's a "safety in numbers" instinct that causes mass failures.

    When you self-host, you are obviously responsible for managing outages. Part of what you pay for with AWS is outsourcing that blame.

    AWS outages are treated like the weather: nobody's fault, It Just Happens. If your DC is down, you must be incompetent.

    • pretty much (Score:2, Insightful)

      by Anonymous Coward

      It used to be "Nobody ever got fired for buying IBM." I've seen this happen at work: some department head finds a vendor who will solve all the company's problems. Everyone in procurement, legal, and risk management pushes back, saying there are yellow flags all over the place: why don't we stick with who we have, or go with the industry leader? The department head says the industry leader is getting worse over time and has these specific problems. OK, fine, it's your decision. We sign up, and the predicted calamity happens.

  • by beep999 ( 229889 ) on Monday October 20, 2025 @09:44AM (#65738164) Homepage
    I'm trying to work, and it seems like Atlassian is still having problems recovering. "Put it in the cloud!" they said. "It'll be more reliable!" they said.
    • It's true, we never had outages before the cloud. Yet another CSP innovation.

      • by Anonymous Coward

        We didn't have such widespread outages due to a single provider when sites were thoroughly distributed across gazillions of providers.

        • No, instead you had much more frequent and distributed outages across every site in existence because there wasn't an easy and well-understood paved road to scalable web infrastructure. Thus the recession and almost-extinction of the "Slashdot Effect" of websites being DDoS'd into oblivion by legitimate traffic.

    • Searches on amazon.com aren't working for me right now.
    • There were plenty of outages before the cloud, they were just isolated to specific companies.

      In those days, if we wanted to upgrade a server, we just started the upgrade. People didn't *expect* 100% uptime like they do today.

    • Self-hosted Atlassian products seem to be just fine.

      As for what "they said", they also said this cloud shit would be cheaper. It isn't.

  • I noticed DNS slowness all Sunday. Not enough to break elinks, but uncharacteristic. My primary nameserver is in CO, nowhere near MAE-East.

  • Downdetector says AWS is just plain lying.
  • by leonbev ( 111395 ) on Monday October 20, 2025 @10:46AM (#65738320) Journal

    So I don't have to worry about getting any help desk tickets for these issues.

    They also say that they are "moving mountains" to resolve my issue with Confluence being down. Funny... I don't remember AWS having issues with a mountain landing on their us-east-1 data centers. Just fess up and tell us what the real problem is. It's not like we don't already know.

  • by PPH ( 736903 ) on Monday October 20, 2025 @11:00AM (#65738366)

    ... the weekend cleaning crew unplugged their floor buffer, plugged the DNS server back in. And all was right in the world again.

    • Re:And then ... (Score:4, Insightful)

      by dsgrntlxmply ( 610492 ) on Monday October 20, 2025 @01:14PM (#65738740)
      This happened to a Sun 3/280 years ago. I was going on vacation. Upon arrival at my motel room, the room phone's message light was ominously on. It was work: they were down because the engineering server was down. The cleaning crew had a little kid with them, and the server room had trash cans to be emptied. The kid did what comes naturally: flip and push all of the brightly lit switches, notably the one on the disk drive. I talked someone through fsck for about an hour.
  • From their status page...

    > Oct 20 1:26 AM PDT We can confirm significant error rates for requests made to the DynamoDB endpoint in the US-EAST-1 Region. This issue also affects other AWS Services in the US-EAST-1 Region as well. During this time, customers may be unable to create or update Support Cases. [...]

    > Oct 20 12:51 AM PDT We can confirm increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region. This issue may also be affecting Case Creation through the AWS Support [...]

  • by gestalt_n_pepper ( 991155 ) on Monday October 20, 2025 @11:28AM (#65738472)

    Which is what happens when you let MBAs make decisions instead of engineers.

    MBAs hear the magic words "CAPEX vs OPEX" and their tiny myopic brains turn off, as do the compliance officers with wet dreams of shifting liability elsewhere.

    It all works (with ever-rising costs), until it doesn't.

    • But what is the tax? These services are down and no one is blaming the companies; they just say it's an Amazon problem. No one is penalizing a company because its Internet services went down at the same time as most of its competitors'.

      • But what is the tax?

        Decrease in traffic, which reduces revenues. Was that supposed to be a trick question?

        • by Junta ( 36770 )

          The question is to what extent revenue was reduced versus deferred. If 90% of their customers couldn't reach competitors either, was revenue lost, or did it just happen later?

          The thing is, having all the outages aligned is terrible for Internet users, but for the providers, the likelihood that their outages will align with competitors' outages might be a pretty solid mitigation, so long as the outage doesn't exceed what they might incur on their own. Even a longer outage shared with competitors may be tolerable.

      • The tax, aside from lost revenue, is the inevitable security breaches and uncontrollable outages. If a state actor decided to take down AWS or Azure, literally thousands of businesses would be down for an indefinite period.

        This incident was nothing, a glitch. That doesn't mean the next one will be.

  • by Mirnotoriety ( 10462951 ) on Monday October 20, 2025 @02:14PM (#65738894)
    For want of a DNS record the domain was lost;
    For want of the domain the hosted zone was lost;
    For want of the hosted zone the VPC routing was lost;
    For want of the VPC routing the outbound endpoint was lost;
    For want of the outbound endpoint the DNS resolution was lost;
    For want of the DNS resolution the application was lost;
    And all for the want of a critical DNS record.
  • You put all of your eggs in a cloud basket, and that is what you get. And just think of what would happen during a shooting war.
  • Base...Base.

    Base.

    All Your Base...Are Belong to Us.
