AWS Outage Has Taken Down a Big Chunk of the Internet (theverge.com)

Amazon Web Services (AWS), Amazon's internet infrastructure service that is the backbone of many websites and apps, is experiencing a major outage affecting a large portion of the internet. From a report: "Kinesis has been experiencing increased error rates this morning in our US-East-1 Region that's impacted some other AWS services," Amazon said in a statement to The Verge. "We are working toward resolution." And, ironically, in a notice on the AWS Service Health Dashboard, Amazon said the issue has apparently "affected our ability to post updates" to that dashboard. "We continue to work towards recovery of the issue affecting the Kinesis Data Streams API in the US-EAST-1 Region," Amazon said in a 1:47PM ET update posted to the dashboard. "For Kinesis Data Streams, the issue is affecting the subsystem that is responsible for handling incoming requests. The team has identified the root cause and is working on resolving the issue affecting this subsystem."

  • Well now what? (Score:5, Insightful)

    by AndyKron ( 937105 ) on Wednesday November 25, 2020 @02:29PM (#60765818)
    In future news: Microsoft Windows 10 has stopped working for millions of people because they couldn't access the subscription server. SolidWorks does that to me all the time, the bastards.
    • Re:Well now what? (Score:5, Insightful)

      by sabri ( 584428 ) on Wednesday November 25, 2020 @02:59PM (#60765884)

      In future news: Microsoft Windows 10 has stopped working for millions of people because they couldn't access the subscription server.

      That would be Azure, not AWS.

      Nevertheless. This will be a wake-up call for all those hipster executives that think "the cloud" is cheaper than owning datacenters.

      It might be, until they find out that they don't get to make decisions on what gets fixed first, since they don't own the infrastructure or manage the engineers fixing the network.

      • Re:Well now what? (Score:4, Insightful)

        by Monoman ( 8745 ) on Wednesday November 25, 2020 @03:27PM (#60765964) Homepage

        Risk not cost.

        Pick any two: cheap, fast, correct.

      • This will be a wake-up call for all those hipster executives that think "the cloud" is cheaper than owning datacenters.

        "The Cloud" is not only cheaper, but the total downtime is going to be a lot less than a bespoke datacenter run by people they hired off of Craigslist.

        Before you say "hire better people", understand that a typical CEO doesn't have the expertise to judge who is "better" at IT. Amazon does have that expertise.

        For most companies "owning datacenters" is neither cheap nor reliable.

        • by guruevi ( 827432 )

          Not everyone needs their own datacenter but hosting companies and virtual private servers existed long before AWS.

          The cloud is not cheaper, it's a LOT more expensive when you put the entire thing together, but it's hidden in a variable monthly charge so it's hard to compare. I find the cloud to be useful for burst capacity usage and flexing when you don't have a budget for a capital expenditure.

          But you still need the same amount of IT staff, if not more, since your entire infrastructure still exists and ne
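          The burst-capacity point can be made concrete with a break-even calculation: renting wins while usage stays below the hours at which cloud charges equal a fixed monthly cost. This is a minimal sketch with hypothetical figures (not real prices), and the function names are illustrative; it deliberately ignores the staffing costs the comment raises.

```python
def break_even_hours(owned_monthly: float, cloud_hourly: float) -> float:
    """Hours of cloud usage per month at which renting costs as much as owning."""
    return owned_monthly / cloud_hourly


def cheaper_option(owned_monthly: float, cloud_hourly: float, hours_used: float) -> str:
    """Return 'cloud' if renting for `hours_used` beats the fixed owned cost."""
    return "cloud" if cloud_hourly * hours_used < owned_monthly else "owned"


# Hypothetical figures for illustration only, not real prices.
OWNED = 300.0  # fixed monthly cost of an owned or hosted server
HOURLY = 1.0   # cloud price per hour for equivalent capacity

print(break_even_hours(OWNED, HOURLY))        # 300.0
print(cheaper_option(OWNED, HOURLY, 50))      # burst, a few days a month: cloud
print(cheaper_option(OWNED, HOURLY, 24 * 30)) # always on: owned
```

          Under these assumed numbers, anything used less than 300 hours a month is cheaper to rent, and an always-on workload is cheaper to own.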

        • by sjames ( 1099 )

          The servers I run have far less downtime than the cloud.

        • Re: (Score:3, Insightful)

          by Shades72 ( 6355170 )

          Cloud is neither better nor cheaper. It is, at first glance, more convenient. And that is what "engineers" nowadays value most.

          Give it a bit of time and you'll get IT specialists that can only do one thing. By now, you (as a company) need to hire much more services externally, as you can't rely anymore on any in-house skill-set. 3rd party service providers will have you (as a company) over a barrel. And you'll pay through the nose for the privilege.

          By now you (as a company) are so dependent on 3rd party servic

      • by msauve ( 701917 )
        "...they don't own the infrastructure or manage the engineers fixing the network."

        As if a PHB yelling at people ever got stuff fixed faster.
      • by stwrtpj ( 518864 )

        It might be, until they find out that they don't get to make decisions on what gets fixed first, since they don't own the infrastructure or manage the engineers fixing the network.

        They don't have to make those decisions, because outages at AWS are indiscriminate; the same outage can affect both a Fortune 500 company and a 10-person startup. Because AWS has so many customers across a broad range of pricing tiers, it behooves them to fix problems right away. They don't have the luxury of saying "this is only affecting our bottom tier of customers so we can goof off all we want."

      • Re:Well now what? (Score:4, Interesting)

        by MachineShedFred ( 621896 ) on Wednesday November 25, 2020 @08:25PM (#60766626) Journal

        Or you could implement "cloud" in a way that the cloud providers themselves recommend: multi-region hosting.

        All of today's problems are in the us-east-1 (Virginia) region. If you are in us-west-2 or us-east-2, you are largely unaffected.

        Combine the following:
        1. cross-region database backups
        2. cross-region S3 bucket replication
        3. privileged info (passwords, API keys, etc.) in a secrets management system (see: Vault) with its database backed up or mirrored cross-region
        4. infrastructure as code, checked into git, mirrored cross-region.

        Now let's say that us-east-1 gets hit by a meteor. Your data is already sitting across the continent in Oregon either up and running in the case of the S3 buckets, or restorable from your nightly database snapshots, and you can run 'terraform apply' during that database restore pointing at us-west-2, and all your stuff comes back up - just change your DNS to point at the new load balancer if your Terraform doesn't do that for you.

        Major outage just turned into a disaster recovery exercise and you're back up in an hour or two. And you look like a god damn genius while the rest of the Internet whines about cloud providers, when it's really their own fault for not planning a truly resilient architecture for literally no more money.
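        The decision at the heart of this plan, picking a surviving region in priority order before restoring snapshots there, can be sketched in Python. This is a hypothetical simulation: the region names come from the comment, but `pick_region` and the health map are illustrative assumptions, not an AWS or Terraform API. In practice the health signal would come from your own probes, and the restore, `terraform apply`, and DNS change happen outside this code.

```python
from typing import Mapping, Sequence


def pick_region(preferred: Sequence[str], healthy: Mapping[str, bool]) -> str:
    """Return the first region, in priority order, that is reported healthy.

    `healthy` is a hypothetical health map (e.g. fed by your own probes);
    regions missing from it are treated as unhealthy.
    """
    for region in preferred:
        if healthy.get(region, False):
            return region
    raise RuntimeError("no healthy region available; escalate to a human")


# Priority order: primary first, then the disaster-recovery candidates.
PRIORITY = ["us-east-1", "us-west-2", "us-east-2"]

# Simulated state during an outage like the one in this thread.
status = {"us-east-1": False, "us-west-2": True, "us-east-2": True}

target = pick_region(PRIORITY, status)
print(f"Restore snapshots and run 'terraform apply' against {target}")
```

        Keeping the priority list explicit means the failover target is decided ahead of time and checked into git along with the rest of the infrastructure code, rather than chosen under pressure.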

        • i was agreeing with everything you said until you mentioned "truly resilient architecture for literally no more money"

          this is categorically false. if you have your resources in multiple az's/regions, you are going to pay for the additional resources. and this is not even involving peering costs if needed.

          yes you can do it. but it takes a lot of work and it's expensive

          • Except you missed the point of my post - you don't have stuff up in those other regions other than database snapshots on either RDS or S3 until an event like this happens. Then you apply your Terraform with a different AWS provider specifying the different region.

            I suppose technically you would spend a few dollars on the insanely cheap storage that is S3, but the difference is so negligible as to not matter. Obviously, the more of your backup environment you leave on hot standby, the more it will cost - b
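            The cost disagreement in this exchange comes down to what is left running in the second region. A minimal sketch, assuming a simple model of snapshot storage plus any standby instances; the prices below are placeholders for illustration, NOT real AWS prices, and `monthly_dr_cost` is a hypothetical helper.

```python
def monthly_dr_cost(snapshot_gb: float, storage_price_per_gb: float,
                    standby_instances: int, instance_price_per_month: float) -> float:
    """Monthly cost of a standby region: stored snapshots plus any instances left running."""
    storage = snapshot_gb * storage_price_per_gb
    compute = standby_instances * instance_price_per_month
    return storage + compute


# Placeholder prices for illustration only; these are NOT real AWS prices.
STORAGE_PRICE = 0.02    # currency units per GB-month (hypothetical)
INSTANCE_PRICE = 100.0  # currency units per instance-month (hypothetical)

# Snapshots only: nothing runs in the second region until disaster strikes.
pilot_light = monthly_dr_cost(500, STORAGE_PRICE, 0, INSTANCE_PRICE)
# Same snapshots plus four instances kept running on hot standby.
hot_standby = monthly_dr_cost(500, STORAGE_PRICE, 4, INSTANCE_PRICE)

print(f"snapshots only: {pilot_light:.2f}, hot standby: {hot_standby:.2f}")
```

            With these assumed numbers, the snapshot-only approach costs a small fraction of the hot standby, which is the distinction both commenters are circling: it is cheap in money, but recovery takes longer because everything has to be brought up first.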

      • by khchung ( 462899 )

        Nevertheless. This will be a wake-up call for all those hipster executives that think "the cloud" is cheaper than owning datacenters.

        Depends. If the same problem hit their own datacenter, how many companies' support teams could fix it faster than AWS?

        I would bet most would do worse.

      • by naubol ( 566278 )

        Nevertheless. This will be a wake-up call for all those hipster executives that think "the cloud" is cheaper than owning datacenters. It might be, until they find out that they don't get to make decisions on what gets fixed first, since they don't own the infrastructure or manage the engineers fixing the network.

        Sure, they can decide what gets fixed first much more often if they own the data center.

      • How often is the cloud down, and for how long? Now compare that to your typical self-host in a data center. If the uptime is better, does it matter where the servers are located?

      • by Hentes ( 2461350 )

        Yeah, but they can blame any failure on someone else, which is the main selling point of the cloud.

      • Nevertheless. This will be a wake-up call for all those hipster executives that think "the cloud" is cheaper than owning datacenters.

        Why would it be a wakeup call? You are assuming that this cloud outage is more severe than that caused by the infrastructure managed by said executives.

        People love pointing to cloud outages while consistently ignoring that uptime, despite those outages, is typically far better than the uptime of companies' own IT systems.

    • SolidWorks does that to me all the time the bastards.

      An obvious solution is to stop getting your software from bastards.

      I don't have that problem with FreeCAD. No cost = No subscription server.

      Plus, it uses Python as a scripting language, rather than VBA for Solidworks.

      • by sjames ( 1099 )

        Plus, you don't have to worry about losing the ability to use your own hard work because it's locked up in a proprietary format and they decided to alter the deal further.

      • by caseih ( 160668 )

        FreeCAD is no substitute for SolidWorks. Sorry. Not even close. I use FreeCAD a fair bit and find it to be fairly powerful for my simple needs, if extremely quirky and buggy. But I have no illusions that it can do more than a fraction of what people do with SolidWorks.

        • For power-users, what you say is likely true.

          But many people who pay for Solidworks, perhaps the majority, actually have simple needs that can be met by FreeCAD.

          FreeCAD works for me. Since I am a software nerd, I do much of my design work by writing scripts that are then executed to generate the design. If I need to change the design, I can often do so by changing one constant and re-running the script. A non-nerd may spend hours making the same change by hand.

          For script-driven development, FreeCAD is sup
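          The "change one constant and re-run the script" workflow can be illustrated without FreeCAD at all. This is a hypothetical, library-free Python sketch (it does not use FreeCAD's scripting API): a bolt-hole pattern is derived entirely from two parameters, so editing either one regenerates every hole position.

```python
import math
from typing import List, Tuple

# Design parameters: change one of these and re-run to regenerate the design.
BOLT_CIRCLE_DIAMETER = 80.0  # mm (hypothetical)
HOLE_COUNT = 6


def bolt_hole_centers(diameter: float, count: int) -> List[Tuple[float, float]]:
    """Evenly space `count` hole centers on a circle of the given diameter."""
    radius = diameter / 2
    return [
        (round(radius * math.cos(2 * math.pi * i / count), 3),
         round(radius * math.sin(2 * math.pi * i / count), 3))
        for i in range(count)
    ]


for x, y in bolt_hole_centers(BOLT_CIRCLE_DIAMETER, HOLE_COUNT):
    print(f"hole at ({x}, {y}) mm")
```

          Going from six holes to eight is a one-character edit to `HOLE_COUNT`, versus repositioning each hole by hand in a GUI.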

  • I do not think that OCI has dependencies on AWS... does it?

  • Sounds like this is a little overblown to me. Also looks like my favorite sites aren't stupid enough to use AWS :)

    • by brunes69 ( 86786 )

      Well architected sites won't be affected by failures in a single availability zone.
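      One way to state "survives a single availability zone failure" precisely: after losing whichever zone holds the most capacity, the remaining zones must still meet demand. A minimal sketch, assuming capacity is counted in interchangeable instances; the zone names and numbers are hypothetical, and this checks only the zone-failure case named here, not a regional service outage like the one in this story.

```python
from typing import Mapping


def survives_single_az_loss(capacity_by_az: Mapping[str, int], required: int) -> bool:
    """True if losing any one availability zone still leaves `required` capacity.

    The worst case is losing whichever zone currently holds the most capacity.
    """
    if not capacity_by_az:
        return required <= 0
    total = sum(capacity_by_az.values())
    return total - max(capacity_by_az.values()) >= required


# Hypothetical deployment: instances per zone, and the minimum needed to serve load.
spread = {"us-east-1a": 2, "us-east-1b": 2, "us-east-1c": 2}
lopsided = {"us-east-1a": 4, "us-east-1b": 1, "us-east-1c": 1}

print(survives_single_az_loss(spread, required=4))    # True
print(survives_single_az_loss(lopsided, required=4))  # False
```

      Both layouts run six instances, but only the even spread keeps four available when any one zone disappears.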

      • by EvilSS ( 557649 )

        Well architected sites won't be affected by failures in a single availability zone.

        Very true. Although in this instance, it looks like some AWS services like Kinesis are impacted more broadly. Heck, Amazon is even having trouble updating the outage page because they use Kinesis to push the status updates to it. So you can do everything right but if the cloud provider PaaS services go bits up due to a site outage, you are still hosed. Seems like Amazon needs to look at their own services architecture.
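        The "do everything right and still get hosed" effect follows from how availability composes: components you depend on in series multiply, while independent redundant copies only fail together. A minimal sketch with hypothetical availability figures; note that the parallel formula assumes independent failures, which a shared managed-service dependency quietly breaks.

```python
from math import prod
from typing import Iterable


def serial(availabilities: Iterable[float]) -> float:
    """All components are required: the system is up only when every one is up."""
    return prod(availabilities)


def parallel(availabilities: Iterable[float]) -> float:
    """Redundant copies: the system is down only when every copy is down (independence assumed)."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical figures: your own code is very reliable, but it calls a
# managed service that is less so.
app = 0.9999
managed_service = 0.999

print(f"app + managed service in series: {serial([app, managed_service]):.6f}")
print(f"two independent regions in parallel: {parallel([0.999, 0.999]):.6f}")
```

        The series result can never exceed the weakest dependency, so a well-built application in front of a degraded PaaS service inherits that service's outage.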

      • Well architected sites won't be affected by failures in a single availability zone.

        I guess that we can now conclude that Amazon Prime music streaming website is not well-architected. (But I always kind of suspected that anyway, given its general level of wonkiness.)

        • by micheas ( 231635 )

          Well architected sites won't be affected by failures in a single availability zone.

          I guess that we can now conclude that Amazon Prime music streaming website is not well-architected. (But I always kind of suspected that anyway, given its general level of wonkiness.)

          While Netflix was running fine (acceptably?) on AWS.

          My understanding is that the first step to dealing with a major outage is to stop running Chaos Monkey on production.

      • Good, or bad, news then: this wasn't a problem with an "availability zone," but rather with 3rd party services.

        So while what you said may be true in general, in this context it is an incorrect diagnosis, and hilariously arrogant considering.

      • Well architected sites won't be affected by failures in a single availability zone.

        Such well architected sites, that span availability zones, usually fail to see the light of day due to budget constraints.

        But, yea.

    • I was definitely getting errors on the Fidelity site this morning for the first two hours after the market opened. I could still buy and sell stocks and access all my account data, so the Fidelity servers were fine, but the stock research and analysis screens were all toast, and would vary between errors and no data.

  • Comment removed (Score:3, Informative)

    by account_deleted ( 4530225 ) on Wednesday November 25, 2020 @02:39PM (#60765844)
    Comment removed based on user account deletion
    • SiriusXM streaming was out all morning. And yes, I believe Ring doorbells were out too....
    • by cusco ( 717999 )

      Wonder if this is why Amazon Music is acting up. Told Alexa to shuffle "My Music" (there are around 1300 songs there). It cycles through the same six each time and says "That's all."

  • Bwhahaha (Score:5, Insightful)

    by lessSockMorePuppet ( 6778792 ) on Wednesday November 25, 2020 @02:39PM (#60765848) Homepage

    :-D fuckers! "Cloud" means, "someone else's computer", not, "magical faerie dust that never fails"

    • Re:Bwhahaha (Score:5, Insightful)

      by ljw1004 ( 764174 ) on Wednesday November 25, 2020 @02:51PM (#60765872)

      :-D fuckers! "Cloud" means, "someone else's computer", not, "magical faerie dust that never fails"

      Cloud also means "someone else has to cancel their thanksgiving plans to fix it, not me."

      • Or... They just tell you that they're looking into it, but because of [insert excuse here] they are currently anticipating a 72-96 hour disruption window, during which all of your systems may continue not to work properly, and there is absolutely nothing you can do about it. Happy thanksgiving. :-)

        • by cusco ( 717999 )

          Not at AWS, which is the cash cow that forced Bezos to pay dividends (they couldn't spend the money fast enough). This is currently a SEV1 issue; Jeff would have personally gotten paged when the ticket was generated, and I'll guarantee that he's watching the progress and will be interested in the Cause Of Event analysis.

          **Full Disclosure**
          I work at Amazon, but fortunately nothing to do with this.

      • Not in my experience. When your product/service that relies on AWS on the back end goes down, you're getting that call regardless of whether it was your crappy code or AWS being flaky. At least if it's your code, you can actually do something about it. Try telling the C-levels that there's nothing you can do because it's AWS' fault.

      • by Wolfier ( 94144 )

        > Cloud also means "someone else has to cancel their thanksgiving plans to fix it, not me."

        And the problem is...? If that someone else earns a decent salary cancelling Thanksgiving plans, more power to them.

        (Usually those without plans would take these rotations and likely they would earn double/triple on-call hourly pay if something needs fixing)

    • The AWS staff in general is much better than the average IT person, so AWS is down less than most infrastructure is down.

    • No one said never fails. All these cloud providers have uptime guarantees and these outages don't breach those. Now the question is, can *you* do better.
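      Converting an uptime percentage into minutes of permitted downtime makes "can you do better" concrete. A minimal sketch; the percentages below are hypothetical targets for illustration, not any provider's actual SLA terms, and a 30-day period is assumed.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Minutes of downtime a given availability percentage permits over a period."""
    minutes_in_period = period_days * 24 * 60
    return (1 - availability_pct / 100) * minutes_in_period


# Hypothetical targets for illustration; check your provider's actual SLA terms.
for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% over 30 days allows about {allowed_downtime_minutes(pct):.1f} minutes down")
```

      Each extra "nine" cuts the budget by a factor of ten, which is the bar an in-house setup would have to clear.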

  • by bogaboga ( 793279 ) on Wednesday November 25, 2020 @02:44PM (#60765860)

    ...North Korea or Venezuela...

    From an "internally leaked memo..."

    "We use these entities to cover up our own incompetencies whenever we see fit..."

    • Re: (Score:1, Troll)

      by cusco ( 717999 )

      North Korea? The country with a total of a couple hundred mostly-antique servers nationwide, which is 100% dependent on The Great Firewall Of China for connectivity to the Internet, which doesn't have the ability to attract a single competent pen-test instructor for its handful of programming students? Good grief. Isn't the "North Korean super-hacker" trope getting a bit old now?

      • Wow, so how long did you spend in North Korea, anyway?

        That's way better information than is available in public. Heck, your story even contained details about their staffing! The rest of the world can only dream about that level of access to whatever their system is.

        Don't you worry your local contacts will be punished because you just engaged in industrial espionage by telling us those details?

        • by cusco ( 717999 )

          Now what the frack are you babbling about? They buy second-hand servers from China, and their only connection to the greater world is through two fiber lines through China. Until around 2012 their only Internet connectivity was a single T-3 line to Taiwan, (reportedly frequently saturated by Kim's porn habit). That's all public knowledge, has been for years.

  • by Guyle ( 79593 ) on Wednesday November 25, 2020 @03:06PM (#60765902)
    So it's THEIR fault I couldn't give snarky GIF replies via Giphy to my coworkers in Slack this morning? Figures. My day was totally ruined.
    • Re:AHA (Score:4, Funny)

      by 93 Escort Wagon ( 326346 ) on Wednesday November 25, 2020 @03:15PM (#60765918)

      I wish some of my coworkers had run into that same issue this morning...

      Giphy is a blight on humanity. Why any supposedly "professional" tool includes it is beyond my comprehension (seriously - Slack, Teams, wtf?).

      • Younger millennials and Gen Z kids struggle with communicating using things like 'words.' They don't have the tools or skills to communicate ideas like 'I think that plan may not actually play out the way you want due to complicating factors both foreseeable, and not foreseeable. Let's take some time to examine the proposed plan with an eye towards identifying possible points of failure.' Instead, they post a gif of some guy tripping holding a pot of chili.
        • You know how they say a picture says a thousand words. Instead of using so many words, how about you do some math. How many words can a gif with 10 frames say?

          • How many words can a gif with 10 frames say?

            Since the words the gif is "saying" are invariably displayed in large block letters on the gif itself... I'd say between 1 and 4.

          • A picture of something specific can communicate a lot about that specific thing.

            I could write an essay about the horrors of the Vietnam War, but the picture of the naked, burned girl running away from her napalmed village can communicate quite a bit about it as well, and quite viscerally.

            Nevertheless, one should be able to communicate words and feelings like 'I agree' without needing to post a picture of Kermit the Frog flailing around.

  • us-east-1 is the region that always fails. Avoid it if possible!

  • what a pain (Score:5, Funny)

    by renegade600 ( 204461 ) on Wednesday November 25, 2020 @04:47PM (#60766152)

    That explains the error message Alexa was giving. Had to manually operate my alexa smart oven because the cloud was down - first time since I got it. I had to actually learn how to use it if I wanted lunch.

    • by cusco ( 717999 )

      That is the funniest thing I've read all day.

    • by ebvwfbw ( 864834 )

      That explains the error message Alexa was giving. Had to manually operate my alexa smart oven because the cloud was down - first time since I got it. I had to actually learn how to use it if I wanted lunch.

      Now that she's back she's mad at you? You didn't clean the oven, made a mess just like a man would?

  • Shocked, I tell you!
  • by Fly Swatter ( 30498 ) on Wednesday November 25, 2020 @09:11PM (#60766696) Homepage
    Some SITES might be down, but the internet itself is fine. Such a poorly worded title.

    It's like saying a bunch of highways are closed when it is really just a few malls and restaurants closed.

    -Who still puts all their eggs in one basket?
  • I've said it before. I'll say it again: The cloud is NO place to put your (eggs)!

    ANYTHING in the cloud can be considered vulnerable. ANYTHING.

    I see a time when computing migrates back to decentralized, secure servers behind thick firewalls accessed via super-encrypted VPN.

    Why would anyone trust any private thing, like Ring doorbells, or personal Facebook data, or Oracle databases,
    or your home security camera system to ANY cloud based service?
    Besides vulnerability to ever-improving hackers, there is
