Amazon's AWS Logs Its Third Outage this Month, Affecting Slack, Epic Games Store, Asana and More (theverge.com)

Posted by msmash on Wednesday December 22, 2021 @09:41AM from the cloud-is-someone-else's-computer dept.

Amazon's crucial web services business AWS is experiencing problems today, with issues affecting services like Slack, Imgur, and the Epic Games store for some users. From a report: It's not looking good if you're working from home, with some Slack users unable to view or upload images, and work management tool Asana also hit by the outages. In an incident update, Slack said its services are "experiencing issues with file uploads, message editing, and other services." Asana says the problems constitute a "major outage," with "many of our users unable to access Asana." Epic Games Store said "Internet services outages" are "affecting logins, library, purchases, etc." It's the third time in as many weeks that problems with AWS have had a significant effect on online services.

One Availability Zone? (Score:3)

by ranton ( 36917 ) writes: on Wednesday December 22, 2021 @09:48AM (#62105731)

It looks like from the story that this only affects one availability zone in one region. Shouldn't it be expected that this is going to happen from time to time, which is why anyone who wants high availability will have resiliency from being in multiple availability zones? I'm not an expert in cloud infrastructure so perhaps someone here can set me straight.

- Re: (Score:3)
  
  by drinkypoo ( 153816 ) writes:
  
  On one hand you're correct, on the other hand the whole reason to use cloud services is someone else handles keeping it running and it seems lame that you have to do special stuff to get that. The whole point was supposed to be that you don't care where it's running... but it isn't running at all.
  - Re:One Availability Zone? (Score:5, Informative)
    
    by aaarrrgggh ( 9205 ) writes: on Wednesday December 22, 2021 @11:59AM (#62106103)
    
    You get exactly what you want to pay for. If you want to pay for redundancy, it is extra. If you want to pay for redundant clouds it is a lot extra.
    
    - Re: (Score:2)
      
      by drinkypoo ( 153816 ) writes:
      
      Sure, I get what it is and why it costs more to get more. But I also understand feeling disappointed that this is the case. Ultimately that is a thing that needs to be done cheaply if anyone is going to be able to have a viable site which can handle load going forwards.
      - Re: (Score:2)
        
        by aaarrrgggh ( 9205 ) writes:
        
        Cloud handles cheap growth well. If you are willing to accept degraded performance and don’t require real-time transaction synchronization then you can get redundancy fairly economically at the same time. It only gets really expensive when the PHBs want things like “seamless customer experience” under all failure modes.
  - Re: (Score:2)
    
    by DarkOx ( 621550 ) writes:
    
    Well, therein lies the problem: not understanding the difference between the ASP model, where someone else keeps it running, and the cloud model, where you still design and build it but it runs on someone else's hardware that you (try to) control with clumsy abstract tooling!
    The cloud model is just the old mainframe model. You buy some time to run your stuff, but it's still on you to make sure your job isn't going to ABEND, and while there is a good deal of reliability and redundancy built into a single instance if you rea
- Re: One Availability Zone? (Score:3)
  
  by reanjr ( 588767 ) writes:
  
  I know our disaster plan for a meteor strike on the East coast doesn't necessarily mean we will magically roll over to the West coast. Some things might need to be tweaked to ensure traffic's going to the right place; some services might lose a bit of data, etc.
  So, if we have a meteor strike, we're fine.
  But if the shit goes down every week, we haven't actually tooled for that.
  - - Re: (Score:2)
      
      by aaarrrgggh ( 9205 ) writes:
      
      The key part of GP’s statement is that “you lose some data.” Many, many systems do not have real-time, transaction-level redundancy. It adds significant cost, and when you are talking about 50-100ms latency between zones it is nearly impossible. You might sync a batch every second (more likely every minute or hour), but that will still result in a loss of some data when your primary zone crashes.
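To put rough numbers on the batch-sync point above, here is a minimal back-of-envelope sketch. The write rate of 200 writes per second is purely hypothetical, the function names are made up for illustration, and the 50-100 ms figure from the comment is treated here as a round-trip time, which is an assumption.

```python
def worst_case_writes_lost(sync_interval_s: float, writes_per_s: float) -> float:
    """Upper bound on writes lost if the primary dies just before the next batch sync."""
    return sync_interval_s * writes_per_s


def max_serial_commits_per_s(round_trip_ms: float) -> float:
    """Ceiling on strictly serialized synchronous commits when each one
    must wait a full round trip to the replica before it is acknowledged."""
    return 1000.0 / round_trip_ms


if __name__ == "__main__":
    hypothetical_write_rate = 200  # writes per second, assumed for illustration only
    for label, interval_s in [("every second", 1), ("every minute", 60), ("every hour", 3600)]:
        lost = worst_case_writes_lost(interval_s, hypothetical_write_rate)
        print(f"batch sync {label}: up to {lost:,.0f} writes at risk")
    for rtt_ms in (50, 100):
        rate = max_serial_commits_per_s(rtt_ms)
        print(f"{rtt_ms} ms round trip: at most {rate:.0f} serialized sync commits/s")
```

Real systems pipeline and batch their commits, so the second figure is a bound for one strictly sequential writer, not a measure of what any particular database achieves.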
- Re: (Score:2, Informative)
  
  by AmiMoJo ( 196126 ) writes:
  
  That's spot on. Someone will say they shouldn't trust the cloud, but they probably couldn't build and run better infrastructure themselves for even double what they pay Amazon.
  - Re:One Availability Zone? (Score:5, Informative)
    
    by Viol8 ( 599362 ) writes: on Wednesday December 22, 2021 @10:31AM (#62105877) Homepage
    
    You're right. And wrong. Few companies could roll out a few billion dollars worth of infrastructure like AWS. However plenty could roll out an in house system with 2 failover backups for considerably less than they pay for cloud services long term.
    
    - Re: (Score:2)
      
      by bn-7bc ( 909819 ) writes:
      
      Yea but that would make the bldy bean counters cry about capex vs opex, and no way to "instantly" scale and the CxO types would not get the free cabin trips and dinners (oh sorry I mean fact finding opportunities)
    - - Re: (Score:2)
        
        by MooseTick ( 895855 ) writes:
        
        Any Fortune 500 has the money and technical capacity to do everything you listed for cheaper than AWS. Sure, rando company with 100 employees can't match Amazon's services for a reasonable price, but MANY others can.
        
        Re: (Score:2)
        
        by MooseTick ( 895855 ) writes:
        
        I'm actually a CCSP, so I do know a little more about the cloud than the average Slashdotter. That said, I'm no cloud expert. But I've also worked for a Fortune 100 company as well as several other large corporations and know many DO have the capacity to handle their own IT in-house.
        And yes, it is just servers in someone else's datacentre! You can add whatever services and put lipstick on it, but in the end it's your data and applications running on someone else's hardware at their facility.
    - Re: (Score:3)
      
      by organgtool ( 966989 ) writes:
      
      For many reasons, I find self-hosting to be an ideal solution. Of course, idealism often isn't allowed to exist in the realm of reality. The biggest problem with building your own redundant, self-hosted system is having multiple databases that are constantly kept in sync and can be easily failed-over. This certainly isn't impossible, but it's often beyond the reach of most small or mid-sized businesses. In that regard, AWS with multiple availability zones is a great, pragmatic solution.
      
      On a related no
  - Re: (Score:2)
    
    by awwshit ( 6214476 ) writes:
    
    > Someone will say they shouldn't trust the cloud, but they probably couldn't build and run better infrastructure themselves for even double what they pay Amazon.
    Why? I've done that several times over.
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
- Re: (Score:2)
  
  by Burdell ( 228580 ) writes:
  
  I'm not sure about this time, but in some past events, the AWS internal tooling has overloaded when there was a significant zone outage, such that all the automation that's supposed to make things HA across zones/regions fails. That of course defeats the purpose of using all the AWS HA stuff, which is why smarter companies are doing hybrid-cloud (mix of cloud and self-managed servers) and/or multi-cloud (mix of AWS+Azure+GCP servers) setups.
- Re: (Score:2)
  
  by PmanAce ( 1679902 ) writes:
  
  No, you are correct, if you want three or four 9s, you need to have different regions available to kick in automatically if your main region goes down. Like in the old days, you had a load balancer with 2 machines and both go down, do you just give up and say the load balancer didn't do its job?
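The multi-zone and multi-region argument running through this thread can be sketched with the textbook redundancy formula. This is only an illustration: the 99% single-zone figure is hypothetical, `combined_availability` is a made-up helper name, and the formula assumes zone failures are independent, which is exactly the assumption that breaks down when shared control-plane tooling overloads during an outage.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of a service that stays up as long as at least one of
    `copies` replicas is up, ASSUMING replicas fail independently."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a fraction between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    single_zone = 0.99  # hypothetical figure, not an AWS number
    for n in (1, 2, 3):
        print(f"{n} zone(s): {combined_availability(single_zone, n):.6f}")
```

Correlated failures, failover time, and data synchronization all pull the real number below what this formula suggests.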
Won't change anything (Score:5, Insightful)

by IWantMoreSpamPlease ( 571972 ) writes: on Wednesday December 22, 2021 @09:55AM (#62105759) Homepage Journal

At least where I work. Upper management is absolutely in love with "the cloud" and no amount of logic/screaming from the IT staff will change their mind.
This is affecting us, and naturally management is screaming at us for a fix.
The solution, you dolts, is to bring it back in-house, where it was running just fine.
But then they won't save money. Well, which do you want? A penny in your pocket, or access to your data?

- Re: Won't change anything (Score:3)
  
  by reanjr ( 588767 ) writes:
  
  At any level of real scale, you'll save wads of cash by not using the cloud.
  - Re: Won't change anything (Score:5, Insightful)
    
    by pagedout ( 1144309 ) writes: on Wednesday December 22, 2021 @10:23AM (#62105843)
    
    We found it to be a mixed bag.
    A straight lift and shift of a medium/large architecture appears to almost always cost more in AWS (unless you are really bad at running a datacenter). We pegged it at about 25% usually. If you refactor to be more cloud friendly you can drive that down into substantial savings, presuming you have substantial periods of high and low volumes. If you go whole hog into something more cloud native, then it is really random, as everything costs even more, but if broken down well with high swings you can save a lot.
    A straight lift and shift of a small architecture is just so random it's hard to tell. We said they tend to be about even. It can be a good savings in labor if what you have is really standard.
    
    - - Re: (Score:1)
        
        by ebvwfbw ( 864834 ) writes:
        
        > We pegged it at about 25% usually.
        I'd bet you didn't look at things like reserved instances though, right? If you're willing to pay to reserve a VM instance for a fixed number of years on Azure or AWS, the savings are much larger than 25% (as much as 72% discount on AWS vs. standard on-demand instances) such that buying reserved instances is definitely cheaper than buying physical hardware and maintaining it yourself. The 25% more expensive sounds about right for on-demand instances, but will be way out for RIs. The biggest problem I see with RIs though is that companies constantly tell themselves they don't need them because they'll be going cloud native in the next 6 months so don't want a 1 or 3 year lease. They're kidding themselves though, and 5 years later still haven't usually managed to go cloud native, such that they'd have saved a fortune just using RIs, even if only on rolling 1 year leases.
        This is really why companies making the transition to the cloud would do well to invest in a talented cloud architect; they'll be able to explain how to do it cost effectively, and do it well. The salary of a cloud architect will almost certainly more than pay for itself many times over with the savings you'll make.
        But this is Slashdot, so I'm sure someone will tell me it's a made up job, and I'm wrong, because they once changed their own graphics card and so they know best or something amateur hour like that.
        We hired a guy that used to work for AWS. He flat out told us there is no way AWS could beat an agency's own infrastructure. I think he said the break-even point was 13 machines. If you had 13 or less it paid to go with AWS. More than 13 and you're better off doing it yourself.
        Since then they went all-hog into AWS. Then about 2 years later it was all removed. They had hired a couple of AWS gurus. They still took a significant bath in red ink. So bad a few managers departed with red faces. I understand now
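The reserved-instance arithmetic argued over above can be worked through in a few lines. This is a sketch under loud assumptions: it takes the "about 25% more" figure as applying to on-demand pricing, takes the "as much as 72%" discount as a best case, normalizes in-house cost to 100, and ignores labor, egress, and unused commitments. The function names are invented for the example.

```python
def on_demand_cost(in_house_cost: float, premium: float) -> float:
    """Cloud on-demand cost given an in-house baseline and a fractional premium."""
    return in_house_cost * (1.0 + premium)


def reserved_cost(in_house_cost: float, premium: float, discount: float) -> float:
    """Cost after applying a reserved-instance discount to the on-demand price."""
    return on_demand_cost(in_house_cost, premium) * (1.0 - discount)


def break_even_discount(premium: float) -> float:
    """Discount at which reserved pricing matches the in-house baseline."""
    return 1.0 - 1.0 / (1.0 + premium)


if __name__ == "__main__":
    baseline = 100.0  # in-house cost, normalized
    premium = 0.25    # "about 25%" from the thread, assumed to mean on-demand
    for discount in (0.0, 0.30, 0.72):  # 0.72 is the quoted best case
        cost = reserved_cost(baseline, premium, discount)
        print(f"discount {discount:.0%}: relative cost {cost:.1f}")
    print(f"break-even discount: {break_even_discount(premium):.0%}")
```

Under these assumptions any discount above 20% beats the in-house baseline on raw instance price, which says nothing about the 13-machine break-even claim, since that depends on costs this sketch leaves out.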
  - Re: (Score:1)
    
    by Anonymous Coward writes:
    
    Until you need to have resources in other countries. Then the cloud starts getting more attractive.
- Re: (Score:2)
  
  by Murdoch5 ( 1563847 ) writes:
  
  Simple: "We need to have a backup Cloud host!" If you need true High Availability then you need multiple clouds, Azure + AWS or GCP + AWS, etc... The only way to really have stability is full redundancy. "The Cloud" is rarely a way to save money, not if it's set up correctly, because the amount of failover you require quickly throws the costs WAY up.
- Re: (Score:2)
  
  by PmanAce ( 1679902 ) writes:
  
  The solution is to switch automatically to another region when one goes down, why don't you do that?
- Re: (Score:1)
  
  by Tablizer ( 95088 ) writes:
  
  Upper management is absolutely in love with "the cloud"... and naturally management is screaming at us for a fix [when down].
  Sell them some BozoCoin and then get the hell out.
- Re: (Score:2)
  
  by fermion ( 181285 ) writes:
  
  The internet is not built on the uptime of mainframes. While I understand that some number of teenage suicides will be blamed on the Facebook downtime, most of us can live on 99.5% uptime and will not pay for 4 or 5 nines.
  Professionally, as I depend on constant uptime, these outages are an extreme bother. But any robust system has contingencies. Most people who complain about outages simply will not be bothered to work out the contingencies.
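For a sense of scale on the uptime figures in this comment, here is a minimal conversion from an availability percentage to allowed downtime per year. It assumes a 365-day year and counts all downtime equally; the helper name is made up for the example.

```python
HOURS_PER_YEAR = 365 * 24  # non-leap year


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours of downtime per year permitted by a given availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return HOURS_PER_YEAR * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for pct in (99.5, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime allows {hours:.3f} hours ({hours * 60:.1f} minutes) down per year")
```

The gap between 99.5% and four or five nines is the difference between days and minutes of downtime a year, which is why the extra nines cost so much more.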
- Re: (Score:2)
  
  by tlhIngan ( 30335 ) writes:
  
  It really depends.
  In-house is great, but the manpower and equipment needed to keep it alive is heavy and can easily outweigh the costs of AWS. When AWS goes down, it takes down a lot, too, but then again, perhaps your infrastructure goes down more often, just that because fewer people use it, it's less noticed.
  We moved email to the cloud, from internal Exchange to Office 365 hosted Exchange. Our IT guy sleeps better at night because chances are, the Office 365 exchange server will be less likely to die over
- Re: (Score:2)
  
  by ayesnymous ( 3665205 ) writes:
  
  You gotta do what all the other companies are doing, otherwise customers/investors/employees will all ditch you.
- Re: Won't change anything (Score:2)
  
  by MrBoring ( 256282 ) writes:
  
  I would also add that you control your own data. If your business isn't one approved by AWS, you could also lose your data.
Beware of low price tags. (Score:2)

by jellomizer ( 103300 ) writes:

If you are a small organization then going to the cloud is often a good option, as the occasional outage from a cloud service, while annoying, is much better than if you tried to have your own budget data center that costs 10x as much. Companies like Amazon can offer you such a cheap price because your demand is rather low, and they can use the computers and employee resources to manage dozens if not hundreds of customers, so you get a better value.
However if you are a big company like Slack or Epic. Chanc
Down Detector is down (Score:5, Funny)

by jhecht ( 143058 ) writes: on Wednesday December 22, 2021 @10:21AM (#62105835)

On Downdetector.com: at 10:20 a.m. Eastern Hmm. We’re having trouble finding that site. We can’t connect to the server at downdetetector.com.

You don't say... (Score:4, Insightful)

by Rosco P. Coltrane ( 209368 ) writes: on Wednesday December 22, 2021 @10:22AM (#62105837)

The cloud might not offer the 6-sigma availability and reliability cloud providers promise?
The cloud operated by only a handful of giant cloud providers is at the mercy of any single one of them going tits up?
Why, this is such a surprise. So unexpected and so disappointing...

- Re: (Score:2)
  
  by zaq1xsw2cde9 ( 608119 ) * writes:
  
  I'm not sure it is fair to say "the cloud" doesn't offer a certain reliability. AWS in itself has multiple regions and availability zones to combat this very problem. You can also spread redundancy to other cloud providers to further mitigate risk. Nothing is perfect, but with the proper infrastructure planning, the cloud can provide great reliability. We use AWS's geo-redundancy capabilities, and even though our services are primarily in the region that has been having problems, we have had no downtime
  - Re:You don't say... (Score:5, Insightful)
    
    by Rosco P. Coltrane ( 209368 ) writes: on Wednesday December 22, 2021 @11:00AM (#62105939)
    
    we have had no downtime
    Yet.
    The thing with the cloud is, when everything runs fine, it's great. But when things go sour, for technical reasons or because the cloud provider decides they have you trapped and it's time to put the squeeze on you, that's when you realize that infrastructure you built on theirs is locked-in and completely dependent on network availability and you're hosed, because you put all your eggs in that one basket and you have no plan B.
    Roll-your-own is a lot costlier upfront and you know you'll have problems every once in a while if you don't plan great - and you will even if you do. But at least you can do something about it, you're not at the mercy of your internet provider simply to keep your company operating, and you're not some cloud provider's bitch.
    
    - Re: (Score:1)
      
      by zaq1xsw2cde9 ( 608119 ) * writes:
      
      for technical reasons or because the cloud provider decides they have you trapped and it's time to put the squeeze on you, that's when you realize that infrastructure you built on theirs is locked-in and completely dependent on network availability and you're hosed, because you put all your eggs in that one basket and you have no plan B.
      I've been using a few cloud infrastructure companies for many years. AWS since 2009. I can assure you this is a fallacy. There is plenty of competition in the space, and in all this time, I've never felt a "squeeze". I have only found that all services get cheaper and easier every year.
      If you put all your eggs in one basket, that is purely from your own bad planning, and it is no different if you roll your own; It just costs more. If you roll your own, you are still the ISP's bitch, or the Hardware
      - Re: You don't say... (Score:2)
        
        by MrBoring ( 256282 ) writes:
        
        There is competition for some things, but not exactly for things like DynamoDB. AWS offers many custom, non standard services that either are incompatible or don't exist on other clouds. Also pricing to switch and move your data out is where they get you.
        
        Re: (Score:1)
        
        by zaq1xsw2cde9 ( 608119 ) * writes:
        
        That is a straw man argument.
        There are certainly many competitors for DynamoDB that can work in AWS and also other providers. That is the customer's choice to use those products and decide to tie themselves to AWS. Most AWS products are based on open source products and moving elsewhere would be no problem. For instance, they have been pushing AWS Aurora hard in the past couple of years. This product is a repackaged Postgres or MySQL. Any program that uses Postgres will natively work on Aurora. What
Greed always ends in incompetence (Score:1)

by Anonymous Coward writes:

this is exactly what happens when we let the upper class corrupt everything, shit stops working and civilization collapses again, and always for the exact same reason, unmitigated greed and an out of control, unsustainable and incompetent upper class
history repeating itself again, will we never learn?
God bless the cloud (Score:5, Insightful)

by leathered ( 780018 ) writes: on Wednesday December 22, 2021 @10:43AM (#62105901)

As a sysadmin, whenever something on-prem broke, I'd have the PHBs in my office breathing down my neck telling me that the downtime was unacceptable. Now that we've put most things into the cloud, despite downtime and outages going way up, I can just shrug my shoulders and blame it on the hosting company. Even if it was something I broke.
Win-win.

- Re: (Score:2)
  
  by Tablizer ( 95088 ) writes:
  
  You're on Cloud 9
- Re: (Score:2)
  
  by e3m4n ( 947977 ) writes:
  
  It's amazing how they won't accept problems that arise when it's run by just 1 or a few admins; it's deemed entirely unacceptable. But when they outsource it to a company with hundreds, if not thousands, of admins, suddenly it's an unavoidable circumstance. I've seen people waste hours of support time demanding an explanation why a call dropped from their desk VoIP handset; meanwhile they could drop 2 or 3 calls a day on their cell, and they shrug it off and call right back. They aren't even willing to consider tha
- Re:God bless the cloud (Score:5, Insightful)
  
  by organgtool ( 966989 ) writes: on Wednesday December 22, 2021 @12:56PM (#62106265)
  
  This is one of the biggest reasons why outsourcing is so popular in both the private and public sectors. It's not just that you're outsourcing the services to an organization that specializes at performing that particular service, it's outsourcing the blame when that service inevitably breaks.
  
Pffft (Score:1)

by Tablizer ( 95088 ) writes:

I toldja you should have used Win~ &^ #n` [NO CARRIER]
Pentagon Contract (Score:2)

by e3m4n ( 947977 ) writes:

Gee, I wonder why they keep getting passed over for government contracts. This sort of thing does draw attention as to design and resiliency, even if it's not apples-to-apples. The people at the top make decisions based on perception more than anything. The perception is that AWS is not up to the challenges of a network that can never fail for any reason. Even a 2 minute outage during a skirmish or battle can change the entire outcome. Maybe AWS design is more than sufficient given the lower volume compared
- Re: (Score:3)
  
  by Entrope ( 68843 ) writes:
  
  They got passed over for JEDI mostly because Donald Trump hates Jeff Bezos for owning the Washington Post (and I say that as someone who voted for Trump last year).
  That's a big part of why the competition had to be re-opened. Amazon operates the US government's Top Secret cloud that was the inspiration for JEDI, and just expanded that to a second region: https://aws.amazon.com/blogs/p... [amazon.com]
- Re:Pentagon Contract (Score:4, Interesting)
  
  by david.emery ( 127135 ) writes: on Wednesday December 22, 2021 @12:03PM (#62106117)
  
  Sigh... I'm not sure whether I should be amused or disgusted when people who don't know anything about military procurement or military operations opine about them.
  The procurement system is based on evaluation against a set of requirements. If they write the wrong requirements, they get the wrong results. There's very little room in the evaluation for the exercise of independent judgement.
  In operations, though, there's a lot of room for constructing actual systems against The Real World (tm). Sometimes the requirements really are helpful, and you get systems that you can integrate as expected. Other times, you have to do a lot of work to fit the round peg into a square hole, -usually- because the shape of the hole changed from the time the requirements were written to the time the resulting system was delivered.
  There's always a big conflict/trade-off between 'requirements' and 'simplicity'. Do you want a complex system that might well have holes in it, where mistakes result in downtime (or worse, wrong answers)? See https://en.wikipedia.org/wiki/... [wikipedia.org] and in particular https://en.wikipedia.org/wiki/Byzantine_fault . Personally, I've always had a bias towards simpler systems with much higher dependability, but often the people writing the requirements want more complexity in the system so the usage of that system is simpler. These are NOT SIMPLE TRADES.
  But one thing I've learned about trying to reason about distributed systems over 40 years. Communications is the weakest part of the distributed system in military operations. We do not have the dependency of fiber-optic communications in most (but not all) combat systems. When the radios don't work, then you can't use anything that is not on your vehicle, aircraft, local installation, etc. Years ago, we knew how to reason about failures in distributed systems (as defined by Leslie Lamport, "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." https://www.cs.ubc.ca/~bestcha... [cs.ubc.ca] .) These days, I'm not sure how much that is actually taught.
  
  - Re: (Score:3)
    
    by e3m4n ( 947977 ) writes:
    
    It really depends on the system. Don't you remember when a MS Executive convinced a Navy Admiral to put Windows NT in their CIC (Combat Information Center) on the USS New Jersey? (I believe it was the New Jersey, it was the last battleship still in service). It had to get towed to shore. Then, if that wasn't enough, it happened again in 1998 on the USS Yorktown (CG-48). https://www.wired.com/1998/07/... [wired.com] . None of this seems appropriate or well vetted for an in-service Navy vessel that can be called upon at any time. C
Can't wait! (Score:1)

by opps sorry my bad ( 6897492 ) writes:

Can't wait until AWS has a much bigger outage. I have the popcorn standing by!
Any US-EAST AZ is asking for trouble. (Score:2)

by Virtucon ( 127420 ) writes:

There's too much hanging on any of the US-EAST Zones, the only big reason for using them is zero cost ingress of network data and they're usually the first zones to get new features.
- Re: (Score:1)
  
  by bhiestand ( 157373 ) writes:
  
  us-east-2 isn't so bad. us-east-1 had better latency from most of South America than sa-east-1 last I checked, so there's also that.
  But yeah, it's good advice to not put too many eggs in the us-east-1 basket.
Any day now... (Score:1)

by Narcocide ( 102829 ) writes:

... people are going to start to realize I was right when I said that "cloud computing" the way these jackasses are selling it doesn't buy you any magical baked-in redundancy. (Should have hired me instead, I guess, huh fuckers?)
system design (Score:1)

by inline_four ( 594390 ) writes:

A lot of comments are critical of overuse of the clouds or AWS specifically. But for me it boils down to a couple of more specific issues that can be improved upon.
The first thing is within the customers' grasp: let's think critically about our system design in terms of third parties. The more consolidation there is in the SaaS industry, the more we'll see business' interdependence. In other words, if there is a really compelling offering for managed services that run outside your immediate control, ther
