

Amazon's AWS Logs Its Third Outage this Month, Affecting Slack, Epic Games Store, Asana and More (theverge.com) 66
Amazon's crucial web services business AWS is experiencing problems today, with issues affecting services like Slack, Imgur, and the Epic Games store for some users. From a report: It's not looking good if you're working from home, with some Slack users unable to view or upload images, and work management tool Asana also hit by the outages. In an incident update, Slack said its services are "experiencing issues with file uploads, message editing, and other services." Asana says the problems constitute a "major outage," with "many of our users unable to access Asana." Epic Games Store said "Internet services outages" are "affecting logins, library, purchases, etc." It's the third time in as many weeks that problems with AWS have had a significant effect on online services.
One Availability Zone? (Score:3)
From the story, it looks like this only affects one availability zone in one region. Shouldn't it be expected that this is going to happen from time to time, which is why anyone who wants high availability will build in resiliency by being in multiple availability zones? I'm not an expert in cloud infrastructure, so perhaps someone here can set me straight.
Re: (Score:3)
On one hand you're correct, on the other hand the whole reason to use cloud services is someone else handles keeping it running and it seems lame that you have to do special stuff to get that. The whole point was supposed to be that you don't care where it's running... but it isn't running at all.
Re:One Availability Zone? (Score:5, Informative)
You get exactly what you want to pay for. If you want to pay for redundancy, it is extra. If you want to pay for redundant clouds it is a lot extra.
Re: (Score:2)
Sure, I get what it is and why it costs more to get more. But I also understand feeling disappointed that this is the case. Ultimately that is a thing that needs to be done cheaply if anyone is going to be able to have a viable site which can handle load going forwards.
Re: (Score:2)
Cloud handles cheap growth well. If you are willing to accept degraded performance and don’t require real-time transaction synchronization then you can get redundancy fairly economically at the same time. It only gets really expensive when the PHBs want things like “seamless customer experience” under all failure modes.
Re: (Score:2)
Well, therein lies the problem: not understanding the difference between the ASP model, where someone else keeps it running, and the cloud model, where you still design and build it but it runs on someone else's hardware that you (try to) control with clumsy abstract tooling!
The cloud model is just the old mainframe model. You buy some time to run your stuff, but it's still on you to make sure your job isn't going to ABEND, and while there is a good deal of reliability and redundancy built into a single instance if you rea
Re: One Availability Zone? (Score:3)
I know our disaster plan for a meteor strike on the East coast doesn't necessarily mean we will magically roll over to the West coast. Some things might need to be tweaked to ensure traffic's going to the right place; some services might lose a bit of data, etc.
So, if we have a meteor strike, we're fine.
But if the shit goes down every week, we haven't actually tooled for that.
Re: (Score:2)
The key part of GP’s statement is that “you lose some data.” Many, many systems do not have real-time, transaction-level redundancy. It adds significant cost, and when you are talking about 50-100ms latency between zones it is nearly impossible. You might sync a batch every second (more likely every minute or hour), but that will still result in a loss of some data when your primary zone crashes.
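To put rough numbers on that trade-off, here is a minimal Python sketch. The write rate is a made-up illustration value, the function names are hypothetical, and it assumes the 50-100ms figure is the round-trip cost paid once per synchronously replicated commit on a single serial connection.

```python
def worst_case_writes_lost(writes_per_second: float, sync_interval_seconds: float) -> float:
    """Upper bound on writes lost if the primary zone dies just before the next batch sync."""
    return writes_per_second * sync_interval_seconds


def max_serial_commits_per_second(round_trip_ms: float) -> float:
    """Ceiling on commits/second for one connection that waits a full
    cross-zone round trip before acknowledging each commit."""
    return 1000.0 / round_trip_ms


# Assumed workload for illustration only: 200 writes per second.
WRITES_PER_SECOND = 200

if __name__ == "__main__":
    for label, interval in [("every second", 1), ("every minute", 60), ("every hour", 3600)]:
        lost = worst_case_writes_lost(WRITES_PER_SECOND, interval)
        print(f"batch sync {label}: up to {lost:,.0f} writes lost on a crash")

    for latency_ms in (50, 100):
        rate = max_serial_commits_per_second(latency_ms)
        print(f"{latency_ms} ms round trip: at most {rate:.0f} synchronous commits/second per connection")
```

Under those assumptions the shape of the problem is visible: longer batch intervals widen the loss window linearly, while fully synchronous replication caps each serial connection at a few tens of commits per second.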
Re: (Score:2, Informative)
That's spot on. Someone will say they shouldn't trust the cloud, but they probably couldn't build and run better infrastructure themselves for even double what they pay Amazon.
Re:One Availability Zone? (Score:5, Informative)
You're right. And wrong. Few companies could roll out a few billion dollars' worth of infrastructure like AWS. However, plenty could roll out an in-house system with 2 failover backups for considerably less than they pay for cloud services long term.
Re: (Score:2)
Any Fortune 500 company has the money and technical capacity to do everything you listed for cheaper than AWS. Sure, rando company with 100 employees can't match Amazon's services for a reasonable price, but MANY others can.
Re: (Score:2)
I'm actually a CCSP, so I do know a little more about the cloud than the average Slashdotter. That said, I'm no cloud expert. But I've also worked for a Fortune 100 company as well as several other large corporations and know many DO have the capacity to handle their own IT in-house.
And yes, it is just servers in someone else's datacentre! You can add whatever services and put lipstick on it, but in the end it's your data and applications running on someone else's hardware at their facility.
Re: (Score:3)
On a related no
Re: (Score:2)
> Someone will say they shouldn't trust the cloud, but they probably couldn't build and run better infrastructure themselves for even double what they pay Amazon.
Why? I've done that several times over.
Re: (Score:2)
I'm not sure about this time, but in some past events, the AWS internal tooling has overloaded when there was a significant zone outage, such that all the automation that's supposed to make things HA across zones/regions fails. That of course defeats the purpose of using all the AWS HA stuff, which is why smarter companies are doing hybrid-cloud (mix of cloud and self-managed servers) and/or multi-cloud (mix of AWS+Azure+GCP servers) setups.
Won't change anything (Score:5, Insightful)
At least where I work. Upper management is absolutely in love with "the cloud" and no amount of logic/screaming from the IT staff will change their mind.
This is affecting us, and naturally management is screaming at us for a fix.
The solution, you dolts, is to bring it back in-house, where it was running just fine.
But then they won't save money. Well, which do you want? A penny in your pocket, or access to your data?
Re: Won't change anything (Score:3)
At any level of real scale, you'll save wads of cash by not using the cloud.
Re: Won't change anything (Score:5, Insightful)
We found it to be a mixed bag.
A straight lift and shift of a medium/large architecture appears to almost always cost more in AWS (unless you are really bad at running a datacenter). We pegged it at about 25% usually. If you refactor to be more cloud friendly, you can drive that down into substantial savings, presuming you have substantial periods of high and low volumes. If you go whole hog into something more cloud native, then it is really random, as everything costs even more, but if broken down well with high swings you can save a lot.
A straight lift and shift of a small architecture is just so random it's hard to tell. We said they tend to be about even. It can be a good savings in labor if what you have is really standard.
Re: (Score:1)
> We pegged it at about 25% usually.
I'd bet you didn't look at things like reserved instances though, right? If you're willing to pay to reserve a VM instance for a fixed number of years on Azure or AWS, the savings are much larger than 25% (as much as a 72% discount on AWS vs. standard on-demand instances), such that buying reserved instances is definitely cheaper than buying physical hardware and maintaining it yourself. The 25% more expensive sounds about right for on-demand instances, but will be way out for RIs. The biggest problem I see with RIs though is that companies constantly tell themselves they don't need them because they'll be going cloud native in the next 6 months, so they don't want a 1 or 3 year lease. They're kidding themselves though, and 5 years later they still usually haven't managed to go cloud native, such that they'd have saved a fortune just using RIs, even if only on rolling 1 year leases.
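As a back-of-the-envelope check on those two figures, here is a small Python sketch. It is only an illustration: the baseline of 100 is an arbitrary unit, the function names are hypothetical, and it assumes the 25% premium and the "as much as 72%" discount both apply to the same compute bill, ignoring storage, bandwidth, and labor.

```python
ON_PREM_BASELINE = 100.0    # arbitrary cost units (assumption)
ON_DEMAND_PREMIUM = 0.25    # "about 25%" more than on-prem, per the quoted comment
BEST_CASE_RI_DISCOUNT = 0.72  # "as much as 72%" off on-demand pricing


def on_demand_cost(on_prem: float, premium: float) -> float:
    """Cost of a straight lift and shift at on-demand prices."""
    return on_prem * (1.0 + premium)


def reserved_cost(on_demand: float, discount: float) -> float:
    """Cost after applying a reserved-instance discount to the on-demand price."""
    return on_demand * (1.0 - discount)


def break_even_discount(premium: float) -> float:
    """Discount at which reserved pricing equals the on-prem baseline."""
    return 1.0 - 1.0 / (1.0 + premium)


if __name__ == "__main__":
    od = on_demand_cost(ON_PREM_BASELINE, ON_DEMAND_PREMIUM)
    ri = reserved_cost(od, BEST_CASE_RI_DISCOUNT)
    print(f"on-prem baseline: {ON_PREM_BASELINE:.1f}")
    print(f"on-demand:        {od:.1f}")
    print(f"best-case RI:     {ri:.1f}")
    print(f"RI discount needed to match on-prem: {break_even_discount(ON_DEMAND_PREMIUM):.0%}")
```

With these inputs, on-demand comes to 125 against a baseline of 100, the best-case reserved price comes to roughly 35, and any reserved discount above about 20% would undercut the baseline. Real discounts depend on term, payment option, and instance type, so the best case is an upper bound rather than a typical result.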
This is really why companies making the transition to the cloud would do well to invest in a talented cloud architect; they'll be able to explain how to do it cost effectively, and do it well. The salary of a cloud architect will almost certainly more than pay for itself many times over with the savings you'll make.
But this is Slashdot, so I'm sure someone will tell me it's a made up job, and I'm wrong, because they once changed their own graphics card and so they know best or something amateur hour like that.
We hired a guy who used to work for AWS. He flat out told us there is no way AWS could beat an agency's own infrastructure. I think he said the break-even point was 13 machines. If you had 13 or fewer, it paid to go with AWS. More than 13 and you're better off doing it yourself.
Since then they went all-hog into AWS. Then about 2 years later it was all removed. They had hired a couple of AWS gurus. They still took a significant bath in red ink. So bad a few managers departed with red faces. I understand now
Re: (Score:1)
Until you need to have resources in other countries. Then the cloud starts getting more attractive.
Re: (Score:1)
Sell them some BozoCoin and then get the hell out.
Re: (Score:2)
Professionally, as I depend on constant uptime, these outages are an extreme bother. But any robust system has contingencies. Most people who complain about outages simply will not be bothered to work out the contingencies.
Re: (Score:2)
It really depends.
In-house is great, but the manpower and equipment needed to keep it alive is heavy and can easily outweigh the costs of AWS. When AWS goes down, it takes down a lot, too, but then again, perhaps your infrastructure goes down more often, just that because fewer people use it, it's less noticed.
We moved email to the cloud, from internal Exchange to Office 365 hosted Exchange. Our IT guy sleeps better at night because chances are, the Office 365 exchange server will be less likely to die over
Re: Won't change anything (Score:2)
I would also add that you control your own data. If your business isn't one approved by AWS, you could also lose your data.
Beware of low price tags. (Score:2)
If you are a small organization, then going to the cloud is often a good option, as the occasional outage from a cloud service, while annoying, is much better than if you tried to run your own budget data center that costs 10x as much. Companies like Amazon can offer you such a cheap price because your demand is rather low, and they can use the same computers and employee resources to manage dozens if not hundreds of customers, so you get a better value.
However if you are a big company like Slack or Epic. Chanc
Down Detector is down (Score:5, Funny)
You don't say... (Score:4, Insightful)
The cloud might not offer the 6-sigma availability and reliability cloud providers promise?
The cloud operated by only a handful of giant cloud providers is at the mercy of any single one of them going tits up?
Why, this is such a surprise. So unexpected and so disappointing...
Re:You don't say... (Score:5, Insightful)
> we have had no downtime
Yet.
The thing with the cloud is, when everything runs fine, it's great. But when things go sour, for technical reasons or because the cloud provider decides they have you trapped and it's time to put the squeeze on you, that's when you realize that infrastructure you built on theirs is locked-in and completely dependent on network availability and you're hosed, because you put all your eggs in that one basket and you have no plan B.
Roll-your-own is a lot costlier upfront, and you know you'll have problems every once in a while if you don't plan well (and you will even if you do). But at least you can do something about it, you're not at the mercy of your internet provider simply to keep your company operating, and you're not some cloud provider's bitch.
Re: (Score:1)
> for technical reasons or because the cloud provider decides they have you trapped and it's time to put the squeeze on you, that's when you realize that infrastructure you built on theirs is locked-in and completely dependent on network availability and you're hosed, because you put all your eggs in that one basket and you have no plan B.
I've been using a few cloud infrastructure companies for many years. AWS since 2009. I can assure you this is a fallacy. There is plenty of competition in the space, and in all this time, I've never felt a "squeeze". I have only found that all services get cheaper and easier every year.
If you put all your eggs in one basket, that is purely from your own bad planning, and it is no different if you roll your own; it just costs more. If you roll your own, you are still the ISP's bitch, or the Hardware
Re: You don't say... (Score:2)
There is competition for some things, but not exactly for things like DynamoDB. AWS offers many custom, nonstandard services that either are incompatible with or don't exist on other clouds. Also, the price of switching and moving your data out is where they get you.
Re: (Score:1)
There are certainly many competitors for DynamoDB that can work in AWS and also on other providers. It is the customer's choice to use those products or to tie themselves to AWS. Most AWS products are based on open source products, and moving elsewhere would be no problem. For instance, they have been pushing AWS Aurora hard in the past couple of years. This product is a repackaged Postgres or MySQL. Any program that uses Postgres will natively work on Aurora. What
Greed always ends in incompetence (Score:1)
this is exactly what happens when we let the upper class corrupt everything, shit stops working and civilization collapses again, and always for the exact same reason, unmitigated greed and an out of control, unsustainable and incompetent upper class
history repeating itself again, will we never learn?
God bless the cloud (Score:5, Insightful)
As a sysadmin, whenever something on-prem broke, I'd have the PHBs in my office breathing down my neck telling me that the downtime was unacceptable. Now that we've put most things into the cloud, despite downtime and outages going way up, I can just shrug my shoulders and blame it on the hosting company. Even if it was something I broke.
Win-win.
Re: (Score:2)
You're on Cloud 9
Pffft (Score:1)
I toldja you should have used Win~ &^ #n` [NO CARRIER]
Pentagon Contract (Score:2)
Re: (Score:3)
They got passed over for JEDI mostly because Donald Trump hates Jeff Bezos for owning the Washington Post (and I say that as someone who voted for Trump last year).
That's a big part of why the competition had to be re-opened. Amazon operates the US government's Top Secret cloud that was the inspiration for JEDI, and just expanded that to a second region: https://aws.amazon.com/blogs/p... [amazon.com]
Re:Pentagon Contract (Score:4, Interesting)
Sigh... I'm not sure whether I should be amused or disgusted when people who don't know anything about military procurement or military operations opine about them.
The procurement system is based on evaluation against a set of requirements. If they write the wrong requirements, they get the wrong results. There's very little room in the evaluation for the exercise of independent judgement.
In operations, though, there's a lot of room for constructing actual systems against The Real World (tm). Sometimes the requirements really are helpful, and you get systems that you can integrate as expected. Other times, you have to do a lot of work to fit the round peg into a square hole, -usually- because the shape of the hole changed from the time the requirements were written to the time the resulting system was delivered.
There's always a big conflict/trade-off between 'requirements' and 'simplicity'. Do you want a complex system that might well have holes in it, where mistakes result in downtime (or worse, wrong answers)? See https://en.wikipedia.org/wiki/... [wikipedia.org] and in particular https://en.wikipedia.org/wiki/Byzantine_fault. Personally, I've always had a bias towards simpler systems with much higher dependability, but often the people writing the requirements want more complexity in the system so the usage of that system is simpler. These are NOT SIMPLE TRADES.
But there's one thing I've learned from trying to reason about distributed systems over 40 years: communications is the weakest part of the distributed system in military operations. We do not have the dependency of fiber-optic communications in most (but not all) combat systems. When the radios don't work, you can't use anything that is not on your vehicle, aircraft, local installation, etc. Years ago, we knew how to reason about failures in distributed systems (as defined by Leslie Lamport: "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." https://www.cs.ubc.ca/~bestcha... [cs.ubc.ca]). These days, I'm not sure how much of that is actually taught.
Can't wait! (Score:1)
Can't wait until AWS has a much bigger outage. I have the popcorn standing by!
Any US-EAST AZ is asking for trouble. (Score:2)
There's too much hanging on any of the US-EAST zones. The only big reasons for using them are zero-cost ingress of network data and that they're usually the first zones to get new features.
Re: (Score:1)
us-east-2 isn't so bad. us-east-1 had better latency from most of South America than sa-east-1 last I checked, so there's also that.
But yeah, it's good advice to not put too many eggs in the us-east-1 basket.
Any day now... (Score:1)
... people are going to start to realize I was right when I said that "cloud computing" the way these jackasses are selling it doesn't buy you any magical baked-in redundancy. (Should have hired me instead, I guess, huh fuckers?)
system design (Score:1)
A lot of comments are critical of overuse of the clouds or AWS specifically. But for me it boils down to a couple of more specific issues that can be improved upon.
The first thing is within the customers' grasp: let's think critically about our system design in terms of third parties. The more consolidation there is in the SaaS industry, the more we'll see businesses' interdependence. In other words, if there is a really compelling offering for managed services that run outside your immediate control, ther