


Amazon Outage Cost S&P 500 Companies $150M (axios.com) 113
From a report on Axios: Cyence, an economic modeling platform, shared data with Axios showing the ramifications: losses of $150 million for S&P 500 companies, and losses of $160 million for U.S. financial services companies using the infrastructure.
Maybe you should own your hardware (Score:1, Insightful)
Re: (Score:3, Insightful)
Re:Maybe you should own your hardware (Score:4, Insightful)
Re: (Score:2)
If your cloud hosting partner fucks up, it's breach of contract and not your fault.
That turns out not to be the case. Anything you do as a senior executive is your responsibility, and if it turns out badly for the corporation it's your head on the block.
That's why there used to be a saying, "No one ever got fired for buying IBM". The clear implication was that you could well be fired for buying from some other vendor. IBM was unique, both because it swung enough weight to rescue anyone who got into trouble for choosing its products and services, and because it was always best chums with the CEO and his inner circle.
Re: (Score:2)
That's why there used to be a saying, "No one ever got fired for buying IBM". The clear implication was that you could well be fired for buying from some other vendor. IBM was unique, both because it swung enough weight to rescue anyone who got into trouble for choosing its products and services, and because it was always best chums with the CEO and his inner circle.
I think the "No one every got fired for buying IBM" saying was more about going with the herd. It wasn't that IBM was foolproof or could rescue you in ways that other people couldn't, it was that IBM was widely accepted as a very solid choice, and if you were wrong to go with IBM then so were millions of others.
Re: (Score:3)
If you ever get to management and have to answer for the errors of your subordinates, your opinion will change.
And outsourcing your critical IT resources and eliminating subordinate positions will make it that much more obvious where the blame should go.
Re: (Score:1)
Re: (Score:2)
That's how business works today. Cheaper, faster, and external.
If you don't like it you are welcome to play the startup game.
If, of course, you can raise the capital.
Enough capital, in fact, to overcome the fact that established players will probably be paying much less for virtually everything than you will.
Re: If you don't like it play startup (Score:1)
Re: (Score:1)
Re:Maybe you should own your hardware (Score:5, Insightful)
Yeah, because self-hosted hardware never goes down. Totally rock solid. I don't know why everyone doesn't host their own stuff so that nothing can go wrong.
Re: (Score:2)
I understand what you mean.
However... The more you have your stuff together, the easier it is to reach absurdly high levels of availability at affordable cost: automatic host failover, automatic site failover, etc.
Then again, not many have.
Re: (Score:2)
How nice. And when the employee that put all that together leaves the company for greener pastures or to pursue his dreams, and when you have to replace him on short notice, that setup falls apart. Likewise, when you suddenly need to quadruple your capacity because of some business decision, you lack the staff and resources to do so quickly.
Re: (Score:2)
How nice. And when the employee that put all that together leaves the company for greener pastures or to pursue his dreams, and when you have to replace him on short notice, that setup falls apart. Likewise, when you suddenly need to quadruple your capacity because of some business decision, you lack the staff and resources to do so quickly.
The nice thing about Amazon is that it is predictable, low (not zero) risk, and scalable.
I said "The more you have your stuff together". That means you have removed the bus effect as a factor. Not having your stuff together means the market forces will efficiently deal with you sooner or later.
Re: (Score:2)
If you run your own IT shop, you necessarily have a much smaller pool of IT staff than Amazon. That means that your risk of losing an employee who is key for keeping your systems running is necessarily much larger than for Amazon. If you don't understand that, you certainly "don't have your stuff together".
Re: (Score:2)
If you run your own IT shop, you necessarily have a much smaller pool of IT staff than Amazon. That means that your risk of losing an employee who is key for keeping your systems running is necessarily much larger than for Amazon. If you don't understand that, you certainly "don't have your stuff together".
Either you don't get it or you don't want to get it.
Well-set-up, well-thought-out systems require little staff, but you must be prepared to go the whole nine yards during the development phase. The resulting systems behave reasonably and predictably with respect to resources (CPU, memory, storage, networking bandwidth, etc.). The curve of required staff vs. workload should be asymptotic, not linear, and certainly not exponential. The task of the system administrator must be extremely boring (backups, restorin
Re: (Score:2)
No, you don't get it. You say that if a company has great people, it can achieve Amazon-like stability. It can. But the problem is that great people are hard to find, so when they retire or leave, you are left with a problem on your hands.
Solid businesses don't rely on technical or managerial superstars; they have business processes that function reliably with mediocre technical staff and managers.
Re: (Score:1)
Right up until someone turns it all off like what happened to Amazon...
Re: (Score:2)
However... The more you have your stuff together
Nope. The more you are an expert in managing your stuff, the easier it is to reach absurdly high levels of availability. The vast majority of Amazon cloud users would, on their own, be very unlikely to reach the uptime and availability, much less the scalable bandwidth, available through that kind of hosting.
Re: (Score:2)
It's also perfectly possible for a large enough organization to run separate datacenters at facilities in different geograp
Re: (Score:2)
Mind you, there's only one local admin left that knows how to read the chicken bones and tea leaves to run the thing...
Which is because no one in a position of power chose to invest in training more people to replace him. Please don't tell me it's impossible; with enough money and commitment, most things are possible.
What isn't possible is to get a result when you are too stingy to provide the necessary means.
Re: (Score:2)
At my last job, our AS/400 had to have all of the applications shut down to do a nightly backup. The backup took nearly every second that the business was closed. Scheduled maintenance had to be done on holidays.
One time we had to move its network connection to another switch port. The thing didn't work again until we hard rebooted it.
The software on it could only be accessed from the network by running CL scripts - so there was no such thing as transactional integrity. The programmers used a five digit bat
Re: (Score:2)
Our locally-hosted AS/400 has not had an unscheduled outage in something like fifteen years, and that includes at least one full hardware migration.
It's hard to find admins who know how to achieve high availability in their own datacenter. AWS wins because of the lack of that expertise in the world.
bad original greybeard (Score:2)
I guess you never heard that those "faster than light" neutrinos were not a thing.
Not that any sane greybeard of yore would couple the network stack directly into the wal
Re: (Score:1)
The money you save by going to the cloud allows you to spend a couple of bucks a month on a backup internet connection.
Re: (Score:2)
If you outsource, you can blame the service provider. If you do it internally, you take the blame yourself. No wonder the cloud is so popular.
Re: (Score:3)
We felt the effect of the Amazon issue through a service that we've contracted for. That service provider gets no special consideration in our judgement of them just because the entity they subbed out to went down.
Re: (Score:2)
Re: (Score:2)
...[M]anagement seems to be more about avoiding blame than providing solutions.
Which is a far bigger problem than some AWS systems going down for a few hours. If management really is more about avoiding blame than making the organization successful, success will prove very elusive indeed.
And of course if management is working in the wrong way... that too is a management problem.
It's not as easy as people think.
Re: (Score:2)
At my previous job at a Fortune 100 company...
Me: Hey boss, we spend half our time cleaning up the mess that is caused by this one bug. I suggest we put a little time into fixing the bug.
Boss: Fixing the bug is build work, and that requires a business unit to provide a request and the capital to do the work. Cleaning up the mess is maintenance work, and the whole company pays for that. So, until some other department pays us to fix this problem, we must continue to put our time into maintenance work.
Me: But
Re: (Score:2)
Nope. The customers were down too and never noticed. Grandma was wrong - put all your eggs in the biggest damn basket you can find. You may lose all your eggs when the basket goes nuclear, but Joe Public will have bigger things than eggs on his mind!
(Nuclear baskets are really scary - take it from me!)
Re: (Score:2)
Re: (Score:2)
None of that ever happened, did it?
Yes, it did. Because I had Fujitsu and later Sony on my resume, I kept getting phone calls from recruiters for Japanese-speaking positions for years. Working at a Japanese company doesn't mean I can speak Japanese. I told that to a hiring manager who called from Tokyo.
Re: (Score:2)
True, but anything from the extra staff to your data center flooding would. And if you total all of that up across S&P 500 companies, you likely end up with bigger total losses.
In other words, your advice is penny-wise and pound-foolish.
Re:Maybe you should own your hardware (Score:4, Insightful)
Yeah. Do you also
* Run your own communications system with 2-way radios, or do you trust telcos for that?
* Run your own wires to every customer, or do you trust ISPs for that?
* Run your own fleet of trucks to deliver product, or do you trust shipping cos for that?
* Have all your customers pay you directly in cash that you keep in your own vault, or do you trust credit card companies and banks for that?
* Perform all your own accounting, or do you trust outside accountants for that?
The list goes on and on. Every one of those is at least as important as servers (and in some cases far more important).
Re: (Score:3)
Finally, someone on Slashdot who gets it. In many cases people don't outsource because they are cheap; they outsource because other people are better at it than they ever were.
Now if only non-important small companies like airlines would upgrade to Amazon's cloud, then we can stop running weekly stories about companies grinding to a halt* because their in-house services are falling over.
*Okay it's not that simple but I hope I'm getting the point across. Most people using Amazon's servers would not have the u
Re: Maybe you should own your hardware (Score:1)
Re: (Score:1)
Exactly! I haven't made a mistake in 20 years, maybe 30, and have an uptime so long that I use a pitch drop experiment to measure it. Both ways.
Skip the summary next time... (Score:2)
Re: (Score:2)
The summary is actually the entire article. I'm absolutely blown away. I guess I shouldn't be, but holy shit. How did an article with no content get linked to?
Re: (Score:2)
This appears to be nothing but opportunistic marketing BS from Cyence.
Re: (Score:2)
Yeah, nothing about how they got to that number? Did they consider that while there was certainly business which didn't happen during the outage, it may have simply been time-shifted to a few hours later?
This appears to be nothing but opportunistic marketing BS from Cyence.
Indeed, this was my immediate thought upon seeing the headline. A temporary loss of $150 million that got rectified an hour later when the systems came back online isn't a big deal.
Your subject says it all - No need for comment (Score:2)
Posting a comment that says no more than the subject would be silly. No need for a one-line summary.
Re: What I wonder is.... (Score:1)
They use MS Azure.
Re: (Score:2)
Re: (Score:2)
Probably because it was replicated to all regions, unlike some of the data that was only in the affected region because customers didn't want to pay more $$$.
Re: (Score:1)
Well, that's also why they could not even mark their own services as down: the caching layer still had the latest version available, but it could no longer update.
Re: (Score:2)
Why wasn't Amazon's website down when all of the others were? Isn't their cloud good enough to host their own website? Or do they keep their website on someone else's cloud, because that's the cool thing to do these days?
Because Amazon's site was properly set up in multiple regions, like anyone who has mission-critical applications online should do. This is just a recent example of why you need to host a site which requires high availability in multiple data centers in multiple regions, because no data center will ever be able to guarantee 100% uptime over a long period of time. Cut corners at your own risk.
Negligence? (Score:1)
If Amazon can be considered negligent by failing to put a competent person in charge of whatever operation it was that caused the outage, companies should be able to recover lost revenue and profit from Amazon.
Contractual indemnity does not shield against negligence.
Re: (Score:1)
As S3 was down for more than 44 minutes but less than 7h18m (about 5 hours total), a monthly rebate of 10% is supposed to be applied to everyone's S3-related fees for February. The engineer who pressed the DELETE button has caused quite a bill...
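For what it's worth, here's a quick back-of-the-envelope check of those thresholds (a sketch only, assuming an average month of 365.25/12 days and the credit tiers the parent describes: 10% below 99.9% monthly uptime, 25% below 99%):

# Rough sanity check of the S3 SLA credit tiers (assumptions as stated above).
month_hours = 365.25 / 12 * 24                   # about 730.5 hours in an average month
tier_10pct_minutes = month_hours * 0.001 * 60    # downtime that drops uptime below 99.9%
tier_25pct_hours = month_hours * 0.01            # downtime that drops uptime below 99%
print(f"10% credit after ~{tier_10pct_minutes:.0f} minutes of downtime")   # ~44 minutes
print(f"25% credit after ~{tier_25pct_hours:.1f} hours of downtime")       # ~7.3 hours (7h18m)

outage_hours = 5
uptime = 1 - outage_hours / month_hours
print(f"A {outage_hours}h outage leaves {uptime:.2%} uptime -> 10% credit tier")  # ~99.32%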
Really? (Score:2)
I think that money was just never made. It didn't cost them anything, other than not meeting earnings expectations.
It only cost them money if they spent something.
Re: (Score:2)
That's not true! I don't even use Amazon's systems but I suffered a loss of $150K!
Amazon, please send 117 Bitcoins to 1LHuLKyHDndUdjgKUsmfAG8tDnXZ5fTuUA to compensate for my imagined losses. Thank you.
Re:Really? (Score:5, Insightful)
I think it's even more overstated than that.
Without having any indicator other than that link to an article a couple of lines long, we have no info.
Is the $150 million figure the normal throughput of transactions during regular operation over the same time frame in which the outage occurred? Because if so, I highly doubt they lost that much. I tried to place an order somewhere during that outage. There was an error. So I tried again later and placed my order. The company lost nothing in regards to my order. I'm sure mine was not the only transaction that was retried later on.
Bold statements about what an outage costs are not helpful unless the methodology for calculating that cost is both divulged and reasonably calculated.
Re: (Score:2)
I tend to agree with you. Particularly when it comes to folks like the RIAA and MPAA talking about "losses" due to copyright infringement. That's clearly a case of theoretical profits that they didn't take. Would be great if I could write off my theoretical profits as losses on my taxes!
But in this case they may well have spent money. Expenses and costs tend to be there regardless of whether you're making money. So it's likely that these companies had pretty high money outflow which was not making a re
Re: (Score:2)
He has ordered all of this "cloud" nonsense to be banned, as not Great Enough for America.
I thought Trump blamed Obama for the outage?
Re: (Score:2)
Hillary wiped the server.
Re: Equating to money (Score:1)
Meh (Score:5, Insightful)
We hear this sort of statistic a lot but I have to ask, did they REALLY?
Anyone with experience with this sort of thing understands how fluffy these numbers are, based on statistics, some WAG, etc.
For example:
We processed $1 million orders per hour.
We were down for 3 hours.
Ergo we "lost" $3 million.
In fact, no such thing is true. At least, not like someone poured $3 million in cash into a furnace and actually LOST the money.
First, there are the missed-opportunity sales. What you're talking about in fact is purchases that didn't take place because the seller wasn't available. This has everything to do with flexibility of supply and time-sensitivity of delivery. If in fact John Smith wanted to order shoes from Amazon, and Amazon was down, so he went to company XYZ and bought those shoes or decided not to buy at all, then in fact it is reasonably a "lost sale" for Amazon. HOWEVER, if John couldn't reach XYZ (not unlikely with the broad infrastructure hit that the outage caused), or they didn't have his brand, or he just said "ok, I'll just buy them tomorrow", it WASN'T a lost sale at all. And it's HIGHLY unlikely that the consultants throwing together these figures rationalized any later excess demand back into the 'missing' hours.
Secondly, even if there are actual lost sales, that is NOT the same as lost money. Lost sales are lost margin. If Amazon is selling a shoe for $100, they have to BUY it somewhere, say for $70. So if John didn't buy that shoe, Amazon didn't have to buy that shoe either. Therefore Amazon wasn't out $100, they were out only their margin, or $30. In the interest of fluffing numbers and getting the result quickly (and because the actual result would take hard work as well as involving some proprietary info like margins that you might never get), I've almost never seen "loss" statistics like this reported as anything but gross numbers. Depending on the margins of sale involved, this can easily be 10x what the actual lost margin was. (Plus, the point of course is to show how impactful something is in the first place....)
Combining the two? I'd guess that the actual financial impact is barely 1% of the number stated.
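A toy calculation along those lines, with completely made-up numbers (the $1 million/hour, 3-hour outage from the example above, plus an assumed 95% of buyers who simply retry later and a 20% gross margin):

# Illustrative only: how a headline "loss" shrinks once time-shifted demand
# and margin are accounted for. All inputs here are invented for the example.
hourly_sales = 1_000_000        # gross sales normally processed per hour
outage_hours = 3
recaptured = 0.95               # fraction of buyers who just retry later
gross_margin = 0.20             # seller keeps $20 of every $100 sale

headline_loss = hourly_sales * outage_hours                   # $3,000,000 "lost"
sales_never_recovered = headline_loss * (1 - recaptured)      # $150,000
margin_actually_lost = sales_never_recovered * gross_margin   # $30,000, ~1% of the headline

print(f"Headline figure:        ${headline_loss:,.0f}")
print(f"Sales never recovered:  ${sales_never_recovered:,.0f}")
print(f"Margin actually lost:   ${margin_actually_lost:,.0f}")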
Re: (Score:1)
I'm pretty sure we could find somewhere in AWS data how much they are making a month with S3. They lost 10% of it just due to the SLA. That's not counting all the engineers who had to fix things, improve systems, move stuff around, prevent further failures, etc., at quite a lot of companies.
Re: (Score:2)
Ergo we "lost" $3 million
The problem isn't defining what was lost; the problem is that the dollar figure itself isn't defined.
If this were the case, they most definitely "lost" $3 million in revenue / turnover. The actual profit figure will differ; in some cases profit may even have gone up, if a company typically makes a loss selling some of its products. But since accounting is often done in revenue, volumes of sales are often directly proportional to revenue, and it is the revenue stream which is interrupted, it most definitely m
Terrible development practices cost $150m. (Score:2)
S&P Companies Lost but Small Businesses Gained (Score:1)
Re: (Score:1)
That's very questionable (Score:2)
Serious companies that host anything have Service Level Agreements that can cover response times, escalations, downtime, systems affected, resolution times, etc.
Even if this is not strictly covered in a contractual, legally binding SLA, Amazon would do well to pony up something for the big boys.
Now, if you jump through all the SLAs, backups, insurance and DR plans, then you may find the impact was minuscule.
Of course, if you host with AWS and were affected, you cry wolf and claim damages are in thousan
Next time they'll use their own data centers (Score:2)
you'd be amazed (Score:2)
You'd be amazed at how much money a rainstorm costs the country. Or a heatwave. Or a cold virus.
Gee, what a shame (Score:1)
Re: (Score:2)
Here, let me pull out the world's most violin for you, and use my thumb and index player to play it.
The usual response is "You accidentally the verb," but in this case "You accidentally forgot a very word" .
Re: (Score:1)
Perfect vs the alternative. (Score:2)
Do not compare it to perfection; compare it to the alternative. Without such cloud-based computational capacity, each company would size their IT infrastructure for peak load. Since the peak loads of all companies do not happen at the same time, when one company is running at full load, lots of other companies are running at a fraction of their peak capacity. The clou
Cloud != Magic (Score:4, Insightful)
I'm working on a huge migration of an on-site system to Azure right now, and it's hard to convince people paying the bills of what's actually needed to guarantee high availability. The S3 outage is a perfect example of this... we have the same problem with Azure Storage Accounts being treated as a magic box by the developers. For example, Azure storage has locally redundant and geo-redundant levels. People hear "redundant" and assume that there will never be any issues accessing things you store in a storage account. If there were a disaster of some kind, it only protects the _data_ against the failure of a rack (locally redundant) or a datacenter (geo-redundant). If a problem like what happened with S3 occurred, and access to the actual storage through the software-defined magic is disrupted, you're still going to have a bad day. You just (probably) won't lose the data. Obviously the cloud providers do everything they can to make sure things stay running, but not adding in some sort of failover above the cloud service level is just asking for trouble if you're doing anything critical.
I'm a "classic IT" guy who totally has an open mind about the cloud, but I do think there's lots of hype and misinformation. Designing for high availability is at least as hard as it was. Doing this in the cloud is quite expensive...maybe not as expensive as rolling your own infrastructure, but a wake-up call when the CIO gets the bill. I just wish the hype bubble would die down so people could have rational conversations about public cloud. It's just like on-premises stuff - don't pay for HA and risk downtime, or pay up and get the SLAs you pay for. I just hate that people are going around saying the cloud is bulletproof and immune to failures....it's technology at the end of the day and people make mistakes (especially overworked AWS engineers working 100 hour weeks or Microsoft guys who forgot to renew certificates, etc.)
Re: (Score:2)
Agreed. The biggest question I would pose here is: why did all these companies not take the possibility of cloud-based services being unavailable into account, so that business could continue? This is a basic HA design paradigm for not being affected by service interruptions. Any time an external system call is performed, the assumption should be that it may not be available, and an alternative mechanism should be provided to handle outages without affecting business operations. Unless, of course, the business has no issues with
Re: (Score:2)
The biggest question I would pose here is: why did all these companies not take the possibility of cloud-based services being unavailable into account, so that business could continue?
Especially considering this is not the first time AWS has gone down. It does this every year or so.
Re: (Score:2)
Designing for high availability is at least as hard as it was. Doing this in the cloud is quite expensive...maybe not as expensive as rolling your own infrastructure, but a wake-up call when the CIO gets the bill.
Minus the part about the CIO being surprised at the bill (only a poor CIO wouldn't forecast the costs of running a product in any environment, including a public cloud), you hit the nail on the head as to why public cloud is so popular. It's not magic, but it IS cheaper for small/medium-sized companies to take advantage of highly available services that they wouldn't otherwise be able to afford in their own DCs.
That said, you absolutely have to build your application on public cloud with failure in mind.
Only 150 Million? (Score:4, Interesting)
That's like spitting in the ocean for a day of profit in the S&P 500.
While this news may not be fake, it sure illustrates how absurd this kind of reporting sometimes is. $150 million may be a lot of money to you or me, but to the S&P 500 it's about the same as what you'd find cleaning out your couch cushions on the day you got paid and the income tax refund hit. This isn't even a ripple in the profit pool. Yet here we are, regaled by "woe is us in the S&P 500" reports.
Re: (Score:2)
That's like spitting in the ocean for a day of profit in the S&P 500.
While this news may not be fake, it sure illustrates how absurd this kind of reporting sometimes is. $150 million may be a lot of money to you or me, but to the S&P 500 it's about the same as what you'd find cleaning out your couch cushions on the day you got paid and the income tax refund hit. This isn't even a ripple in the profit pool. Yet here we are, regaled by "woe is us in the S&P 500" reports.
Considering the Fortune 500 companies earned over $1.35 billion per hour in 2012, a loss of $150 million over 5 hours is about 2% of their sales during the outage, 0.5% of their sales that day, and less than 0.02% of their sales that month.
This is similar to if I lost $2 this month because of a single AWS outage.
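Checking that arithmetic (a sketch, using the parent's $1.35B/hour figure and a 5-hour outage):

# Quick check of the percentages quoted above (inputs from the parent comment).
loss = 150e6                  # reported loss in dollars
hourly_sales = 1.35e9         # Fortune 500 sales per hour (the parent's 2012 figure)
outage_hours = 5

print(f"vs. the outage window: {loss / (hourly_sales * outage_hours):.1%}")   # ~2.2%
print(f"vs. that day:          {loss / (hourly_sales * 24):.2%}")             # ~0.46%
print(f"vs. a 30-day month:    {loss / (hourly_sales * 24 * 30):.3%}")        # ~0.015%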
S&P 500 Companies probably still came out on t (Score:2)
Its nothing. (Score:1)
$150M sounds like a big headline... compared to the S&P 500 as a whole, it's nothing. Apple alone did $216B in revenue in its last fiscal year, let alone the other 499 companies.
Region Failover, Guys (Score:2)
I'm mystified as to why these companies running mission-critical apps with $$$ on the line aren't using multi-region redundancy or at least failover. Imagine if some terrorist dug up the fiber lines leading to the Ashburn primary datacenter, causing US-EAST-1 to be offline for days.
This is why you spread your resources around and have redundancies across different geographical regions. That way, the worst that could happen is users might experience a momentary lag, or maybe a couple TCP connections might ge
such utter BS (Score:2)
Re: (Score:2)
Setting aside impulse buys, you're also ignoring volume limitations. Saying someone will come back and buy it later just means that the person after them will now be delayed slightly more. Time is not always recoverable, especially in a competitive marketplace.