
Amazon Outage Cost S&P 500 Companies $150M (axios.com)

From a report on Axios: Cyence, an economic modeling platform, shared data with Axios showing the ramifications of the Amazon S3 outage: losses of $150 million for S&P 500 companies, and losses of $160 million for U.S. financial services companies using the infrastructure.


Comments:
  • If you took responsibility for your own hardware resources, this wouldn't have been an issue for you.
    • Re: (Score:3, Insightful)

      by fabriciom ( 916565 )
      If you ever get to management and have to answer for the errors of your subordinates, your opinion will change.
      • If you ever get to management and have to answer for the errors of your subordinates, your opinion will change.

        And by outsourcing your critical IT resources and eliminating subordinate positions, that will make it that much more obvious where the blame should go.

        • That's how business works today. Cheaper, faster, and external. If you don't like it, you are welcome to play the startup game.
          • That's how business works today. Cheaper, faster, and external.

            If you don't like it, you are welcome to play the startup game.

            If, of course, you can raise the capital.

            Enough capital, in fact, to overcome the fact that established players will probably be paying much less for virtually everything than you will.

    • by James Carnley ( 789899 ) on Friday March 03, 2017 @11:41AM (#53969645) Homepage

      Yeah because self hosted hardware never goes down. Totally rock solid. I don't know why everyone doesn't host their own stuff so that nothing can go wrong.

      • I understand what you mean.

        However... The more you have your stuff together, the easier it is to reach absurdly high levels of availability at affordable cost. Automatic host failover, automatic site failover, etc...

        Then again, not many have.

        • However... The more you have your stuff together, the easier it is to reach absurdly high levels of availability at affordable cost. Automatic host failover, automatic site failover, etc...

          How nice. And when the employee who put all that together leaves the company for greener pastures or to pursue his dreams, and you have to replace him on short notice, that setup falls apart. Likewise, when you suddenly need to quadruple your capacity because of some business decision, you lack the staff and resources to do so quickly.

          • However... The more you have your stuff together, the easier it is to reach absurdly high levels of availability at affordable cost. Automatic host failover, automatic site failover, etc...

            How nice. And when the employee who put all that together leaves the company for greener pastures or to pursue his dreams, and you have to replace him on short notice, that setup falls apart. Likewise, when you suddenly need to quadruple your capacity because of some business decision, you lack the staff and resources to do so quickly.

            The nice thing about Amazon is that it is predictable, low (not zero) risk, and scalable.

            I said "The more you have your stuff together". That means you have removed the bus effect as a factor. Not having your stuff together means the market forces will efficiently deal with you sooner or later.

            • I said "The more you have your stuff together".

              If you run your own IT shop, you necessarily have a much smaller pool of IT staff than Amazon. That means that your risk of losing an employee who is key for keeping your systems running is necessarily much larger than for Amazon. If you don't understand that, you certainly "don't have your stuff together".

              • I said "The more you have your stuff together".

                If you run your own IT shop, you necessarily have a much smaller pool of IT staff than Amazon. That means that your risk of losing an employee who is key for keeping your systems running is necessarily much larger than for Amazon. If you don't understand that, you certainly "don't have your stuff together".

                Either you don't get it or you don't want to get it.

                Well-thought-out, well-set-up systems require little staff. But you must be prepared to go the whole nine yards during the development phase. The resulting systems behave reasonably and predictably with respect to resources (CPU, memory, storage, networking bandwidth, etc...). The required staff as a function of workload should level off (asymptotic), not grow linearly, and certainly not exponentially. The task of the system administrator must be extremely boring (backups, restorin

                • Either you don't get it or you don't want to get it.

                  No, you don't get it. You say that if a company has great people, it can achieve Amazon-like stability. It can. But the problem is that great people are hard to find, so when they retire or leave, you are left with a problem on your hands.

                  Solid businesses don't rely on technical or managerial superstars; they have business processes that function reliably with mediocre technical staff and managers.

        • by Anonymous Coward

          Right up until someone turns it all off like what happened to Amazon...

        • However... The more you have your stuff together

          Nope. The more you are an expert in managing your stuff, the easier it is to reach absurdly high levels of availability. The vast majority of Amazon Cloud users would, on their own, be very unlikely to reach the uptime and availability, much less the scalable bandwidth, available through that kind of hosting.

      • by TWX ( 665546 )
        Our locally-hosted AS/400 has not had an unscheduled outage in something like fifteen years, and that includes at least one full hardware migration. Mind you, there's only one local admin left that knows how to read the chicken bones and tea leaves to run the thing, but it's not exactly impossible to have excellent availability when the right platforms are chosen and maintained.

        It's also perfectly possible for a large enough organization to run separate datacenters at facilities in different geograp
        • Mind you, there's only one local admin left that knows how to read the chicken bones and tea leaves to run the thing...

          Which is because no one in a position of power chose to invest in training more people to replace him. Please don't tell me it's impossible; with enough money and commitment, most things are possible.

          What isn't possible is to get a result when you are too stingy to provide the necessary means.

        • by Jaime2 ( 824950 )

          At my last job, our AS/400 had to have all of the applications shut down to do a nightly backup. The backup took nearly every second that the business was closed. Scheduled maintenance had to be done on holidays.

          One time we had to move its network connection to another switch port. The thing didn't work again until we hard rebooted it.

          The software on it could only be accessed from the network by running CL scripts - so there was no such thing as transactional integrity. The programmers used a five digit bat

        • Our locally-hosted AS/400 has not had an unscheduled outage in something like fifteen years, and that includes at least one full hardware migration

          It's hard to find Admins who know how to have high availability in their own datacenter. AWS wins because of the lack of expertise in the world.

        • Hell, it's even possible to tunnel L2 so that the equipment at the different facilities doesn't even know that it's not all at one big happy site, should that sort of thing be necessary.

          I guess you never heard that those "faster than light" neutrinos were not a thing.

          Oh, hell, turns out it's actually impossible to tunnel L2 so that the equipment at the different facilities doesn't become partitioned by a Giant Lag Troll.

          Not that any sane greybeard of yore would couple the network stack directly into the wal

    • by Jaime2 ( 824950 )

      If you outsource, you can blame the service provider. If you do it internally, you take the blame yourself. No wonder the cloud is so popular.

      • by TWX ( 665546 )
        Because blame-storming works when your entire company's service is entirely offline and now your customers leave you.

        We felt the effect of the Amazon issue through a service that we've contracted-for. That service provider gets no special consideration in our judgement of them just because the entity they subbed-out to went down.
        • by Jaime2 ( 824950 )
          It doesn't work out for the company, but IT managers do use this excuse regularly. I'm not suggesting that it's a good thing, just that management seems to be more about avoiding blame than providing solutions.
          • ...[M]anagement seems to be more about avoiding blame than providing solutions.

            Which is a far bigger problem than some AWS systems going down for a few hours. If management really is more about avoiding blame than making the organization successful, success will prove very elusive indeed.

            And of course if management is working in the wrong way... that too is a management problem.

            It's not as easy as people think.

            • by Jaime2 ( 824950 )

              At my previous job at a Fortune 100 company...

              Me: Hey boss, we spend half our time cleaning up the mess that is caused by this one bug. I suggest we put a little time into fixing the bug.
              Boss: Fixing the bug is build work, and that requires a business unit to provide a request and the capital to do the work. Cleaning up the mess is maintenance work, and the whole company pays for that. So, until some other department pays us to fix this problem, we must continue to put our time into maintenance work.
              Me: But

        • and now your customers leave you.

          Nope. The customers were down too and never noticed. Grandma was wrong - put all your eggs in the biggest damn basket you can find. You may lose all your eggs when the basket goes nuclear, but Joe Public will have bigger things than eggs on his mind!

          (Nuclear baskets are really scary - take it from me!)

    • Owning the backend isn't a cure-all. When I was an intern at Fujitsu in the late 1990s, I discovered a crash bug on the test server and could reproduce it 100% of the time. My supervisor couldn't reproduce the bug even though we took turns at the keyboard. He approved the patch for production. The servers crashed 24 hours later. The engineers determined that a deep fix was required, forcing the server offline for three days and costing $250K in lost revenues. I wasn't offered a job when my internship expired. One-th
    • If you took responsibility for your own hardware resources, this wouldn't have been an issue for you.

      True, but anything from the extra staff to your data center flooding would. And if you total all of that up across S&P 500 companies, you likely end up with bigger total losses.

      In different words, your advice is penny wise and pound foolish.

    • by bws111 ( 1216812 ) on Friday March 03, 2017 @12:31PM (#53969999)

      Yeah. Do you also

      * Run your own communications system with 2-way radios, or do you trust telcos for that?
      * Run your own wires to every customer, or do you trust ISPs for that?
      * Run your own fleet of trucks to deliver product, or do you trust shipping cos for that?
      * Have all your customers pay you directly in cash that you keep in your own vault, or do you trust credit card companies and banks for that?
      * Perform all your own accounting, or do you trust outside accountants for that?

      The list goes on and on. Every one of those is at least as important as servers (and in some cases they are far more important).

      • Finally, someone on Slashdot who gets it. In many cases people don't outsource because they are cheap; they outsource because other people are better at it than they ever were.

        Now if only non-important small companies like airlines would upgrade to Amazon's cloud, then we could stop running weekly stories about companies grinding to a halt* because their in-house services are falling over.

        *Okay it's not that simple but I hope I'm getting the point across. Most people using Amazon's servers would not have the u

    • $150M, that's pocket change.
    • If you took responsibility for your own hardware resources, this wouldn't have been an issue for you.

      Exactly! I haven't made a mistake in 20 years, maybe 30, and have an uptime so long that I use a pitch drop experiment to measure it. Both ways.

  • I think the title says it all. No need to add a one-line summary with the link.
    • The summary is actually the entire article. I'm absolutely blown away. I guess I shouldn't be, but holy shit. How did an article with no content get linked to?

      • by msauve ( 701917 )
        Yeah, nothing about how they got to that number? Did they consider that while there was certainly business which didn't happen during the outage, it may have simply been time-shifted to a few hours later?

        This appears to be nothing but opportunistic marketing BS from Cyence.
        • Yeah, nothing about how they got to that number? Did they consider that while there was certainly business which didn't happen during the outage, it may have simply been time-shifted to a few hours later?

          This appears to be nothing but opportunistic marketing BS from Cyence.

          Indeed, this was my immediate thought upon seeing the headline. A temporary loss of $150 million that got rectified an hour later when the systems came back online isn't a big deal.

    • Posting a comment that says no more than the subject would be silly. No need for a one-line summary.

  • by Anonymous Coward

    If Amazon can be considered negligent for failing to put a competent person in charge of whatever operation it was that caused the outage, companies should be able to recover lost revenue and profit from Amazon.

    Contractual indemnity does not shield against negligence.

    • by PIBM ( 588930 )

      As S3 was down for more than 44 minutes but less than 7h18 (about 5 hours total), a monthly rebate of 10% is supposed to be applied to everyone's S3-related fees for February. That engineer who pressed the DELETE button has caused quite a bill.
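
      For the curious, the arithmetic behind that 10% figure, as a minimal Python sketch. The tier boundaries (10% credit below 99.9% monthly uptime, 25% below 99%) and the 730-hour month are assumptions inferred from the thresholds quoted above, not Amazon's exact contract language:

      # Rough sketch of an S3-style SLA service credit (assumed tiers).
      MONTH_MINUTES = 730 * 60  # SLA math commonly uses an average 730-hour month

      def service_credit_percent(downtime_minutes: float) -> int:
          """Return the assumed percentage of the monthly S3 bill credited back."""
          uptime = 100.0 * (MONTH_MINUTES - downtime_minutes) / MONTH_MINUTES
          if uptime >= 99.9:
              return 0   # within the SLA, no credit
          elif uptime >= 99.0:
              return 10  # e.g. the roughly 5-hour outage discussed here
          else:
              return 25

      print(service_credit_percent(5 * 60))  # ~5 hours of downtime -> 10
      print(service_credit_percent(44))      # just past the 99.9% threshold -> 10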

  • I think that money was just never made. It didn't cost them anything, other than not meeting earnings expectations.

    It only cost them money if they spent something.

    • That's not true! I don't even use Amazon's systems but I suffered a loss of $150K!

      Amazon, please send 117 Bitcoins to 1LHuLKyHDndUdjgKUsmfAG8tDnXZ5fTuUA to compensate for my imagined losses. Thank you.

    • Re:Really? (Score:5, Insightful)

      by ThomasBHardy ( 827616 ) on Friday March 03, 2017 @11:44AM (#53969673)

      I think it's even more overstated than that.

      Without having any indicator other than that link to an article a couple of lines long, we have no info.

      Is the $150 million value the "normal" throughput of transactions during regular operation over the same time frame in which the outage occurred? Because if so, I highly doubt they lost that much. I tried to place an order somewhere during that outage. There was an error. So I tried again later and placed my order. The company lost nothing in regards to my order. I'm sure mine is not the only transaction that was re-tried later on.

      Bold statements about what an outage costs are not helpful unless the methodology for calculating that cost is both divulged and reasonably calculated.

    • by caseih ( 160668 )

      I tend to agree with you. Particularly when it comes to folks like the RIAA and MPAA talking about "losses" due to copyright infringement. That's clearly a case of theoretical profits that they didn't take. Would be great if I could write off my theoretical profits as losses on my taxes!

      But in this case they may well have spent money. Expenses and costs tend to be there regardless of whether you're making money. So it's likely that these companies had pretty high money outflow which was not making a re

  • Meh (Score:5, Insightful)

    by argStyopa ( 232550 ) on Friday March 03, 2017 @11:43AM (#53969665) Journal

    We hear this sort of statistic a lot but I have to ask, did they REALLY?

    Anyone with experience with this sort of thing understands how fluffy these numbers are, based on statistics, some WAG, etc.

    For example:
    We processed $1 million in orders per hour.
    We were down for 3 hours.
    Ergo we "lost" $3 million.

    In fact, no such thing is true. At least, not like someone poured $3 million in cash into a furnace and actually LOST the money.

    First, there are the missed-opportunity sales. What you're talking about in fact is purchases that didn't take place because the seller wasn't available. This has everything to do with flexibility of supply and time-sensitivity of delivery. If in fact John Smith wanted to order shoes from Amazon, and Amazon was down, and he went to company XYZ and bought those shoes or decided not to buy at all, then it is reasonably a "lost sale" for Amazon. HOWEVER, if John couldn't reach XYZ (not unlikely with the broad infrastructure hit that the outage caused), or they didn't have his brand, or he just said "ok, I'll just buy them tomorrow", it WASN'T a lost sale at all. And it's HIGHLY unlikely that the consultants throwing together these figures rationalized any later excess demand back into the 'missing' hours.

    Secondly, even if there are actual lost sales, that is NOT the same as lost money. Lost sales are lost margin. If Amazon is selling a shoe for $100, they have to BUY it somewhere, say for $70. So if John didn't buy that shoe, Amazon didn't have to buy that shoe either. Therefore Amazon wasn't out $100, they were out only their margin, or $30. In the interest of fluffing numbers and getting the result quickly (and because the actual result would take hard work as well as involving some proprietary info like margins that you might never get), I've almost never seen "loss" statistics like this reported as anything but gross numbers. Depending on the margins of sale involved, this can easily be 10x what the actual lost margin was. (Plus, the point of course is to show how impactful something is in the first place....)

    Combining the two? I'd guess that the actual financial impact is barely 1% of the number stated.
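
    To make that guess concrete, a back-of-the-envelope Python sketch combining the two effects. The recapture rate and margin below are illustrative assumptions, not figures from the article:

    # Hypothetical illustration: headline "lost revenue" vs. actual financial impact.
    gross_loss = 150e6      # headline figure: revenue that didn't happen during the outage
    recapture_rate = 0.95   # assumed share of sales simply time-shifted to after the outage
    margin_rate = 0.20      # assumed gross margin on the sales that truly never happened

    actual_impact = gross_loss * (1 - recapture_rate) * margin_rate
    print(f"${actual_impact:,.0f}")                                    # $1,500,000
    print(f"{actual_impact / gross_loss:.0%} of the headline figure")  # 1% of the headline figure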

    • by PIBM ( 588930 )

      I'm pretty sure we could find somewhere in AWS data how much they are making a month with S3. They lost 10% of it just due to the SLA. That's not counting all the engineers who had to fix things, improve systems, move stuff around, prevent further failures, etc., at quite a lot of companies.

    • Ergo we "lost" $3 million

      The problem isn't defining what was lost; the problem is that the dollar figure itself isn't defined.
      If this were the case, they most definitely "lost" $3 million in revenue / turnover. The actual profit value will differ, in some cases profit may even have gone up if a company typically makes a loss selling some of their products, but since accounting is often done in revenue, volumes of sales are often directly proportional to revenue, and it is the revenue stream which is interrupted it most definitely m

  • If your systems are *that* important, you should mirror them across multiple geographic locations. I've seen the same story in multiple forms several times now. The cloud is not a magical place in the ether. There is a computer somewhere with your code on it. That computer can catch fire, lose power, be destroyed in a hurricane, etc. This is what happens when you don't account for that reality.
  • Instead of buying from Amazon, all those customers bought from the small business website selling the same items. That $150 million didn't just disappear.

  • Serious companies that host anything have Service Level Agreements that can cover response times, escalations, downtime, systems affected, resolution times, etc.

    Even if this is not strictly covered in a contractual, legally binding SLA, Amazon would do well to pony up something for the big boys.

    Now, if you jump through all the SLAs, backups, insurance and DR/backups, then you may find the impact was minuscule.

    Of course if you host with AWS and were affected you cry wolf, claim damages are in thousan
  • And not someone else's. The so-called "cloud". It vanishes like a cloud of smoke!
  • You'd be amazed at how much money a rainstorm costs the country. Or a heatwave. Or a cold virus.

  • Here, let me pull out the world's most violin for you, and use my thumb and index player to play it.
    • Here, let me pull out the world's most violin for you, and use my thumb and index player to play it.

      The usual response is "You accidentally the verb," but in this case "You accidentally forgot a very word" .

      • I stand accused. I meant to say the world's most tiniest violin. This is what happens when you reply to people for too many hours.
  • So they "lost" 150 million dollars in a four hour outage? Are they then saving or profiting 150 million dollars every four hours using Amazon cloud?

    Do not compare it to perfection; compare it to the alternative. Without such cloud-based computational capacity, each company would size their IT infrastructure for peak load. Since the peak loads of all companies do not happen at the same time, when one company is running at full load lots of other companies are running at a fraction of their peak capacity. The clou

  • Cloud != Magic (Score:4, Insightful)

    by ErichTheRed ( 39327 ) on Friday March 03, 2017 @12:30PM (#53969995)

    I'm working on a huge migration of an on-site system to Azure right now, and it's hard to convince the people paying the bills of what's actually needed to guarantee high availability. The S3 outage is a perfect example of this...we have the same problem with Azure Storage Accounts being treated as a magic box by the developers. For example, Azure storage has locally redundant and geo-redundant levels. People hear "redundant" and assume that there will never be any issues accessing things you store in a storage account. But if there were a disaster of some kind, those levels only protect the _data_ against the failure of a rack (locally redundant) or a datacenter (geo-redundant). If a problem like what happened with S3 occurred, and access to the actual storage through the software-defined magic is disrupted, you're still going to have a bad day. You just (probably) won't lose the data. Obviously the cloud providers do everything they can to make sure things stay running, but not adding in some sort of failover above the cloud service level is just asking for trouble if you're doing anything critical (a rough sketch of that kind of failover follows below).

    I'm a "classic IT" guy who totally has an open mind about the cloud, but I do think there's lots of hype and misinformation. Designing for high availability is at least as hard as it ever was. Doing this in the cloud is quite expensive...maybe not as expensive as rolling your own infrastructure, but a wake-up call when the CIO gets the bill. I just wish the hype bubble would die down so people could have rational conversations about public cloud. It's just like on-premises stuff - don't pay for HA and risk downtime, or pay up and get the SLAs you pay for. I just hate that people are going around saying the cloud is bulletproof and immune to failures....it's technology at the end of the day, and people make mistakes (especially overworked AWS engineers working 100-hour weeks or Microsoft guys who forget to renew certificates, etc.).

    • Agreed. The biggest question I would pose here is: why did all these companies not take into account that cloud-based services could become unavailable, so that business could continue anyway? This is a basic HA design principle for not being affected by service interruptions. Any time an external system call is performed, the assumption should be that it may not be available, and there should be an alternative mechanism to handle outages without affecting business operations. Unless, of course, the business has no issues with

      • The biggest question I would pose here is: why did all these companies not take into account that cloud-based services could become unavailable, so that business could continue anyway?

        Especially considering this is not the first time AWS has gone down. It does this every year or so.

    • by XXeR ( 447912 )

      Designing for high availability is at least as hard as it ever was. Doing this in the cloud is quite expensive...maybe not as expensive as rolling your own infrastructure, but a wake-up call when the CIO gets the bill.

      Minus the part about the CIO being surprised at the bill (only a poor CIO wouldn't forecast the costs of running a product in any environment, including a public cloud), you hit the nail on the head as to why public cloud is so popular. It's not magic, but it IS cheaper for small/medium sized companies to take advantage of highly available services that they wouldn't otherwise be able to afford in their own DC's.

      That said, you absolutely have to build your application on public cloud with failure in mind.

  • Only 150 Million? (Score:4, Interesting)

    by bobbied ( 2522392 ) on Friday March 03, 2017 @12:48PM (#53970153)

    That's like spitting in the ocean compared to a day of profit for the S&P 500.

    While this news may not be fake, it sure illustrates how absurd this kind of reporting sometimes is. $150 million may be a lot of money to you or me, but to the S&P 500 it's about the same as what you'd find cleaning out your couch cushions the day you got paid and your income tax refund hit. This isn't even a ripple in the profit pool. Yet here we are, regaled by "woe is us in the S&P 500" reports.

    • by ranton ( 36917 )

      That's like spitting in the ocean compared to a day of profit for the S&P 500.

      While this news may not be fake, it sure illustrates how absurd this kind of reporting sometimes is. $150 million may be a lot of money to you or me, but to the S&P 500 it's about the same as what you'd find cleaning out your couch cushions the day you got paid and your income tax refund hit. This isn't even a ripple in the profit pool. Yet here we are, regaled by "woe is us in the S&P 500" reports.

      Considering the Fortune 500 companies earned over $1.35 billion per hour in 2012, a loss of $150 million in 5 hours is 2% of their sales over the outage, 0.5% of their sales that day, and less than 0.02% of their sales that month.

      This is similar to if I lost $2 this month because of a single AWS outage.
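
      Checking that arithmetic in a short Python sketch, using the figures cited above (the $1.35 billion/hour revenue rate and the 5-hour outage come from this thread; the 730-hour month is an assumption):

      hourly_revenue = 1.35e9   # Fortune 500 revenue per hour, 2012 figure cited above
      outage_loss = 150e6       # headline loss estimate
      outage_hours = 5

      print(f"{outage_loss / (hourly_revenue * outage_hours):.1%}")  # ~2.2% of sales during the outage
      print(f"{outage_loss / (hourly_revenue * 24):.2%}")            # ~0.46% of that day's sales
      print(f"{outage_loss / (hourly_revenue * 730):.3%}")           # ~0.015% of that month's sales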

  • Sure they lost a huge chunk of money. But if they had housed their own data, how much would that have cost them up to this point? I wonder if it would have cost more than $150 million. But to address the issue, they should get some redundancy. Mirror across several clouds if need be. It makes me wonder if mirroring would still give them an economic edge vs hosting their own hardware and all the support that requires in addition to the hardware costs.
  • $150M sounds like a big headline... compared to the S&P 500 as a whole, it's nothing. Apple alone did $216B in revenue in its last fiscal year, let alone the other 499 companies.

  • I'm mystified as to why these companies running mission-critical apps with $$$ on the line aren't using multi-region redundancy or at least failover. Imagine if some terrorist dug up the fiber lines leading to the Ashburn primary datacenter, causing US-EAST-1 to be offline for days.

    This is why you spread your resources around and have redundancies across different geographical regions. That way, the worst that could happen is users might experience a momentary lag, or maybe a couple TCP connections might ge

  • Or, realistically, the customers saw the site was down and just came back later. So basically they lost nothing. You know, back in reality that's what happened.
    • Setting aside impulse buys, you're also ignoring volume limitations. Saying someone will come back and buy it later just means that the person after them will now be delayed slightly more. Time is not always recoverable, especially in a competitive marketplace.
