Cloud The Internet

EC2 Outage Shows How Much the Net Relies On Amazon 147

Posted by Soulskill
from the too-big-to-fail dept.
An anonymous reader writes "Much has been written about the recent EC2/EBS outage, but Keir Thomas at PC World has a different take: it's shown how much cutting-edge Internet infrastructure relies on Amazon, and we should be grateful. Quoting: 'Amazon is a personification of the spirit of the Internet, which is one of true democracy, access to the means of distribution, and rapid evolution.'" An article at O'Reilly comes to a similarly positive conclusion from a different angle.
This discussion has been archived. No new comments can be posted.

  • Multiple Locations (Score:3, Informative)

    by WrongSizeGlass (838941) on Saturday April 23, 2011 @02:37PM (#35916132)
    Amazon has an option to have another Amazon location serve as the failover for your services. Yes, it costs more, but it does exactly what it's supposed to when this type of thing happens. If your backup/disaster recovery plan requires as close to 100% uptime as possible, you'll want to pay the extra for this type of protection.
    • by pavon (30274) on Saturday April 23, 2011 @02:46PM (#35916188)

      A large number of the people experiencing this outage did pay for multiple availability zones, and it didn't help them [networkworld.com].

      • by el_tedward (1612093) on Saturday April 23, 2011 @03:06PM (#35916340)

        I guess what we should learn from this is to put your failover in separate regions, not separate availability zones?

        • It looks like Amazon already defines the "availability zones" as areas that should be redundant (and thus not prone to being affected by other zones going down). Do they even have the capability to spread someone out across different regions?
          • by hawguy (1600213)

            Do they even have the capability to spread someone out across different regions?

            Yes you have full control over what region your instance runs in - some regions cost more than others, the East region is cheaper than the West region.

        • by rsborg (111459)

          I guess what we should learn from this is to put your failover in separate regions, not separate availability zones?

          Apparently, data transfer between AZs is cheap or free, while transferring data between regions is effectively transmitting them over the open internet, and counts towards your bandwidth allotment/cost, so it's sometimes prohibitively expensive to failover across regions... it would almost be less expensive to just failover to another hosting provider (which may be even more stable than sticking with Amazon).

          This outage was a big black-eye for Amazon, as their recommended way to failover was used, and promp

          • by jvkjvk (102057)

            Maybe those very smart people should have thought about what the difference between an "availability zone" and a region was, besides cost, and what that might connote.

            It seems like some other very smart people got that right - e.g. Netflix.

            While having this outage is a black eye for Amazon, the service tiers seemed to have worked - people who did not pay for regional redundancy did not get it, while people who did were fine.

            Now, all of those smart people can ask themselves if ponying up the cash is worth it or if

        • by CAIMLAS (41445)

          The region isn't going to matter if the clusters (and that's what they are - cloud 'clusters') are all 'centrally managed' by their internal infrastructure, and that management structure is what failed.

        • by Khyber (864651)

          "I guess what we should learn from this is to put your failover in separate regions, not separate availability zones?"

          No, you keep a backup ON SITE.

          Christ, even my website has an emergency backup server. While Amazon was fucking everyone else, I was still happily online doing business as usual.

          I have said repeatedly that cloud computing wasn't going to be worth a shit. Go ask Reddit how it's working out for them right now. Go ask Microsoft about their little fuckup.

      • by WrongSizeGlass (838941) on Saturday April 23, 2011 @03:09PM (#35916366)
        From the NYT article: [nytimes.com]

        Big companies, that have decided to put crucial operations on Amazon computers are apt to pay up for the equivalent of computing insurance, analysts say. Netflix, the movie rental site, has become a large customer of the Amazon cloud. Most of its Web technology — customer movie queues, search tools and the like — runs in Amazon data centers.

        Netflix said it had sailed through the last couple of days unscathed. “That’s because Netflix has taken full advantage of Amazon Web Services’ redundant cloud architecture,” which insures against technical malfunctions in any one location, said Steve Swasey, a Netflix spokesman.

        Sounds like it worked for some.

        • by lonecrow (931585)
          netflix.ca was down. So they were not unscathed just because no Americans were affected.
      • by codepunk (167897)

        It worked just fine. I was in the affected zone and just failed over to the west coast region. I actually could have stayed on the east coast, as our infrastructure does not have single points of failure.

      • by Guspaz (556486) on Saturday April 23, 2011 @03:15PM (#35916426)

        Paying for multiple availability zones is not the same as paying for multiple locations. There are multiple availability zones in a single datacenter. Netflix got it right, they spread their infrastructure over multiple physical locations, and didn't suffer any downtime despite losing a significant chunk of their infrastructure; it was business as usual.

        Like anything else, cloud computing still requires you to decide how much redundancy you're willing to pay for. If uptime is that important to you, spreading your infrastructure out over multiple datacenters is a no-brainer.
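A back-of-the-envelope sketch of the "how much redundancy are you willing to pay for" question. It assumes each location fails independently of the others, which is exactly the assumption that did not hold for availability zones in this outage, and the 99.9% per-site figure is an illustrative number, not a quoted SLA.

```python
def combined_availability(per_site: float, sites: int) -> float:
    """Availability of `sites` redundant locations.

    ASSUMPTION: failures are independent, so the service is down only
    when every site is down at the same time.
    """
    if not 0.0 <= per_site <= 1.0:
        raise ValueError("per_site must be between 0 and 1")
    if sites < 1:
        raise ValueError("need at least one site")
    return 1.0 - (1.0 - per_site) ** sites


# Illustrative per-site availability of 99.9% (not a real SLA figure).
for n in (1, 2, 3):
    print(n, f"{combined_availability(0.999, n):.9f}")
```

If the locations share a failure mode (same management plane, same datacenter), the real number sits much closer to the single-site figure than this formula suggests.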

  • by Hartree (191324) on Saturday April 23, 2011 @02:40PM (#35916148)

    This article seems to be an apology for Amazon.

    Basically it says "We went down, and took down lots of important stuff. That shows just how important we are and that lots of people use us. Thus, our cloud is a good thing."

    The logic of that doesn't quite work.

    I agree that it's a useful tool, but there are a lot of things that don't make sense to put in the cloud.

    • by WrongSizeGlass (838941) on Saturday April 23, 2011 @02:45PM (#35916180)

      I agree that it's a useful tool, but there are a lot of things that don't make sense to put in the cloud.

      I always feel better when anything that is mission critical is in-house. Cloud based (and regular internet based) services can become inaccessible for your business if you simply lose your internet connection - it doesn't require all of Amazon to bite the dust.

      • by hawguy (1600213)

        I always feel better when anything that is mission critical is in-house. Cloud based (and regular internet based) services can become inaccessible for your business if you simply lose your internet connection - it doesn't require all of Amazon to bite the dust.

        But if having your application available to the outside world is mission-critical, you're almost always better off colocating it with providers in multiple physical locations.

        Even for internal apps that are necessary for your business, you may be better off outsourcing, since if your building catches on fire, you can send employees home to let them continue working. Few companies have the resources to build a truly redundant hosting infrastructure across multiple regions.

        • by Steeltoe (98226)

          Nice, so colocating means you give your code and data to thieves and governments. Just great doing business with you!

      • by pjbass (144318)

        The issue today though isn't in-house vs. colocated, it's cost. Most of these companies don't have the cash to build proper infrastructure to house their services locally. The cloud services from various companies, like Amazon, take care of the physical maintenance and cooling and power, etc.

        Even if your local datacenter housed mission-critical data, I'm sure it's possible to come up with 100 scenarios where you could lose all connectivity to your locally-housed infrastructure (power company accidentally d

    • by mjwalshe (1680392)
      So what they are saying, in the style of Jeremy Clarkson: "We're rubbish, give us money"
    • by segedunum (883035)
      Hmmmm, yer. The whole apology is just not working for me.

      Yes, you could architect much easier with cloud platforms to failover in different regions of the world, and Reese is simply plugging his own company's stuff on that one. However, that just makes things more expensive and negates the cost effectiveness of using cloud services in terms of more servers and increased complexity. Will most businesses really need to do that given that they could afford to put their stuff in a single data centre somewher
      • by Rakishi (759894)

        You're stuck in a non-vm mentality it seems so I'm not sure why you're talking about things you don't understand.

        However, that just makes things more expensive and negates the cost effectiveness of using cloud services in terms of more servers and increased complexity.

        How so? Why do you need that many more servers, you're either splitting traffic (so roughly the same number of servers) or simply having enough servers to pick up backups. Now data storage of duplicate backups may add some costs but that's neither servers nor complexity.

        And as I said before and you seem to not understand, this is a cloud. If your main servers go down you don't need to have an iden

        • by dave562 (969951)

          My server was down at amazon for maybe 12 hours, that's when I noticed and simply reloaded it from backup in a different working availability zone. Took a few minutes. Had I cared enough to keep backups in a different region then I could have simply reloaded it instantly over there.

          How did you architect the data storage for your applications? Is the data kept with the servers? How much data are you working with?

          It seems to me that Amazon is decent for the web tier, or any application with a relatively sma

          • by segedunum (883035)

            How did you architect the data storage for your applications? Is the data kept with the servers? How much data are you working with?

            Because he hasn't actually done what he says he has. Note that he says he got this running by moving to another availability zone when it was multiple availability zones in one region that were affected.

        • by segedunum (883035)

          You're stuck in a non-vm mentality it seems so I'm not sure why you're talking about things you don't understand.

          I do understand it sweetheart because I do it for a fucking living. Give me a break with these Slashdot weenies.

          How so? Why do you need that many more servers, you're either splitting traffic (so roughly the same number of servers) or simply having enough servers to pick up backups. Now data storage of duplicate backups may add some costs but that's neither servers nor complexity.

          Understanding how EC2 works might be a good start for you. You need to replicate your machine images across different regions as well as incur traffic costs for backup and/or replication depending on how much data you can afford to lose. It's excessive complexity.

          And as I said before and you seem to not understand, this is a cloud. If your main servers go down you don't need to have an identical copy of those servers running somewhere else 24/7. You simply create those copies on the fly. They cost you nothing until they're needed and when they are you're not paying for your main servers anyway.

          That's because you have no idea what you're talking about and simply don't know the practicalities of what's involved.

          You're assuming they can restore from backup, often companies don't want to lose the data since the last backup unless there is no choice. They also need to get a new server, image it, test it and actually put it in the data center. Sure they can automate it all but that, to quote your own words, that negates cost effectiveness.

          What does this mea

    • Basically it says "We went down, and took down lots of important stuff. That shows just how important we are and that lots of people use us. Thus, our cloud is a good thing."

      That's exactly not what the article said. To summarise, it says, "any server setup has its flaws, but the advantages of Amazon to startups and the democratisation of the web is enough to be thankful for." And I'd agree.

  • I'll stick to my setup of a dedicated server and virtual private servers across the globe rather than putting all my eggs in one basket with Amazon and "cloud computing"! It may be a little bit more in terms of operating costs, but it has true failover in the event of an outage!
    • by hawguy (1600213)

      I'll stick to my setup of a dedicated server and virtual private servers across the globe rather than putting all my eggs in one basket with Amazon and "cloud computing"! It may be a little bit more in terms of operating costs, but it has true failover in the event of an outage!

      Then your app doesn't really need a dynamic cloud.

      Some companies have applications that run on a dozen servers during normal times, and need to scale to over a hundred servers during peak periods (e.g. a new product launch). With EC2, they can scale automatically and programmatically and can spread the virtual servers across multiple regions for additional redundancy. All with a single API.
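To make the dozen-to-a-hundred scaling idea concrete, here is a minimal planning sketch. It does not call any real EC2 API; `plan_capacity`, the region names and the 100 requests/sec per server figure are all hypothetical, chosen only so the numbers land near the ones mentioned above.

```python
import math
from typing import Dict, List


def plan_capacity(requests_per_sec: float, rps_per_server: float,
                  regions: List[str]) -> Dict[str, int]:
    """Return how many servers to run in each region for a given load."""
    if rps_per_server <= 0 or not regions:
        raise ValueError("need a positive per-server rate and at least one region")
    # Enough servers for the load, and never fewer than one per region.
    total = max(math.ceil(requests_per_sec / rps_per_server), len(regions))
    # Spread as evenly as possible; earlier regions absorb any remainder.
    base, extra = divmod(total, len(regions))
    return {r: base + (1 if i < extra else 0) for i, r in enumerate(regions)}


# Hypothetical loads: a normal day versus a product launch.
print(plan_capacity(1200, 100, ["region-a", "region-b"]))
print(plan_capacity(10400, 100, ["region-a", "region-b"]))
```

An even split like this only covers the load while both regions are up. If one region has to carry everything on its own during an outage, each region needs the full count, which is where the extra cost comes from.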

      • Re: (Score:1, Insightful)

        by Anonymous Coward

        With EC2, they can scale automatically and programmatically and can spread the virtual servers across multiple regions for additional redundancy. All with a single API.

        That sure as fuck didn't seem to be the case these past few days.

        • by hawguy (1600213)

          With EC2, they can scale automatically and programmatically and can spread the virtual servers across multiple regions for additional redundancy. All with a single API.

          That sure as fuck didn't seem to be the case these past few days.

          Sure it was - that's why Netflix had no problems, they had instances across more than one region.

    • by nurb432 (527695)

      I'll stick to my setup of a dedicated server and virtual private servers across the globe rather than putting all my eggs in one basket with Amazon and "cloud computing"!

      I hope that was sarcasm and you really are not that stupid.

  • by ninejaguar (517729) on Saturday April 23, 2011 @02:50PM (#35916212)

    Otherwise, Amazon will become too big to fail.

    = 9J =

  • Outages (Score:2, Insightful)

    by codepunk (167897)

    Many .com websites were unnecessarily down for hours since nobody had thought to plan for an outage. I am sure quite a few architecture meetings were held the following day addressing disaster recovery.

    • So, in other words, this is exactly what people who use cloud services for mission critical data needed. It's exceptionally hard to learn good lessons from success, but failures are almost guaranteed to teach something. In this case, the community will understand the potential cost of a four-to-six-nines system without a backup. There is always a finite chance of failure.
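For a sense of scale on the "nines" mentioned above, a quick worked calculation. The only assumption is a 365-day year; the four-to-six-nines figure is the poster's characterization, not a number taken from any provider's SLA.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def downtime_per_year_seconds(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (4 -> 99.99%)."""
    unavailability = 10.0 ** -nines
    return unavailability * SECONDS_PER_YEAR


for n in (2, 4, 6):
    secs = downtime_per_year_seconds(n)
    print(f"{n} nines: {secs:.1f} seconds/year ({secs / 60:.2f} minutes)")

# How many years of a four-nines budget a single one-day outage uses up.
one_day = 24 * 3600
print(f"{one_day / downtime_per_year_seconds(4):.1f} years of four-nines budget")
```

Four nines works out to a bit under an hour a year, so a day-long outage is on the order of decades of budget at that level.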

      Still, it was only down for, what - a day? Remember Loma Prieta? WTC collapses? Things happen, and when they do everybody is down for a w

    • Re:Outages (Score:4, Insightful)

      by pla (258480) on Saturday April 23, 2011 @03:44PM (#35916552) Journal
      Many .com websites were unnecessarily down for hours since nobody had thought to plan for an outage. I am sure quite a few architecture meetings were held the following day addressing disaster recovery.

      Y'know, call me crazy, but I didn't even notice the outage.

      I mean, yeah, I read about it on a number of sites (all still up and running just fine), but honestly can't say I tried to visit even a single site actually unavailable because of the downtime.

      I dunno, perhaps this mostly affected ad hosts and I didn't notice because I already block them?
      • by Thing 1 (178996)

        Y'know, call me crazy, but I didn't even notice the outage.

        I noticed it: Pricewatch was down, and I wanted more memory in my laptop.

        • by Steeltoe (98226)

          Science can explain religion; not vice versa.

          Too bad science can't prove religion, and no, true science can never explain the unexplainable, that's just a dishonest fantasy of pseudoscientists trying to frame all of reality into their own narrow little worldview. It doesn't even matter if some superstitions are true or not or in what degree, because it's just a question of having the courage to keep an open mind about it, that's all. If you do, you could become the next Newton, Einstein, or something great,

          • by Thing 1 (178996)

            I'm not sure what you're trying to say about my signature. To clarify my position: science can explain what happens in one's brain when one experiences religion. Religious beliefs, on the other hand, tend not to lead to E=MC2 [1]; they're an approximation of logic just like emotions are. Emotions help us to survive, whereas religion helps others to control us (versus spirituality, which is something personal). I'm already something great, and tend not to follow the mainstream (although I still breathe o

  • SPF (Score:2, Insightful)

    by ktappe (747125)
    Wait....we should be glad we have a single point of failure on the internet because why?!?
    • by jd (1658)

      Apparently, because having just one party and no elections makes a democracy. And in later news, why Rupert Murdoch tapping everyone's phones is good for privacy.

    • How exactly is this a single point of failure? It's not like there are magically no other ways to put things on the Internet.

      And to address my sibling post - this is the purest, most direct form of democracy there is. You vote by using the service, or using some other service, or nothing at all, or many, many other configurations, some of which haven't even been invented yet.

      I'm actually rather amazed at the lack of critical thinking skills. I know it's a popular Slashdot meme to say things are going dow

      • I know it's a popular Slashdot meme to say things are going downhill, and I am well aware of the curious technophobic streak that runs through a lot of the people here... but to what end?

        This has been really bugging me. I started following Slashdot specifically to keep current on trends in IT. Again and again, I see not just recent innovations, but well-established trends derided as unworkable fad ideas. I was already used to the derision of cloud computing when I had an interview at a company that had been doing "software as a service" for over ten years.

        Why? My guess is that it has to do with the pattern I've seen of IT grognards who were hired to set up a new system, and remain in place,

  • by Anonymous Coward
    "which is one of true democracy" - They quickly forgot their take down of WikiLeaks. Part of democracy is free speech to remind the people of the government's failures. Putting all your eggs in one basket never ends well, we should be scared not grateful.
    • by jd (1658)

      One group taking down WikiLeaks doesn't really matter when it comes to democracy. Indeed, since choice is a part of democracy, one group is perfectly entitled to censor what they like, since one group is utterly insignificant. Indeed, that is how you identify democracies.

      The Internet is not democratic and hasn't been since deregulation. The Internet is a federation of dictatorships. You have no choices. If you live in an area where X runs the backbone, ALL ISPs without exception are mere window-dressing ove

      • by makomk (752139)

        one group is perfectly entitled to censor what they like

        They're entitled to, yes, but that doesn't mean that doing so is good for democracy and it certainly doesn't mean we shouldn't scoff and laugh when they're described as "true democracy". Someone needs to be willing to stand up and provide a platform for information that embarrasses the Government; if all the hosting companies and Internet backbone providers and newspapers and publishers and distributors were to refuse to publish it, why, we wouldn't have a democracy anymore at all.

        The best way to stop this

        • by jd (1658)

          I agree with you in that it isn't good for democracy and that such a platform should be provided. The mere fact that one company could have the power to effectively eliminate all such platforms is, however, proof that what we have is most certainly not democracy, and that claims that a single entity can ever constitute democracy are highly suspect at best, propaganda at worst. Far from scoffing, I take it as a dangerous sign that the media (who are ethically obliged to provide accurate, honest information)

          • They only have get out of jail free cards because the people pick their representatives, and they pick corruption and incompetence. I don't see fault with the system, but with the people participating in the system.
  • by Anonymous Coward

    Amazon is a personification of the spirit of the Internet, which is one of true democracy

    Eh? And here I thought Amazon was a company trying to make money by selling goods and services.

    • by hedwards (940851)

      And what could be more democratic than selling goods for compensation? Isn't that generally how democracy works? You pay them your vote for them to give you whatever you want. And in modern times, you pay their campaign a lot of money and get to dump your toxic waste wherever you like.

  • Well done Amazon - you succeeded in failing

  • by RyanFenton (230700) on Saturday April 23, 2011 @03:08PM (#35916356)

    When there's a 'service' you'd like to block (such as adverts), amazon hosting can make it rather difficult to consistently block them using an IP blacklist, without also blocking potentially useful things too.
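A small illustration of the collateral-damage problem with IP blacklists on shared hosting. The hostnames and addresses below are made up (the ranges are the ones reserved for documentation), and the single /24 block stands in for whatever range a real blocklist might cover.

```python
import ipaddress
from typing import Iterable

# Hypothetical blocklist entry covering a shared hosting range.
BLOCKLIST = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(addr: str, blocklist: Iterable = BLOCKLIST) -> bool:
    """True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in blocklist)


# Made-up hosts: an ad server and an unrelated site sharing the same range.
hosts = {
    "ads.example": "203.0.113.10",
    "useful.example": "203.0.113.77",
    "elsewhere.example": "198.51.100.5",
}
for name, addr in hosts.items():
    print(name, "blocked" if is_blocked(addr) else "allowed")
```

Both hosts in the shared range get blocked together, which is the point being made: the blacklist cannot tell the ad server from its neighbour.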

    Essentially though, they're just packaging the benefits of an economy of scale - things get cheaper the more you focus on larger supply, and thus they can make the most profits and cut off the most competition by scaling up so much with cheap prices. It's part of how companies from WalMart to Google compete so well.

    Economies of scale are also one part of why markets inherently fail over time - competition almost always favors those who scale up best, who can then leverage that power over competitors, preventing them from growing to the same extent, and breaking any meaning to the freedom of the market. At that point, competition becomes defined by who can serve WalMart's interest best.

    Ryan Fenton

  • by girlintraining (1395911) on Saturday April 23, 2011 @03:15PM (#35916420)
    Microsoft: We're sorry our product broke and a lot of people weren't able to get online. Slashdot: BURN THE HERETIC! Amazon: We're sorry our product broke and a lot of people weren't able to get online. Slashdot: It's okay. Here, have a cookie.
    • by cecom (698048)

      One of them is a monopoly in a couple of important areas, and using that monopoly to muscle itself via brute force in nearly every single aspect of computing (gaming, mobile, cloud, etc) - guess which one?
      Microsoft can no longer be judged solely on technical grounds (where fortunately they do suck).

    • Microsoft: We're sorry our product broke and a lot of people weren't able to get online. Slashdot: BURN THE HERETIC! Amazon: We're sorry our product broke and a lot of people weren't able to get online. Slashdot: It's okay. Here, have a cookie.

      Where have I heard this before:
      "Our customers depend on our product line, but refuse to pay even more money to upgrade from XP to MS OS du jour" -- Fine, delay the security updates, the more viruses and downtime XP users suffer, the more incentive they'll have to upgrade... Why not move Office to the Cloud?

      "Our customers depend on our media and information services, but refuse to pay even more money to access the premium entertainment media since our generic Internet service provides adequate enterta

  • by ShipIt (674797) on Saturday April 23, 2011 @03:42PM (#35916532)
    Totally concur with others pointing out Amazon offers redundancy if you choose to use it.

    We had webservers, database (master/slave,) and other services split across usa-east and usa-west.

    When usa-east started showing problems, we:
    *) Took the usa-east webservers out of round robin DNS (ttl 1hr)
    *) Verified the slave (in usa-west) was up to date, shut down the master (usa-east,) and converted the slave to master.
    *) Updated all webservers to point to the new master.
    *) Cranked up new usa-west webservers / updated round robin DNS
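The four steps above can be sketched as a toy in-memory model. Nothing here talks to real DNS, databases or any Amazon API; the `Deployment` class, the host names and the log positions are hypothetical stand-ins used only to show the order of operations and the "is the slave caught up?" check before promotion.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Deployment:
    """Toy model of the setup described above (all names hypothetical)."""
    dns_pool: Set[str]              # web hosts currently in round robin DNS
    web_db_target: Dict[str, str]   # web host -> region of the DB it writes to
    master_region: str
    slave_region: Optional[str]
    master_log_pos: int             # last position written on the master
    slave_log_pos: int              # last position applied on the slave


def manual_failover(d: Deployment, failing: str, healthy: str,
                    new_web_hosts: List[str]) -> Deployment:
    # Step 1: take the failing region's web servers out of round robin DNS.
    d.dns_pool = {h for h in d.dns_pool if not h.startswith(failing + "-")}

    # Step 2: promote the slave only once it has applied everything
    # the master wrote, then retire the old master.
    if d.slave_region != healthy:
        raise ValueError("no slave available in the healthy region")
    if d.slave_log_pos < d.master_log_pos:
        raise RuntimeError("slave is behind the master; promoting would lose writes")
    d.master_region, d.slave_region = healthy, None

    # Step 3: point every web server at the new master.
    for host in list(d.web_db_target):
        d.web_db_target[host] = healthy

    # Step 4: bring up extra web servers in the healthy region and add them to DNS.
    for host in new_web_hosts:
        d.web_db_target[host] = healthy
        d.dns_pool.add(host)
    return d


deployment = Deployment(
    dns_pool={"east-web1", "east-web2", "west-web1"},
    web_db_target={"east-web1": "east", "east-web2": "east", "west-web1": "east"},
    master_region="east", slave_region="west",
    master_log_pos=1042, slave_log_pos=1042,
)
manual_failover(deployment, failing="east", healthy="west", new_web_hosts=["west-web2"])
print(sorted(deployment.dns_pool), deployment.master_region)
```

Note that with a 1 hour TTL, some clients can keep resolving to the removed addresses for up to an hour after step 1, so the DNS change does not take effect instantly.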

    I believe Amazon offers mechanisms to do this automatically or we could just always write our own failover scripts, but this is the tradeoff we made. We were willing to trade some service degradation by switching over manually in exchange for avoiding the pitfalls of false-positive detection. Very much an application specific tradeoff, not for everyone, but it worked for what we are doing.
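For anyone who does want to automate it, one common way to reduce false positives is to require several consecutive failed health checks before triggering anything. This is a generic sketch of that idea, not a description of any Amazon mechanism; the class name and the threshold of 3 are arbitrary choices for illustration.

```python
class FailureDetector:
    """Report a failure only after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0

    def observe(self, healthy: bool) -> bool:
        """Record one health check; return True when failover should trigger."""
        if healthy:
            # Any successful check resets the streak, so a blip is ignored.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


detector = FailureDetector(threshold=3)
checks = [False, False, True, False, False, False]
print([detector.observe(ok) for ok in checks])
```

A higher threshold means fewer spurious failovers but a slower reaction to a real outage, which is the same tradeoff described above in a different form.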

    The key was to avoid putting all eggs in the usa-east basket and splitting up across usa-west, even though we incur additional bandwidth fees, ie master/slave replication transfer is full fee between regions.

    We were never concerned about cascading failures affecting multiple availability zones in a given region nor did it matter for us - our redundancy requirement was geographical diversity, not partitions within a datacenter. We were thinking natural disaster, but the architecture covered us in this case as well.

    The coolest thing to me is just how quickly we were able to shuffle around these resources to avoid a problem area - a couple of hours. There's no way we could have done it so quickly with what we had before - a combination of our own colocated servers and VPS.
    • by codepunk (167897)

      Nail hit head. You are correct, the key to staying running is planning for failure. Anyone who experienced a multi-hour outage obviously had not thought things through.

  • There is a whole world out there who didn't even notice the Amazon EC2 outage (me included).

    Just sayin'.

  • lesson learnt (Score:2, Insightful)

    by Anonymous Coward

    I was directly affected by this outage. Once i discovered that the issue was at amazon and not in my application, i restored from a previous snapshot, synced my application code, and associated my IP to a new instance in a functioning zone.

    Total downtime for me was probably just under an hour. And that's including my debugging time.

    Overall it wasn't the end of the world for me and i did discover I should make my redundancy setup run more frequently.
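A rough way to reason about "run the redundancy setup more frequently": the snapshot interval bounds how much data can be lost on restore. The sketch assumes failures are equally likely at any moment between snapshots; the intervals shown are examples, not the poster's actual schedule.

```python
from typing import Tuple


def data_loss_window_minutes(snapshot_interval_minutes: float) -> Tuple[float, float]:
    """Return (worst case, average) minutes of data lost when restoring a snapshot.

    ASSUMPTION: the failure time is uniformly distributed between snapshots.
    """
    if snapshot_interval_minutes <= 0:
        raise ValueError("interval must be positive")
    worst = snapshot_interval_minutes        # failure just before the next snapshot
    average = snapshot_interval_minutes / 2  # failure at a random point in between
    return worst, average


# Example intervals: daily, hourly, every 15 minutes.
for interval in (24 * 60, 60, 15):
    worst, average = data_loss_window_minutes(interval)
    print(f"every {interval} min: worst {worst} min, average {average} min")
```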

    Sure i lost a few sales, but in a way i look at this as an

  • by unity100 (970058) on Saturday April 23, 2011 @06:36PM (#35917436) Homepage Journal
    rackspace.com, softlayer.com, hetzner.de -> most of the web is housed on big providers like these. personal, organization, and small businesses are alike. these providers' main business is renting racks and servers, which are then used by hosts to rent to end customers.

    i don't know where this 'how much of the net relies on amazon became clear' bullshit comes from. are there any statistics to show for it ? or, are people unaware of what's going on outside their little world window of expertise, so that they think that amazon cloud, for some reason, has become the 'backbone' of internet ?

    really. where are the statistics ? all i see, some random guy gives away some pdf by hosting it through amazon's cloud, and then proceeds to claim that 'net' became too reliant on, amazon ...

    really ....
  • Let's just see how they expand their cloud services and see if it wants to eat at Amazon's other ventures.
    • Let's just see how they expand their cloud services and see if it wants to eat at Amazon's other ventures.

      Fanboi much?
      Apple has made the decision, fairly quietly, that they are no longer going to sell Xserve server products past Jan 2011.

      FAQs for the Xserve End Of Life
      Q: Where can I see what Apple has announced about Xserve?

      A: The official announcement is here: http://www.apple.com/xserve/resources.html [apple.com]

      Guess what? That Apple URL -- gone.

      Q: What does this mean for the operating system software, Mac OS X Server? Will there be an upgrade for Mac OS X 10.7 Lion for Xserve?

      A: Apple has made no announcement about its plans for Mac OS X Server software.

      Q: What are the alternatives sources of hardware?

      A: At the time of this post, there are no other suppliers of rack mounted hardware that can run Mac OS X Server.

      Q: Can I run OS X Server in a virtual machine on other hardware?

      A: At the time of this post, no. The license for OS X Server prohibits installation on hardware from any manufacturer except Apple.

      Q: What are the alternatives for an organization dependent on Xserve?

      A: You must plan to migrate to another hardware platform, either Apple’s (Mac Pro or Mac Mini) or transition to servers running Windows or Linux.

      Maybe you're right, maybe Apple is so bloody cunning that they End of Life'd their server line to ensure that Apple is the only one who can use Apple software / hardware to provide Apple cloud services...

      In any event, I won't be buying into their mono-culture with silent death hanging over my head. Giving control of both the hardware an

  • Slashdot didn't go down so I didn't know about it until someone posted a story. It's not the big things like this that worry me about being in the cloud. It is the small things you are never going to know about until it is too late that worry me.
  • These guys need to spin for TEPCO.
  • but I think what people are missing is how vulnerable using the cloud makes us, not how much we depend on Amazon. When our own systems go down, they affect us. When just one supplier in the cloud goes down it affects many and can have wide reaching consequences. There are many positive aspects to cloud computing, but we tend to ignore its shortcomings in our enthusiasm.

