Amazon EC2 Failure Post-Mortem

CPE1704TKS tips news that Amazon has provided a post-mortem on why EC2 failed. Quoting: "At 12:47 AM PDT on April 21st, a network change was performed as part of our normal AWS scaling activities in a single Availability Zone in the US East Region. The configuration change was to upgrade the capacity of the primary network. During the change, one of the standard steps is to shift traffic off of one of the redundant routers in the primary EBS network to allow the upgrade to happen. The traffic shift was executed incorrectly and rather than routing the traffic to the other router on the primary network, the traffic was routed onto the lower capacity redundant EBS network. For a portion of the EBS cluster in the affected Availability Zone, this meant that they did not have a functioning primary or secondary network because traffic was purposely shifted away from the primary network and the secondary network couldn't handle the traffic level it was receiving."
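
The failure mode in the summary can be illustrated with a toy capacity check. This is a minimal Python sketch, not Amazon's tooling: the `Link` class, the `shift_traffic` function, and every capacity and load number are hypothetical, since the post-mortem excerpt gives no figures. It shows why draining a primary router onto the lower-capacity secondary network is unsafe while draining it onto the other primary router is fine, and how a headroom check before the shift would refuse the mistaken step.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A hypothetical network path with a fixed capacity and current load (Gbps)."""
    name: str
    capacity_gbps: float
    load_gbps: float = 0.0

    def headroom(self) -> float:
        # Spare capacity left on this path.
        return self.capacity_gbps - self.load_gbps


def shift_traffic(source: Link, target: Link, amount_gbps: float) -> None:
    """Move traffic from source to target, refusing the shift if target would overload."""
    if amount_gbps > source.load_gbps:
        raise ValueError(f"{source.name} only carries {source.load_gbps} Gbps")
    if amount_gbps > target.headroom():
        raise ValueError(
            f"refusing shift: {target.name} has {target.headroom()} Gbps headroom, "
            f"needs {amount_gbps} Gbps"
        )
    # Both checks passed, so move the traffic.
    source.load_gbps -= amount_gbps
    target.load_gbps += amount_gbps


if __name__ == "__main__":
    # Illustrative numbers only; none come from the post-mortem.
    primary_a = Link("primary router A", capacity_gbps=100, load_gbps=40)
    primary_b = Link("primary router B", capacity_gbps=100, load_gbps=40)
    secondary = Link("secondary EBS network", capacity_gbps=10)

    # Mistaken step: drain router A onto the low-capacity secondary network.
    try:
        shift_traffic(primary_a, secondary, 40)
    except ValueError as err:
        print(err)

    # Intended step: drain router A onto the other router on the primary network.
    shift_traffic(primary_a, primary_b, 40)
    print(primary_a.load_gbps, primary_b.load_gbps)
```

Because both checks run before any load moves, a refused shift leaves both links unchanged; in this toy model the operator would see an error instead of a saturated secondary network.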

  • by kriston ( 7886 ) on Friday April 29, 2011 @09:22AM (#35973806)

    Dear AWS Customer,

    Starting at 12:47AM PDT on April 21st, there was a service disruption (for a period of a few hours up to a few days) for Amazon EC2 and Amazon RDS that primarily involved a subset of the Amazon Elastic Block Store (“EBS”) volumes in a single Availability Zone within our US East Region. You can read our detailed summary of the event here:
    http://aws.amazon.com/message/65648 [amazon.com]

    We’ve identified that you had an attached EBS volume or a running RDS database instance in the affected Availability Zone at the time of the disruption. Regardless of whether your resources and application were impacted, we are going to provide a 10 day credit (for the period 4/18-4/27) equal to 100% of your usage of EBS Volumes, EC2 Instances and RDS database instances that were running in the affected Availability Zone. This credit will be automatically applied to your April bill, and you don’t need to do anything to receive it.
    You can see your service credit by logging into your AWS Account Activity page after you receive your upcoming billing statement.

    Last, but certainly not least, we want to apologize. We know how critical the services we provide are to our customers’ businesses and we will do everything we can to learn from this event and use it to drive improvement across our services.

    Sincerely,
    The Amazon Web Services Team

    This message was produced and distributed by Amazon Web Services, LLC, 410 Terry Avenue North, Seattle, Washington 98109-5210
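
The credit terms in the letter reduce to a filtered sum. The following is a minimal Python sketch of that arithmetic, assuming a hypothetical `UsageRecord` type and a placeholder zone label such as `"zone-a"`; real AWS billing data is not exposed in this form. It totals 100% of EBS, EC2, and RDS charges in the affected zone from 4/18 through 4/27 inclusive, which is the 10-day window the letter describes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical per-day billing line; not an actual AWS billing format."""
    day: date
    service: str            # e.g. "EBS", "EC2", "RDS", "S3"
    availability_zone: str  # placeholder label, e.g. "zone-a"
    charge_usd: float


# Window from the letter: 4/18 through 4/27, both ends inclusive (10 days).
CREDIT_START = date(2011, 4, 18)
CREDIT_END = date(2011, 4, 27)
COVERED_SERVICES = {"EBS", "EC2", "RDS"}


def outage_credit(records, affected_zone):
    """Return 100% of covered charges in the affected zone during the credit window."""
    return sum(
        r.charge_usd
        for r in records
        if CREDIT_START <= r.day <= CREDIT_END
        and r.service in COVERED_SERVICES
        and r.availability_zone == affected_zone
    )
```

Usage charges outside the window, in other zones, or for services the letter does not name are ignored, so the credit does not depend on whether a given resource was actually impaired, matching the letter's "regardless of whether your resources and application were impacted."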

  • by gad_zuki! ( 70830 ) on Friday April 29, 2011 @09:29AM (#35973858)

    What is an EBS? Is it really just a Xen or VMware disk image? Which data center corresponds with each availability zone? What are they using for storage: iSCSI targets on a SAN?