Amazon EC2 Failure Post-Mortem 117
CPE1704TKS tips news that Amazon has provided a post-mortem on why EC2 failed. Quoting:
"At 12:47 AM PDT on April 21st, a network change was performed as part of our normal AWS scaling activities in a single Availability Zone in the US East Region. The configuration change was to upgrade the capacity of the primary network. During the change, one of the standard steps is to shift traffic off of one of the redundant routers in the primary EBS network to allow the upgrade to happen. The traffic shift was executed incorrectly and rather than routing the traffic to the other router on the primary network, the traffic was routed onto the lower capacity redundant EBS network. For a portion of the EBS cluster in the affected Availability Zone, this meant that they did not have a functioning primary or secondary network because traffic was purposely shifted away from the primary network and the secondary network couldn't handle the traffic level it was receiving."
I realise this is "News for Nerds"... (Score:4, Funny)
But can I get an understandable car analogy here?
Re:I realise this is "News for Nerds"... (Score:5, Funny)
Re:I realise this is "News for Nerds"... (Score:4, Funny)
Instead of the usual commuter rail line, we've had to do some maintenance causing us to provide a single Yugo as transport for the NY morning rush.
After packing 25 angry commuters into the Yugo we left a few hundred thousand stranded on the platform, ping-ponging between the parking lot and home, completely confused how they would get to work.
In addition to that, unfortunately the Yugo couldn't handle the added weight of the passengers and the leaf springs shattered all over the ground. So the 25 passengers we initially planned for were left trapped, to die, inside of the disabled Yugo. They all starved in the days it took us to realize the Yugo never left the station parking lot.
We are sorry for any inconvenience this may have caused and have upgraded to AAA Gold status to prevent any further future disruptions. This will ensure that at least 25 people will actually reach their destinations should this occur again, though they may need to ride on a flat-bed to get there.