
VMware Causes Second Outage While Recovering From First 215

jbrodkin writes "VMware's new Cloud Foundry service was online for just two weeks when it suffered its first outage, caused by a power failure. Things got really interesting the next day, when a VMware employee accidentally caused a second, more serious outage while a VMware team was writing up a plan of action to recover from future power loss incidents. An inadvertent press of a key on a keyboard led to 'a full outage of the network infrastructure [that] took out all load balancers, routers, and firewalls... and resulted in a complete external loss of connectivity to Cloud Foundry.' Clearly, human error is still a major factor in cloud networks."

Comments Filter:
  • Game Over (Score:4, Insightful)

    by ae1294 ( 1547521 ) on Monday May 02, 2011 @08:00PM (#36006096) Journal

    The cloud is a lie. Would the next marketing buzzword please come on down!

  • by shuz ( 706678 ) on Monday May 02, 2011 @08:18PM (#36006220) Homepage Journal
    VMware's explanation of events is troubling to me. The company as a whole is responsible for any of its failures. Internally the company could blame an individual, but to shareholders and other vested entities an individual employee's failure is not something they care about. A better PR response would be to say that "we" made an unscheduled change, or simply that an unscheduled change was made to our infrastructure that caused X.

    This also outlines a major issue with "cloud" technologies: they are only as redundant and stable as the individuals managing them. There is also always the opportunity for a single point of failure in any system; you just need to go up the support tree high enough. For most companies this is the data center itself, as offsite DR can get expensive quickly. For VMware it can be the Virtual Center, a misconfigured vRouter, or even a vSwitch.

    Finally, putting all your eggs into one basket can increase efficiency and save money, but it can also raise your risk profile. An engineer may have caused this outage, but I would find it hard to believe that replacing the engineer would make the "risk" go away.
  • by nurb432 ( 527695 ) on Monday May 02, 2011 @08:45PM (#36006404) Homepage Journal

    I am sure that is what happened. I don't know of any single keystroke that would take down an entire data center (aside from that big red button on the wall over there...).

  • by ToasterMonkey ( 467067 ) on Monday May 02, 2011 @10:36PM (#36007002) Homepage

    VMware's explanation of events is troubling to me. The company as a whole is responsible for any of its failures. Internally the company could blame an individual, but to shareholders and other vested entities an individual employee's failure is not something they care about. A better PR response would be to say that "we" made an unscheduled change, or simply that an unscheduled change was made to our infrastructure that caused X.

    "Transparency is bad" +4 Insightful

    What the... ?
