How Google Routes Around Outages
1sockchuck writes "Making changes to Google's search infrastructure is akin to 'changing the tires on a car while you're going at 60 down the freeway,' according to Urs Holzle, who oversees the company's massive data center operations. In a Q-and-A with Data Center Knowledge, Holzle discusses Google's infrastructure, how it has engineered its system to route around hardware failures, and how it responds when something goes awry. These updates usually go unnoticed, but during system maintenance last month a software bug triggered an outage for Gmail."
Google File System Paper (Score:5, Informative)
For anyone looking for a more in-depth description, check out the technical paper on the Google File System:
http://labs.google.com/papers/gfs.html
Had to read it for a search engines course in college; it's pretty darn spiffy.
Simple, really... (Score:5, Informative)
The key point:
When they get an outage, they check how it was caught. If it wasn't caught automatically, they figure out how to catch it automatically next time. Simple rule: they learn from their mistakes and don't put all their eggs in one basket.
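For the curious, here is a rough sketch of the "eggs in more than one basket" idea in Python. It is only an illustration, not anything from Google's actual systems: the names fetch_with_failover, broken, and healthy are made up for this example, and a real setup would do health checks and load balancing far beyond this. The point is just that a request tries the next replica when one fails, and the failures are recorded so monitoring has something to catch automatically.

```python
from typing import Callable, List, Sequence, Tuple


class AllReplicasFailed(Exception):
    """Raised when every replica in the list has failed."""


def fetch_with_failover(
    replicas: Sequence[Callable[[], str]],
) -> Tuple[str, List[int]]:
    """Try each replica in order.

    Returns the first successful result plus the indexes of the
    replicas that failed before it. (Hypothetical helper, for
    illustration only.)
    """
    failed: List[int] = []
    for index, replica in enumerate(replicas):
        try:
            return replica(), failed
        except Exception:
            # Record the failure so it can be reported, then move on
            # to the next replica instead of failing the request.
            failed.append(index)
    raise AllReplicasFailed(f"all {len(replicas)} replicas failed")


def broken() -> str:
    # Stand-in for a replica that is down.
    raise ConnectionError("replica down")


def healthy() -> str:
    # Stand-in for a replica that is serving normally.
    return "search results"


result, failed = fetch_with_failover([broken, healthy])
print(result, failed)
```

The caller still gets an answer even though the first replica is down, and the list of failed indexes is the kind of signal you would feed into alerting so the next outage is caught without a human noticing first.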
Re:Just me? (Score:4, Informative)
Replacing a wheel on a car going 60mph (Score:2, Informative)
Watch from 1:55 to 2:35:
YouTube video of guys replacing a wheel on a car while it is moving. [youtube.com]