Bug Social Networks Databases

Facebook Unveils Details of Downtime

Posted by timothy
from the you-like-this dept.
An anonymous reader writes "Facebook officially gave out more technical details on the endless loop in a database control mechanism that forced a 2.5-hour shutdown of the social site, and the resulting combination of a productivity burst, increased fertility (check back on June 25, 2011) and mass hysteria all around the world."
This discussion has been archived. No new comments can be posted.

  • not very technical (Score:5, Interesting)

    by datapharmer (1099455) on Saturday September 25, 2010 @11:42AM (#33696984) Homepage
    The technical details are that I have an incompatible browser? Really, Slashdot? Did you even check the links? Of course not...
    • by Anonymous Coward on Saturday September 25, 2010 @11:45AM (#33696994)

      Correct link to technical details:

      http://www.facebook.com/note.php?note_id=431441338919&id=9445547199&ref=mf [facebook.com]

      (anon because I'm not a karma whore)

      • by OzPeter (195038)

        Correct link to technical details:

        Sounds like someone didn't do any testing or that their testing is inadequate.

      • by ProdigyPuNk (614140) on Saturday September 25, 2010 @11:53AM (#33697044) Journal
        You've got to read some of the comments posted in that thread, it's hilarious.

        Anoesj Sadraee It's great to hear and see that big companies like Facebook are so open with what they do. That's rare, very rare. Thanks!

        Anne Uriarte ~facebook is stiLL sooo sLow for uz irr! >;'((

        Phil McBride this site is becoming less secure lately... hackers are becoming more and more intelligent, i would know, cuz im a white hat lol

        • by EdIII (1114411)

          If you think that is funny, try reading the article summary this way:

          Slashdot officially gave out more technical details on the endless loop in a database control mechanism that forced a 2.5-hour shutdown of the geek news site, and the resulting combination of a productivity burst, increased fertility (check back on June 25, 2011) and mass hysteria all around the world."

        • Hilarious? I feel sick. Names not changed to implicate the stupid:

          Paul Diaz: Will What i Say is get a front page they say facebook down due to server's and we are working hard to fix it get free cash ?:)

          Mouhssine Freedom Elmezyani It's very easy to rape facebook !! i know some friends can hack your compt through your electronic adress !& they hacked my compt several time in the pretext of kidding !!

          Mauro Guberti I'd like to know what's the necessary qualifications to work like moderator.

          And the one that prompted me to close the tab:

          Joanne Bozik The following link is the problem.........these people have been sending my name and pic to many stating that I purchased this product and I also in return am receiving the following link in my friends names......Please get after these people

          http://www.facebook.com/facebook?v=wall#!/note.php?note_id=431441338919&id=9445547199&ref=mf [facebook.com]

          (note, the link is the link to the explanation for the outage)

        • by Xest (935314)

          "I would know, cuz im a white hat lol"

          I'm going to see how many times I can use this line in conversation at work today. It's just brilliant.

      • by Twinbee (767046)

        Well that's daft because I can't see much difference in real value whether you have 5000 karma points or 5,000,000 'points' unless you know of a way to convert that to cash.

        • by elewton (1743958) on Saturday September 25, 2010 @01:36PM (#33697622)
          You could spend it on Karma whores or play Karma poker.
        • I can't see much difference in real value whether you have 5000 karma points or 5,000,000 'points'

          The difference is if you have 5,000 karma points...well, you have 5,000 virtual points on a site only geeks care about or even know exist.

          If you have 5,000,000 points, on the other hand, THAT'S OVER NINE THOUSAAAND!!!

        • by ultranova (717540)

          Well that's daft because I can't see much difference in real value whether you have 5000 karma points or 5,000,000 'points' unless you know of a way to convert that to cash.

          Points let you level up, upgrading your abilities and attributes, while karma points let you select a more powerful base class on your next playthrough.

  • OH NOES (Score:5, Insightful)

    by Pojut (1027544) on Saturday September 25, 2010 @11:47AM (#33697004) Homepage

    Meh, it happens...just like a power company, no one says a word when the thing works fine for weeks or months at a time...but when it goes down for a couple hours, people act like it never works.

    • Re: (Score:3, Insightful)

      by Mitchell314 (1576581)
      But it's a helluva lot more important for a power company to stay up than FB, no power can cause serious problems. But FB down for two hours, man, the gods forbid you actually are productive or something . . .
      • But it's a helluva lot more important for a power company to stay up than FB, no power can cause serious problems. But FB down for two hours, man, the gods forbid you actually are productive or something . . .

        This. FB needs perspective [bit.ly].

        • FB needs perspective [New York Post].

          That'd be a bit more convincing if it were more than just an ad for a $3.95 article.

      • I actually think Facebook being down would make them LESS productive...because instead of working, they're constantly checking to see if Facebook is back up. Those farms won't...whatever they do with those things...themselves.
  • by ProdigyPuNk (614140) on Saturday September 25, 2010 @11:47AM (#33697006) Journal
    You are using an incompatible web browser. Sorry, we're not cool enough to support your browser. Please keep it real with one of the following browsers:

    Obviously, the error was caused by too many people not keeping it real.

    • Re: (Score:3, Insightful)

      by jdong (1378773)
      I've got a great idea! Why don't we have every slashdot reader go in and try to fix the broken link? Then the problem will correct itself in no time!
  • by ryanleary (805532) on Saturday September 25, 2010 @11:49AM (#33697014)

    Since the link in the summary is broken, this is the facebook blog post [facebook.com].

    Post contents:
    Early today Facebook was down or unreachable for many of you for approximately 2.5 hours. This is the worst outage we’ve had in over four years, and we wanted to first of all apologize for it. We also wanted to provide much more technical detail on what happened and share one big lesson learned.

    The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.

    The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it doesn’t work when the persistent store is invalid.

    Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.

    To make matters worse, every time a client got an error attempting to query one of the databases it interpreted it as an invalid value, and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn’t allow the databases to recover.
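The feedback loop described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Facebook's actual system: the `simulate` function, the single shared `"config"` cache key, the tick-based timing, and the client and capacity numbers are all hypothetical. It contrasts treating a database error as an invalid value (and deleting the cache key) with leaving the cached value alone on error.

```python
def simulate(clients, db_capacity, ticks, delete_on_error):
    """Return the number of database queries issued on each tick.

    Hypothetical model: all clients share one cache key. On each tick,
    every client that finds the key missing queries the database, which
    can serve at most `db_capacity` queries; the rest fail with an error.
    """
    cache = {}                       # shared cache: key -> value
    queries_per_tick = []
    for _ in range(ticks):
        # A missing key sends every client to the database at once.
        queries = clients if "config" not in cache else 0
        queries_per_tick.append(queries)

        served = min(queries, db_capacity)
        failed = queries - served

        if served:
            # Clients whose query succeeded repopulate the cache.
            cache["config"] = "good-value"
        if failed and delete_on_error:
            # The flaw: an error is interpreted as "invalid value",
            # so the key is deleted and everyone misses again next tick.
            cache.pop("config", None)
    return queries_per_tick


if __name__ == "__main__":
    print(simulate(1000, 100, 5, delete_on_error=True))
    print(simulate(1000, 100, 5, delete_on_error=False))
```

In this simplified model, the load never drops as long as even one query fails per tick, which mirrors the post's remark that the databases "were causing even more requests to themselves". With `delete_on_error=False`, the first successful read is enough to stop the stampede.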

    The way to stop the feedback cycle was quite painful - we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.

    This got the site back up and running today, and for now we’ve turned off the system that attempts to correct configuration values. We’re exploring new designs for this configuration system following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
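The post does not say which design Facebook chose, so the following is only a sketch of one common pattern for coping with feedback loops and transient spikes: keep serving the last known good value when a refresh fails, and space out retries with exponential backoff plus jitter. The `ConfigClient` class, its parameters, and the `fetch` callback are hypothetical names introduced for illustration.

```python
import random


class ConfigClient:
    """Hypothetical client that refreshes a config value without stampeding."""

    def __init__(self, fetch, initial=None, base_delay=1.0, max_delay=60.0, rng=None):
        self.fetch = fetch            # callable that reads the persistent store
        self.value = initial          # last known good value
        self.base_delay = base_delay  # seconds; first backoff ceiling is 2x this
        self.max_delay = max_delay    # upper bound on any single backoff
        self.rng = rng or random.Random()
        self.failures = 0
        self.next_attempt = 0.0       # earliest time a new fetch is allowed

    def refresh(self, now):
        """Try to refresh at time `now` (seconds); always return a usable value."""
        if now < self.next_attempt:
            return self.value         # still backing off: do not touch the database
        try:
            self.value = self.fetch()
            self.failures = 0
            self.next_attempt = now
        except Exception:
            # An error is NOT treated as an invalid value: keep the old one
            # and wait longer before asking again.
            self.failures += 1
            ceiling = min(self.max_delay, self.base_delay * 2 ** self.failures)
            # Jitter in [ceiling / 2, ceiling] spreads clients out in time.
            self.next_attempt = now + self.rng.uniform(ceiling / 2, ceiling)
        return self.value
```

The key design choice here is that a failed query reduces load on the database instead of increasing it, so an overloaded cluster gets room to recover.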

    We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.

    • by Charliemopps (1157495) on Saturday September 25, 2010 @02:04PM (#33697746)
      Wow, I'm impressed by the detail they provided. More companies should handle outages like this. Makes them look like they know what they're doing, they figured it out, and it won't happen again. Instead of the typical stance of pretending it never happened.
      • by inKubus (199753)

        This shows the beginning of the end for Facebook. Reading the summary they provided provides many details such as the fact they don't have a QA environment or regional segments or anything. It's pretty dangerous to run a site that big like that. And I've read much more they've released that basically says they just hacked mysql replication to update their caches to get it real time across regions. What they should have done to horizontally scale is to implement regional shards and then some type of inter

    • by DiEx-15 (959602)

      We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.

      Just like their commitment to security, I must ask this:

      SINCE WHEN?

  • by rabidjoe (1854904) on Saturday September 25, 2010 @11:49AM (#33697018)
    In the downtime I tried to show the woman the virtues of an IRC channel she could share with family and friends and how much less bullshit she would find there..... "does it have farmville?". There is no hope for the sheeples, the planet is doomed
    • by Allnighte (1794642) on Saturday September 25, 2010 @11:53AM (#33697042)
      I was going to disagree with you until I read one of the Facebook comments on the blog post talking about the error:

      "John Marshall: how do i get job workin with facebook i live in newcastle in uk can any one from facebook staff get or can some one give me a email address that i can use to contact facebook please"

      :|
      very doomed.
    • If the woman has a GameCube or Wii, and you want to get her off FarmVille, try buying her a copy of Harvest Moon: Magical Melody.
      • by neumayr (819083)
        While Harvest Moon has always been a very girl-compatible game, even before it had all those social features modern iterations of the game have, it cannot get her away from FarmVille unless she and everyone she's playing FarmVille with have as much access to a Wii as to a web browser.
        Not going to happen.
        • by hairyfeet (841228)

          Have you tried giving her The Sims? Don't ask me why but for some reason the sims and the old AoE of all things, really seem to be like catnip to the fairer sex. Hell at the last shop I worked with doug made me install AoE and The sims on every desktop we had in stock. I was like "WTF? Hell if you are gonna put old game why not Quake II?" and he told me to watch and learn. Sure enough with having the sims and AoE running in the window it wasn't 30 minutes before the females started showing up wanting to loo

    • by rubycodez (864176)

      so you, being of independent mind, married a sheeple? you're a sheeple-fucker! and your children will be sheeple!

      conclusion: the sheeple aren't doomed....

    • by ibmjones (52133)

      In the downtime I tried to show the woman the virtues of an IRC channel she could share with family and friends and how much less bullshit she would find there

      Right. . . . [bash.org]

  • "mass hysteria"? (Score:5, Insightful)

    by SuperBanana (662181) on Saturday September 25, 2010 @11:52AM (#33697036)

    and mass hysteria all around the world.

    [citation needed].

    First I knew was when I read about it on another tech blog, hours after it'd happened...and I use Facebook. And I work with a ton of people who use it (grad students.)

    There wasn't mass hysteria; there was mass ambivalence. I'm now reading all these blog/news postings about how "everyone" went crazy. Nobody was talking about it where I ate dinner. Nobody was talking about it where I had coffee that evening. It didn't make my city newspaper- no "Facebook down, residents in despair" stories to be found.

    All this coverage claiming that everyone went nuts seems like a desperate attempt by Facebook PR to make something positive out of this...namely, trying to convince us that Facebook is so integral to the people who use it, it must, of course, be to us as well.

    • by Anonymous Coward on Saturday September 25, 2010 @12:25PM (#33697212)

      To me, Facebook is about as integral to my life as the toilet is. It's there, it's gonna be used every once in a while and it involves a bit of dirty business that you just can't avoid.

      • by microbee (682094) on Saturday September 25, 2010 @01:07PM (#33697450)

        It's a toilet all right, but a very annoying one: you hear every flush coming from your friends' toilet as well.

        • by IANAAC (692242)

          It's a toilet all right, but a very annoying one: you hear every flush coming from your friends' toilet as well.

          Really?

          The first time I see notice of anyone's flushes, I block the flush notices.

          Seems to me, if you're complaining about it, you don't really know what you're doing (and that's considering all you have to do is hover your mouse over their post).

      • by Lumpy (12016)

        You've never used a Bidet have you.

        It makes the "dirty business" downright civilized.

        That said, you do NOT have to use the toilet paper that is facebook. really.

      • And if it becomes unavailable for 2 hours, I do the dirty business on the street.
    • by ForexCoder (1208982) on Saturday September 25, 2010 @12:34PM (#33697298)

      [citation needed].

      [citation] [wikipedia.org]

    • Re: (Score:3, Interesting)

      by kurokame (1764228)
      To be fair, most grad students use Facebook primarily to help remind them that they need to come up for air occasionally. If it goes down and temporarily stops vying for their attention, they're likely to continue being absorbed with analyzing the data from their last attempt to apply an epicycle-based model to the sociology of small town karaoke sessions given a behavioral-political tensor formulation of motivation in a multidimensional vector space representing cheese.
      • by bsane (148894)

        They 'need to come up for air' every 2.5 hours? I don't know where to start.

        • by Klinky (636952)

          Haven't you seen Waterworld?

          • by bsane (148894)

            OK- I guess I should have started...

            I get the idiom. I think it's sad that taking a break after 2.5 hours is worthy of the phrase... Is 2.5 hours of concentrating outside the realm of normal?

    • All this coverage claiming that everyone went nuts seems like a desperate attempt by Facebook PR to make something positive out of this...namely, trying to convince us that Facebook is so integral to the people who use it, it must, of course, be to us as well.

      Actually, all the coverage I've seen has been slanted like the summary above - that is, an attempt to denigrate and marginalize those that use Facebook. See the comments in Slashdot's previous coverage for some pretty clear examples of this.

    • by kesuki (321456)

      you are so on the right track there. the desire to spread news, especially bad news, is screwing with 'the system' there were places built to be safely ignorant... and some to just be quiet and relaxing... oh well.

    • Re: (Score:2, Insightful)

      by neumayr (819083)
      Uh. I took the summary as sarcasm of sorts. Which made your reaction seem like quite the overreaction. Then you got +5 Insightful...
  • by ProdigyPuNk (614140) on Saturday September 25, 2010 @12:23PM (#33697198) Journal
    Let's look at the important thing here with this outage: How many cows, pigs, chickens, cats, goldfish, etc. were made to suffer? I know my girlfriend couldn't take care of her virtual cats, and their litterbox ended up full. They were not at all happy. I'm sure the same thing played out across thousands of FarmVille, MyPets, etc. accounts. Please, won't someone think of the animals?
  • by Anonymous Coward

    If you suck so bad on a global scale long enough, eventually the universe tries to step in.

  • by Animats (122034) on Saturday September 25, 2010 @12:25PM (#33697214) Homepage

    Does the clock stop for Farmville if Facebook goes down?

  • Twitter... (Score:4, Interesting)

    by PmanAce (1679902) on Saturday September 25, 2010 @12:34PM (#33697288) Homepage
    I wonder if Twitter had a noticeable increase in usage during the Facebook outage, or other social portals?
  • by Valpis (6866) on Saturday September 25, 2010 @12:39PM (#33697316)

    Instead of checking Facebook every 5 minutes for the latest, very important updates, as they always do, people were now constantly hitting reload for 2.5 hours.

  • by trasgu (603018) *

    What is this Facebook thing? Isn't that something kids do on computers?

  • by michaelmalak (91262) <michael@michaelmalak.com> on Saturday September 25, 2010 @01:21PM (#33697528) Homepage
    Unless the particular arrangement of pixels on a Facebook webpage caused a powerful alignment of EM radiation, fertility was not affected. Perhaps fecundity was, but not fertility.
  • by WankersRevenge (452399) on Saturday September 25, 2010 @01:30PM (#33697594)
    My favorite server downtime story occurred back in early 2000 when I was working for Disney's Internet Group. All the message boards for the film and television websites ended up crashing. No one knew the cause, and as the web-ops team investigated, we learned that the message board server wasn't even housed in any of Disney's server farms. After a lot of hair pulling, we found the server was located in a satellite office in Sunnyvale. Evidently, the server was just on an engineer's desk. When that engineer left the company he neglected to tell anyone about the box, so when the new engineer took his spot, she found she didn't like the noise from the machine. So one day, she pulled the plug and put it in some out-of-the-way spot in the office. There wasn't a lot of traffic on it, but it still makes me laugh to think of all the Tim Allen fans in distress over a misplaced box.
  • best summary on /. ever!
  • by NicknamesAreStupid (1040118) on Saturday September 25, 2010 @01:59PM (#33697726)
    As Facebook went off-line, I witnessed the unthinkable at an Internet cafe. Young men and women, innocently engaging in social networking intercourse, were suddenly thrown out of their Facebook world and into the reality of the real world, as though all had taken the red pill. Images distorted into 3D with a startling range of colors, sounds beyond stereo, and smells -- odors for the new fifth sense. Everyone looked around to witness "super high def" of each other, and some actually stood to experience a new perspective. Then, as if in concert, the unplugged Facebookers began to touch each other. Immediately untapped hormones raged as ancient primal urges emerged for the first time. Just as it was about to become an orgy of primal lust, the Cafe manager flipped on a You Tube video of Elmo, http://www.youtube.com/watch?v=UZHSDjtD-dg [youtube.com], and a disaster was avoided.
  • by thetoadwarrior (1268702) on Saturday September 25, 2010 @02:03PM (#33697738) Homepage
    Who cares if it's down even for a day. Just talk about your pointless activities twice as long the next day.
    • by glwtta (532858)
      Who cares if it's down even for a day. Just talk about your pointless activities twice as long the next day.

      Well, we care because as one of the largest sites, they are expected to have their shit together. So when they don't, it's interesting to see what happened.

      It's professional interest, it's not that I'm worried that people couldn't plant their snow peas for two hours, or whatever.
      • Everyone including NASA makes mistakes. Facebook is not going to be perfect and quite frankly I can live without finding out for 2.5 hours that my friend bought a new shirt. I'd be more concerned about someone like NASA reaching perfection long before I would ever care about Facebook reaching perfection.
        • by glwtta (532858)
          Facebook is not going to be perfect ...

          No one said it is, actually I don't think I said anything about perfection. Facebook is one of the largest apps in the world, you don't think the problems they face can be informative to some of Slashdot's readership?
          • Yes I do think it can be informative and I wouldn't have wanted the topic to be removed from Slashdot. My original point is that it's not that big of a deal despite the amount of attention this has received across the internet. Being interesting doesn't make it important.
  • It's just Facebook... If people really massively get hysterical over the unavailability of Facebook, that should count as yet another thing horribly, horribly wrong with the world...

