Social Networks IT

Inside Facebook's Infrastructure

Posted by CmdrTaco
from the when-it-crashes-it-burns dept.
miller60 writes "Facebook served up 690 billion page views to its 540 million users in August, according to data from Google's DoubleClick. How does it manage that massive amount of traffic? Data Center Knowledge has put together a guide to the infrastructure powering Facebook, with details on the size and location of its data centers, its use of open source software, and its dispute with Greenpeace over energy sourcing for its newest server farm. There are also links to technical presentations by Facebook staff, including a 2009 technical presentation on memcached by CEO Mark Zuckerberg."

  • Environmentalist (Score:3, Interesting)

    by AnonymousClown (1788472) on Thursday September 30, 2010 @08:51AM (#33745812)
    I support environmental causes (Sierra Club and others), but I for one will not support Greenpeace and I don't think they are credible. They use violence to get their message out and their founder is now a corporate consultant that shows them how to get around environmental laws and pollute.

    That's all.

    • Re: (Score:3, Funny)

      by bsDaemon (87307)

      Yeah, but if they're against Facebook, they can't be all bad. Sort of like the Mafia vs Castro, right?

      • by Rogerborg (306625)

        Funny, I was just thinking that if Greenpeace are against Facebook, that means Facebook can't be all bad.

        Then I remembered that we're not in Bushville Kiddy Matinee Land, and that it could easily be all-bad vs all-bad. Best result is that they attack facebook's data centre with clockwork chainsaws, and it falls on them and crushes them into Soylent Green.

      • by orient (535927)

        Sort of like the Mafia vs Castro, right?

        Right, Castro seems to be the good guy here.

    • by tehcyder (746570)

      They use violence to get their message out

      Examples please.

      Sorry, I mean [citation needed]

  • Facebook ID (Score:4, Funny)

    by Thanshin (1188877) on Thursday September 30, 2010 @08:52AM (#33745828)

    It's time to invent the Facebook Identity card.

    You can't remember your passport number? No worries, your Facebook Identity card will say who you are. And how many friends you've got. And the name of your pet. And whether you went to the bathroom at your usual time that morning. And what kind of men you find attractive.

    Semper Facebook Identity!

    • by jon42689 (1098973)
      I threw up in my mouth a little bit upon reading this. God save us.
    • by PFactor (135319)
      As a former Marine I'm afraid I'm going to have to "liberate" you for your perversion of "Semper Fi".
      • by tcopeland (32225)

        > As a former Marine I'm afraid I'm going to have to "liberate" you

        I think the nomenclature these days is that he's a target that needs to be "serviced".

    • by alphax45 (675119)
      Facebook only knows what YOU decide to tell them. They can't (yet?) read your mind. As long as you’re smart about what data you decide to give them it is a great tool to keep in touch with friends and in a lot of cases family. There are privacy settings (although not always easy to find/use) that allow you to control who can see your data. I just set mine to “Friends Only” (one button now) and I only friend people I know/trust. I don't see why people always say Facebook knows everything ab
      • Re:Facebook ID (Score:4, Interesting)

        by rtaylor (70602) on Thursday September 30, 2010 @09:29AM (#33746104) Homepage

        Facebook also knows anything about you that 3rd parties (friends, family, etc.) might tell them.

        I didn't create an account or provide any information to facebook; yet there are bits and pieces of information on it about me.

        • Re: (Score:3, Interesting)

          by tophermeyer (1573841)

          One of the things keeping me from deleting my Facebook account is that having it active allows me to untag myself from all the pictures that I wish my friends would stop making public. If I didn't have an account they could link to, my name would just sit on the picture for anyone to see.

      • by mlts (1038732) *

        I use four items for FB:

        First, I assume anything I put on FB will end up in my worst enemy's hands, with the paranoiac fantasy #include option turned on. This means not listing when I'm out on vacation because someone who may be interested in a burglary might be reading, not listing where I work, not listing the exact model of vehicle I have, and so on.

        Second, I set permissions so all my stuff is only visible to one group of friends. I manually add people to this group. This way, should someone

        • by Cylix (55374) *

          I think the only safe rule with applications is to not use them due to their sloppy policies with information.

          At least that is my rule anyway.

    • Gee thanks, now I gotta pee.

  • Slashdotted (Score:3, Informative)

    by devjoe (88696) on Thursday September 30, 2010 @09:01AM (#33745882)
    Maybe Data Center Knowledge should put some of that knowledge to work, as the article is slashdotted after only 5 comments.
    • I tried to Coral Cache it, but their stupid redirect from the "human friendly" URL just resulted in "Resource not found" error.

      More fool them.
  • Looks like Data Center Knowledge could use some of that infrastructure.

  • I managed to load the first page of the article before it got slashdotted:

    With more than 500 million active users, Facebook is the busiest site on the Internet and has built an extensive infrastructure to support this rapid growth. The social networking site was launched in February 2004, initially out of Facebook founder Mark Zuckerberg’s dorm room at Harvard University and using a single server. The company’s web servers and storage units are now housed in data centers around the country.

    Each

    • Perhaps they got it up and running again, but here's page two in case it dies:

      This chart provides a dramatic visualization of Facebook’s infrastructure growth. It documents the number of servers used to power Facebook’s operations.

      “When Facebook first began with a small group of people using it and no photos or videos to display, the entire service could run on a single server,” said Jonathan Heiliger, Facebook’s vice president of technical operations.

      Not so anymore. Technical

      • Last page.

        Facebook says the Prineville data center will be designed to a Gold-level standard under the LEED (Leadership in Energy and Environmental Design) program, a voluntary rating system for energy efficient buildings overseen by the US Green Building Council. The Prineville facility is expected to have a Power Usage Effectiveness (PUE) rating of 1.15. The PUE metric (PDF) compares a facility’s total power usage to the amount of power used by the IT equipment, revealing how much is lost in distrib
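        The PUE ratio described above (total facility power divided by the power used by the IT equipment) can be sketched in a few lines. This is only an illustration: the 1,150 kW and 1,000 kW loads are made-up numbers chosen to reproduce the quoted 1.15 rating, and the function names are hypothetical.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_share(pue_value: float) -> float:
    """Fraction of total facility power that never reaches the IT equipment."""
    return 1 - 1 / pue_value


# Hypothetical loads, picked only so the ratio matches the quoted 1.15.
rating = pue(total_facility_kw=1150.0, it_equipment_kw=1000.0)
print(f"PUE: {rating:.2f}")
print(f"Share of total power spent on overhead: {overhead_share(rating):.1%}")
```

        A rating of 1.0 would mean every watt drawn by the building reaches the servers, so the closer the figure is to 1.0, the less is spent on cooling and power distribution.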

  • Freaking SEOs... (Score:3, Insightful)

    by netsharc (195805) on Thursday September 30, 2010 @09:13AM (#33745966)

    Facebook is... Facebook has... fucking SEO monkeys must be at work making sure the company isn't referred to as "it", because that ruins the google-ability of the article, and they'd rather have SEO ratings than text that reads like it's been written by a fucking 3rd grader.

    SEO-experts... even worse than lawyers.

  • by francium de neobie (590783) on Thursday September 30, 2010 @09:13AM (#33745972)
    It links to Facebook's "wrong browser" page. The real link may be here: http://www.facebook.com/video/video.php?v=631826881803 [facebook.com]
    • It's Facebook. It knows which browser you used. It changed the link just for you based upon that information.

      Everyone else sees what you posted.

      Tinfoil hats mandatory from here on in.
  • by Average_Joe_Sixpack (534373) on Thursday September 30, 2010 @09:15AM (#33745998)

    USENET and /. (RIP Digg)

  • Cache (Score:3, Informative)

    by minus9 (106327) on Thursday September 30, 2010 @09:20AM (#33746030) Homepage
    • by mariushm (1022195)

      or just paste the URL of each page into Google, and the first result will be a link to the page, with a "cached" link in the bottom right.

  • Yawn.. move along (Score:3, Informative)

    by uncledrax (112438) on Thursday September 30, 2010 @09:23AM (#33746068) Homepage

    The article isn't worth reading IMO, not unless you're curious as to how much electricity some of the FB datacenters use. Otherwise it's light on the tech details.

    • Re: (Score:3, Informative)

      by drsmithy (35869)

      The article isn't worth reading IMO, not unless you're curious as to how much electricity some of the FB datacenters use. Otherwise it's light on the tech details.

      Indeed. "All you wanted to know about FaceBook's infrastructure" and little more than a passing mention about their storage ? That's vastly more interesting information than where their datacenters might physically be.

    • woah woah woah there buddy. Who said anything about reading articles!
    • by stiller (451878)

      If you want tech details, check out this excellent talk by Tom Cook of FB from Velocity last June: http://www.youtube.com/watch?v=T-Xr_PJdNmQ [youtube.com]

  • by mlts (1038732) * on Thursday September 30, 2010 @10:33AM (#33746868)

    Call me dense, but with all the racks of 1U x86 equipment FB uses, wouldn't they be far better served by machines built from the ground up to handle the TPM and I/O needs?

    Instead of trying to get so many x86 machines working, why not go with upper end Oracle or IBM hardware like a pSeries 795 or even zSeries hardware? FB's needs are exactly what mainframes are built to accomplish (random database access, high I/O levels) and do the task 24/7/365 with five 9s uptime.

    To boot, the latest EMC, Oracle and IBM product lines are good at energy saving. The EMC SANs will automatically move data and spin down drives not in use to save power. The CPUs on the top-of-the-line equipment not only power down whatever parts are not in use, but wise use of LPARs or LDoms would also help with energy costs simply by requiring fewer machines.

    • by njko (586450) <naguil@@@yahoo...com> on Thursday September 30, 2010 @11:27AM (#33747676) Journal
      The purpose of server farms with commodity hardware is just to avoid vendor lock-in. If you have a good business but you are tied to a vendor, the vendor has a better business than you: they can charge you whatever they want.
      • by mlts (1038732) * on Thursday September 30, 2010 @11:42AM (#33747892)

        That is a good point, but to use a car analogy, isn't it like strapping a ton of motorcycles together with duct tape and having people on staff to keep them all maintained so the contrivance can pull an 18-wheeler load? Why not just buy an 18-wheeler, which is designed and built from the ground up for this exact task?

        Yes, you have to use the 18-wheeler's shipping crates (to continue the analogy), but even with the vendor lock-in, it might be a lot better to do this as opposed to trying to cobble a suboptimal solution that does work, but takes a lot more man-hours, electricity, and hardware maintaining as opposed to something built from the factory for the task at hand.

        Plus, zSeries machines and pSeries boxes happily run Linux LPARs. That is as open as you can get. It isn't like it would be moving the backend to CICS.

        • by njko (586450)
          Well, tactically you should always use the right tool for the problem, but strategically you should be aligned with long-term goals, the vision of the CTO/CIO, and other non-technical factors. Vendor independence may be one of the key items necessary to achieve the vision.
        • by RajivSLK (398494) on Thursday September 30, 2010 @04:04PM (#33751920)

          Well we do the same thing as facebook but on a much smaller scale... Our "commodity hardware" (mostly supermicro motherboards with generic cases, memory etc) has pretty much the same uptime and performance as vendor servers. For example we have a Quad CPU database server that has been up for 3 years. If I remember correctly it cost about 1/2 as much as a server with equivalent specs from a vendor.

          The system basically works like this. Buy 5 or so (or 500 if you are facebook) servers at once with identical specs and hardware. If a server fails (not very often) there are basically 4 common reasons:

          1) Power supply or fan failure -- very easy to identify.
              Solution: Leave the server down until maintenance day (or whenever you have a chance), then swap in a new power supply (total time 15 min [less time than calling the vendor's tech support]).

          2) Hard drive failure -- usually easy to identify.
              Solution: Leave the server down until maintenance day (or whenever you have a chance), then swap in a new hard drive (total time 15 min [less time than calling the vendor's tech support]). When the server reboots it will automatically be set up by various autoconfig methods (BOOTP, whatever). I suspect that Facebook doesn't even have HDs in most servers.

          3) RAM failure -- can be hard to identify.
              Solution: Leave the server down until maintenance day (or whenever you have a chance), then swap in new RAM (total time 15 min [less time than calling the vendor's tech support]).

          4) Motherboard failure (almost never happens) -- can be hard to identify.
              Solution: Replace the entire server; keep the old server for spare parts (RAM, power supply, whatever).

          I don't really see what a vendor adds besides inefficiency. If you have to call a telephone agent, who then has to call a tech guy from the vendor, who then has to drive across town at a moment's notice to spend 10 minutes swapping out your RAM, it's going to cost you. At a place like Facebook, why not just hire your own guy?
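          The four failure cases above amount to a small lookup from failure type to repair step. A rough sketch of that playbook follows; the dictionary keys, the wording of each action, and the `repair_action` helper are illustrative names, not anything the poster or Facebook is known to use.

```python
# Hypothetical playbook: keys and action text paraphrase the four failure
# types listed in the comment above; none of this is a real tool's API.
PLAYBOOK = {
    "power_supply_or_fan": "leave down until maintenance day, swap power supply",
    "hard_drive": "leave down until maintenance day, swap hard drive, let autoconfig rebuild",
    "ram": "leave down until maintenance day, swap RAM",
    "motherboard": "replace entire server, keep old one for spare parts",
}


def repair_action(failure: str) -> str:
    """Return the repair step for a known failure type."""
    try:
        return PLAYBOOK[failure]
    except KeyError:
        raise ValueError(f"unknown failure type: {failure!r}") from None


if __name__ == "__main__":
    for failure in PLAYBOOK:
        print(f"{failure}: {repair_action(failure)}")
```

          Because every server in a batch has identical specs, one such table covers the whole batch, which is the point of buying them together.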

    • Re: (Score:3, Interesting)

      by Cylix (55374) *

      The latest x86 architecture lines are moving far more in the direction of mainframe-type units in terms of density and bandwidth. This is a hardware type from several years back and would not really compare to the denser offerings being explored today. However, the reasoning behind commodity hardware is not just the ability to switch from one platform to another, but rather that it keeps costs down through vendor competition. One design can be produced by multiple vendors, each competing to offer the lowest bid.

      • by mlts (1038732) *

        Very true. However, by moving the redundancy to the top of the application stack, doesn't that become inefficient after a while?

        For example, I have an application that sits on a mainframe on a big LPAR with the database server on another LPAR on the same machine. Because the redundancy is handled by everything below it, the application can be a lot simpler, with fewer bugs. It does not have to worry about being consistent among a number of boxes, just run and let the rest of the stack below it do

        On a mor

        • by Cylix (55374) *

          The application doesn't necessarily have to be redundancy-aware. It really depends on several factors, most importantly how session data is handled. In most cases, the actual load is handled by specialized load balancers that manage the distribution. However, depending on exactly how much money you want to throw at the scenario, there are various ways to scale up the application.

          I'm not sure how much crack you are smoking, but you just said parallelism has worse diminishing returns then a monolithic pl

    • by Danathar (267989)

      Do you have an IBM business card or phone number they can call you at for a Sales pitch?

      • Re: (Score:3, Interesting)

        by mlts (1038732) *

        Actually neither. It's just that to an observer like me, FB is trying to reinvent the wheel on a problem that already has been solved.

        Obviously, IBM is not cheap. Nor is Oracle/Sun hardware. However, the time and money spent developing a large scale framework on the application layer is not a trivial expense either. It might be that the time FB puts in trying to deploy something uncharted like this may cost them more in the long run.

    • EMC won't automatically move anything. Stop listening to marketing droids. "FASTv2" on anything you can connect to a mainframe is nothing but a marketing slide at this point, and likely will continue to be for the next 6 months.
    • Re: (Score:3, Interesting)

      by TheSunborn (68004)

      The problem is that for any specific budget* the x86-64 solution will give you more aggregate I/O and more processor hardware than the mainframe. The argument for the mainframe is then that the software might be easier to write, but no mainframe exists that can serve even 1/10 of Facebook, so you need to cluster them anyway. And if you need special cluster magic anyway, you might as well have x86-64.

      And IBM will not promise you 99.999% uptime if you buy a single mainframe. If you need that kind of

  • by Comboman (895500) on Thursday September 30, 2010 @11:03AM (#33747358)
    "690 billion page views to its 540 million users in August"? Good lord, that's 1278 page views PER USER in just one month! That's (on average) 41 page views per user, per day, every single day! The mind boggles.
    • yea? so? it's not like an addiction or something! </fiending>
    • by Revvy (617529) *
      I'm wondering if they're counting every one of the automated page update checks as a page view. I'd be really curious to see exactly what they count as a page view.
    • by carlosap (1068042)
      41 pages per user or more; with more smartphones that number is going to grow. Personally I visit 40 pages or more, and that's a crazy number. I don't use traditional email any more, I just check the comments of my friends and that's it.
    • Re: (Score:3, Interesting)

      by Overzeetop (214511)

      Have you seen how often Facebook crashes/has problems? You have to constantly reload the thing to get anything done. Thank goodness Google Calendar doesn't have that problem or I'd probably have a thousand hits a day to my calendar page alone.

      Also, FB pages tend to be pretty content-sparse. It's not uncommon for me to hit a dozen pages in 2-3 minutes if I check facebook.

    • by gangien (151940)

      go to fb, go to friends page, go back, go to another friends page, go back, go to farmville, go back

      7 page views, for doing very little, right? how many times a day?

      that's ignoring things like farmville gifts which require i think 3 page views for accepting/sending/returning each gift.

      and of course there's the teen girl that does lord knows how many things.
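    The per-user figures at the top of this thread follow directly from the two numbers in the story summary. A minimal sketch of the arithmetic, assuming a 31-day August and rounding to whole page views (the variable names are illustrative):

```python
# Figures as quoted in the story summary for August 2010.
page_views = 690_000_000_000
users = 540_000_000
days_in_august = 31

# Average page views per user over the month, then per day.
views_per_user = page_views / users
views_per_user_per_day = views_per_user / days_in_august

print(f"{views_per_user:.0f} page views per user for the month")
print(f"{views_per_user_per_day:.0f} page views per user per day")
```

    These are averages over all 540 million users, so they say nothing about how the views are spread between heavy and occasional users, or about what gets counted as a page view.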

  • by peter303 (12292) on Thursday September 30, 2010 @11:44AM (#33747914)
    It's interesting how FB is open about their data server infrastructure while some places like Google and Microsoft are very secretive. It is a competitive advantage for Google to shave every tenth of a second it can off a search through clever software and hardware. They are an "on ramp" to the Information Super Highway, not a destination like FB. And because Google is one of the largest data servers on the planet, even small efficiency increases translate into mega-million-dollar savings.
  • by peter303 (12292) on Thursday September 30, 2010 @11:56AM (#33748090)
    When these data centers start showing up as measurable consumers of the national power grid and components of the GDP, you might consider them metaphorically as power plants of the information industry.

    In his book on the modern energy industry, "The Bottomless Well", author Peter Huber places commodity computing near the top of his "energy pyramid". Peter's thesis is that modern technology has transformed energy into ever more sophisticated and useful forms. He calls this "energy refining". At the base of his pyramid are relatively raw energy sources like biomass and coal. Then come electricity, computing, optical, etc. I think it's interesting to view computing as a refined form of energy.
