Inside Facebook's Infrastructure 77
miller60 writes "Facebook served up 690 billion page views to its 540 million users in August, according to data from Google's DoubleClick. How does it manage that massive amount of traffic? Data Center Knowledge has put together a guide to the infrastructure powering Facebook, with details on the size and location of its data centers, its use of open source software, and its dispute with Greenpeace over energy sourcing for its newest server farm. There are also links to technical presentations by Facebook staff, including a 2009 technical presentation on memcached by CEO Mark Zuckerberg."
Re: (Score:2)
Amusingly, I read that as "a technical presentation on mendacity by Mark Zuckerberg".
Environmentalist (Score:3, Interesting)
That's all.
Re: (Score:3, Funny)
Yeah, but if they're against Facebook, they can't be all bad. Sort of like the Mafia vs Castro, right?
Re: (Score:2)
Funny, I was just thinking that if Greenpeace are against Facebook, that means Facebook can't be all bad.
Then I remembered that we're not in Bushville Kiddy Matinee Land, and that it could easily be all-bad vs all-bad. Best result is that they attack facebook's data centre with clockwork chainsaws, and it falls on them and crushes them into Soylent Green.
Re: (Score:1)
> Sort of like the Mafia vs Castro, right?
Right, Castro seems to be the good guy here.
Re: (Score:2)
Examples please.
Sorry, I mean [citation needed]
Facebook ID (Score:4, Funny)
It's time to invent the Facebook Identity card.
You can't remember your passport number? No worries, your Facebook Identity card will say who you are. And how many friends you've got. And the name of your pet. And whether you went to the bathroom at your usual time that morning. And what kind of men you find attractive.
Semper Facebook Identity!
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
> As a former Marine I'm afraid I'm going to have to "liberate" you
I think the nomenclature these days is that he's a target that needs to be "serviced".
Re: (Score:2)
When you put it that way, it sounds like something that would run afoul of DADT.
Re: (Score:2)
Re:Facebook ID (Score:4, Interesting)
Facebook knows anything about you that third parties (friends, family, etc.) might tell it, too.
I didn't create an account or provide any information to Facebook, yet there are bits and pieces of information on it about me.
Re: (Score:3, Interesting)
One of the reasons keeping me from deleting my Facebook account is that having it active allows me to untag myself from all the pictures that I wish my friends would stop making public. If I didn't have an account they could link to, my name would just sit on the picture for anyone to see.
Re: (Score:2)
I use four rules for FB:
First, I assume anything I put on FB will end up in my worst enemy's hands, with the paranoiac fantasy #include option turned on. This means not listing when I'm out on vacation, because someone who may be interested in a burglary might be reading; not listing where I work; not listing the exact model of vehicle I have; and so on.
Second, I set permissions so all my stuff is only visible to one group of friends. This group, I manually add people to. This way, should someone
Re: (Score:2)
I think the only safe rule with applications is to not use them due to their sloppy policies with information.
At least that is my rule anyway.
Re: (Score:1)
Gee thanks, now I gotta pee.
Slashdotted (Score:3, Informative)
Re: (Score:2)
More fool them.
Slashdotted (Score:2)
Looks like Data Center Knowledge could use some of that infrastructure.
First page of Article (Score:1)
I managed to load the first page of the article before it got slashdotted:
With more than 500 million active users, Facebook is the busiest site on the Internet and has built an extensive infrastructure to support this rapid growth. The social networking site was launched in February 2004, initially out of Facebook founder Mark Zuckerberg’s dorm room at Harvard University and using a single server. The company’s web servers and storage units are now housed in data centers around the country.
Each
Re: (Score:1)
Perhaps they got it up and running again, but here's page two in case it dies:
This chart provides a dramatic visualization of Facebook’s infrastructure growth. It documents the number of servers used to power Facebook’s operations.
“When Facebook first began with a small group of people using it and no photos or videos to display, the entire service could run on a single server,” said Jonathan Heiliger, Facebook’s vice president of technical operations.
Not so anymore. Technical
Re: (Score:1)
Last page.
Facebook says the Prineville data center will be designed to a Gold-level standard under the LEED (Leadership in Energy and Environmental Design) program, a voluntary rating system for energy efficient buildings overseen by the US Green Building Council. The Prineville facility is expected to have a Power Usage Effectiveness (PUE) rating of 1.15. The PUE metric (PDF) compares a facility’s total power usage to the amount of power used by the IT equipment, revealing how much is lost in distrib
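Since the article only links the PUE metric as a PDF: PUE is just total facility power divided by the power drawn by the IT equipment. A quick sketch (the 1.15 figure is from the article; the wattages are hypothetical, picked only to illustrate it):

```python
# PUE (Power Usage Effectiveness) = total facility power / IT equipment power.
# A PUE of 1.15 means cooling, power distribution losses, etc. add 15%
# overhead on top of what the servers themselves draw.

def pue(total_facility_watts: float, it_equipment_watts: float) -> float:
    return total_facility_watts / it_equipment_watts

# Hypothetical numbers chosen to match the article's 1.15 figure:
it_load = 10_000_000        # 10 MW drawn by servers, storage, network gear
facility_load = 11_500_000  # 11.5 MW total, including cooling and losses

print(round(pue(facility_load, it_load), 2))  # 1.15
```

A theoretically perfect facility would score 1.0, so the closer to 1.0, the less power is lost outside the IT gear itself.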
Freaking SEOs... (Score:3, Insightful)
Facebook is... Facebook has... fucking SEO monkeys must be at work making sure the company isn't referred to as "it", because that ruins the google-ability of the article, and they'd rather have SEO ratings than text that reads like it's been written by a fucking 3rd grader.
SEO-experts... even worse than lawyers.
Re: (Score:1)
"that's not been written by"...
Whargbl!
Re: (Score:2)
Mark Zuckerberg's presentation link is wrong (Score:3, Interesting)
Re: (Score:2)
Everyone else sees what you posted.
Tinfoil hats mandatory from here on in.
I'm sticking with antisocial networking (Score:3, Funny)
USENET and /. (RIP Digg)
Cache (Score:3, Informative)
Re: (Score:2)
or just paste the URL of each page into Google; the first result will be a link to the page, with a "cached" link at the bottom right.
Yawn.. move along (Score:3, Informative)
The article isn't worth reading IMO, not unless you're curious as to how much electricity some of the FB datacenters use. Otherwise it's light on the tech details.
Re: (Score:3, Informative)
> The article isn't worth reading IMO, not unless you're curious as to how much electricity some of the FB datacenters use. Otherwise it's light on the tech details.
Indeed. "All you wanted to know about FaceBook's infrastructure" and little more than a passing mention of their storage? That's vastly more interesting information than where their datacenters might physically be.
Re: (Score:2)
Re: (Score:2)
If you want tech details, check out this excellent talk by Tom Cook of FB from Velocity last June: http://www.youtube.com/watch?v=T-Xr_PJdNmQ [youtube.com]
Re: (Score:1)
It's Prineville (without a "c").
Re: (Score:2)
It helps if you spell it correctly -- it's Prineville, without a "c" and with an "n". From Bing maps, it's about 150 miles southeast of Portland.
Call me dense, but... (Score:5, Interesting)
Call me dense, but with all the racks of 1U x86 equipment FB uses, wouldn't they be far better served by machines built from the ground up to handle their TPM and I/O needs?
Instead of trying to get so many x86 machines working together, why not go with upper-end Oracle or IBM hardware like a pSeries 795, or even zSeries hardware? FB's needs are exactly what mainframes are built to accomplish (random database access, high I/O levels), and they do the task 24/7/365 with five-nines uptime.
To boot, the latest EMC, Oracle, and IBM product lines are good at energy saving. The EMC SANs will automatically move data and spin down drives not in use to save power. The CPUs on top-of-the-line equipment not only power down the parts that are not in use, but wise use of LPARs or LDoms would also help with energy costs just by requiring fewer machines.
Re:Call me dense, but... (Score:5, Insightful)
Re:Call me dense, but... (Score:4, Insightful)
That is a good point, but to use a car analogy, isn't it like strapping a ton of motorcycles together with duct tape and having people on staff to keep them all maintained so the contrivance can pull an 18-wheeler load? Why not just buy an 18-wheeler, which is designed and built from the ground up for this exact task?
Yes, you have to use the 18-wheeler's shipping crates (to continue the analogy), but even with the vendor lock-in, it might be a lot better to do this than to cobble together a suboptimal solution that does work, but takes a lot more man-hours, electricity, and hardware maintenance than something built at the factory for the task at hand.
Plus, zSeries machines and pSeries boxes happily run Linux LPARs. That is as open as you can get. It isn't like it would be moving the backend to CICS.
Re: (Score:1)
Re:Call me dense, but... (Score:4, Interesting)
Well, we do the same thing as Facebook but on a much smaller scale... Our "commodity hardware" (mostly Supermicro motherboards with generic cases, memory, etc.) has pretty much the same uptime and performance as vendor servers. For example, we have a quad-CPU database server that has been up for 3 years. If I remember correctly, it cost about half as much as a server with equivalent specs from a vendor.
The system basically works like this. Buy 5 or so (or 500 if you are Facebook) servers at once with identical specs and hardware. If a server fails (not very often), there are basically 4 common reasons:
1) Power supply or fan failure -- very easy to identify.
Solution: Leave the server down until maintenance day (or whenever you have a chance) and swap in a new power supply (total time 15 min [less time than calling the vendor's tech support]).
2) Hard drive failure -- usually easy to identify.
Solution: Leave the server down until maintenance day (or whenever you have a chance) and swap in a new hard drive (total time 15 min [less time than calling the vendor's tech support]). When the server reboots, it will automatically be set up by various autoconfig methods (BOOTP or whatever). I suspect that Facebook doesn't even have HDs in most servers.
3) RAM failure -- can be hard to identify.
Solution: Leave the server down until maintenance day (or whenever you have a chance) and swap in new RAM (total time 15 min [less time than calling the vendor's tech support]).
4) Motherboard failure (almost never happens) -- can be hard to identify.
Solution: Replace the entire server -- keep the old server for spare parts (RAM, power supply, whatever).
I don't really see what a vendor adds besides inefficiency. If you have to call a telephone agent, who then has to call a tech guy from the vendor, who then has to drive across town at a moment's notice to spend 10 minutes swapping out your RAM, it's going to cost you. At a place like Facebook, why not just hire your own guy?
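The "leave it down until maintenance day" workflow above is basically just a queue of (host, suspected part) pairs that gets drained in one batch. A toy sketch of the idea (hostnames and the exact repair strings are made up for illustration):

```python
# Toy model of the maintenance workflow described above: failed servers are
# queued with their suspected bad part and handled in one batch on
# maintenance day, instead of opening a vendor support ticket per failure.

from collections import deque

class MaintenanceQueue:
    def __init__(self) -> None:
        self._queue: deque[tuple[str, str]] = deque()

    def report_failure(self, host: str, suspected_part: str) -> None:
        """Record a failed server; it stays down until maintenance day."""
        self._queue.append((host, suspected_part))

    def maintenance_day(self) -> list[str]:
        """Drain the queue, returning work orders for the on-site tech."""
        orders = []
        while self._queue:
            host, part = self._queue.popleft()
            if part == "motherboard":
                # The motherboard rule above: don't repair, cannibalize.
                orders.append(f"{host}: replace entire server, keep old one for parts")
            else:
                orders.append(f"{host}: swap {part} (~15 min)")
        return orders

q = MaintenanceQueue()
q.report_failure("web-017", "power supply")  # hypothetical hostnames
q.report_failure("db-003", "motherboard")
for order in q.maintenance_day():
    print(order)
```

The point of the batch is amortization: one trip to the racks handles every 15-minute swap accumulated since the last trip, which is exactly what a per-incident vendor callout can't do cheaply.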
Re: (Score:3, Interesting)
The latest x86 architecture lines are moving far more in the direction of mainframe-type units in terms of density and bandwidth. This is a hardware type from several years back and would not really compare to the denser offerings being explored today. However, the reasoning behind commodity hardware is not just the ability to switch from one platform to another; rather, it keeps costs down through vendor competition. One design can be produced by multiple vendors competing to submit the lowest bid.
Re: (Score:2)
Very true. However, by moving the redundancy to the top of the application stack, doesn't that become inefficient after a while?
For example, I have an application that sits on a mainframe on a big LPAR with the database server on another LPAR on the same machine. Because the redundancy is handled by everything below it, the application can be a lot simpler, with fewer bugs. It does not have to worry about being consistent among a number of boxes, just run and let the rest of the stack below it do
On a mor
Re: (Score:2)
The application doesn't necessarily have to be redundancy-aware. It really depends on several factors, most importantly how session data is handled. In most cases, the actual load is handled by specialized load balancers that manage the distribution. However, depending on exactly how much money you want to throw at the scenario, there are various ways to scale up the application.
I'm not sure how much crack you are smoking, but you just said parallelism has worse diminishing returns than a monolithic pl
Re: (Score:2)
Do you have an IBM business card or phone number they can call you at for a Sales pitch?
Re: (Score:3, Interesting)
Actually, neither. It's just that to an observer like me, FB is trying to reinvent the wheel on a problem that has already been solved.
Obviously, IBM is not cheap. Nor is Oracle/Sun hardware. However, the time and money spent developing a large scale framework on the application layer is not a trivial expense either. It might be that the time FB puts in trying to deploy something uncharted like this may cost them more in the long run.
Re: (Score:2)
Re: (Score:3, Interesting)
The problem is that for any specific budget* the x86-64 solution will give you more aggregate I/O and more processor hardware than the mainframe. The argument for the mainframe is then that the software might be easier to write, but there doesn't exist any mainframe that can serve even 1/10 of Facebook, so you need to cluster them anyway. And if you need that special cluster magic, you might as well have x86-64.
And IBM will not promise you 99.999% uptime if you buy a single mainframe. If you need that kind of
How many times a day do people check Facebook? (Score:3, Interesting)
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Re: (Score:3, Interesting)
Have you seen how often Facebook crashes/has problems? You have to constantly reload the thing to get anything done. Thank goodness Google Calendar doesn't have that problem, or I'd probably have a thousand hits a day to my calendar page alone.
Also, FB pages tend to be pretty content-sparse. It's not uncommon for me to hit a dozen pages in 2-3 minutes when I check Facebook.
Re: (Score:2)
go to fb, go to friends page, go back, go to another friends page, go back, go to farmville, go back
7 page views, for doing very little, right? how many times a day?
that's ignoring things like farmville gifts which require i think 3 page views for accepting/sending/returning each gift.
and of course there's the teen girl that does lord knows how many things.
infrastructure secrecy versus openness (Score:3, Interesting)
data servers = industrial engines of 21st century (Score:3, Interesting)
In his book on the modern energy industry, "The Bottomless Well," author Peter Huber places commodity computing near the top of his "energy pyramid". Huber's thesis is that modern technology has transformed energy into ever more sophisticated and useful forms. He calls this "energy refining". At the base of his pyramid are relatively raw forms of energy like biomass and coal. Then come electricity, computing, optics, etc. I think it's interesting to view computing as a refined form of energy.