Google Doubles Server Farm 258
Mitch Wagner writes "Here's our followup story on Google's colossal server farm. When we first wrote about Google last spring, they had 4,000 Linux servers; now they run 8,000. Last year we focused on the Linux angle; this year we thought it was more interesting to go into the hardware, giving a little detail about some of the things Google has to do to build and run a server farm that big." Impressive. I always think our 8 boxes are cool, until I see this kinda thing.
Audio Interview with Google Chief Ops Dude (Score:2)
Re:ROI on Linux (Score:2)
What the fuck is a "multithreaded TCP/IP stack"? The IP stack runs in both process context and interrupt context; there are no threads there, and it'd be stupid to use them. Perhaps you mean "fine-grained locking," but you just don't know what you're talking about.
Re:Electric bill (Score:2)
--
Re:im not really clear on.. (Score:2)
For the curious: PageRank does not depend on your query; it is a global property of the link structure of the web. So Google does a normal keyword search and combines a keyword similarity value with the PageRank value, and sorts on this magic value.
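A toy sketch of that combination; the mixing weight and formula here are made up for illustration, not Google's actual scoring:

```python
# Hypothetical ranking mix: a query-dependent keyword similarity
# combined with the query-independent PageRank value.

def combined_score(keyword_sim, pagerank, alpha=0.7):
    """Weighted mix of keyword similarity (0..1) and PageRank (0..1).
    The 0.7 weight is an arbitrary illustration, not Google's value."""
    return alpha * keyword_sim + (1 - alpha) * pagerank

# Pages matching the query: (url, keyword similarity, PageRank)
matches = [
    ("a.example", 0.9, 0.1),
    ("b.example", 0.5, 0.8),
    ("c.example", 0.7, 0.4),
]

# Sort on the combined "magic value", best first.
ranked = sorted(matches, key=lambda m: combined_score(m[1], m[2]), reverse=True)
print([url for url, _, _ in ranked])
```

Note the key property: the PageRank column never changes per query, so it can be precomputed once over the whole link graph.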
heck of a space heater (Score:2)
Re:One hell of a colocation (Score:2)
9 284.ATM7-0.XR2.DCA1.ALTER.NET (152.63.33.41) 5.685 ms 13.112 ms 4.145 ms
10 194.ATM7-0.GW3.DCA1.ALTER.NET (146.188.161.77) 5.545 ms 7.685 ms 4.475 ms
11 abovenet-dca1.ALTER.NET (157.130.37.254) 5.327 ms 6.011 ms 5.987 ms
12 core5-core1-oc48.iad1.above.net (208.185.0.146) 6.132 ms 5.715 ms 6.948 ms
13 core2-iad1-oc48.iad4.above.net (208.185.0.134) 5.818 ms 5.785 ms 6.011 ms
14 main1colo1-core2-oc12.iad4.above.net (208.185.0.66) 7.527 ms 5.400 ms 4.853 ms
15 64.124.113.173.available.google.com (64.124.113.173) 6.160 ms 5.705 ms 8.736 ms
It appears to be co-lo'd at above.net. This was run against the www server.
Secret windows code
How many servers are indexing? (Score:2)
--
Re:A Real Reason They Can Get Away With That (Score:2)
Well, I'd think that eBay would split things up, as should Amazon, if they don't already.
Sure, if the Computer section of eBay goes south the computer bidders are pissed, but it doesn't affect the Beanie Baby contingent.
I think that the real reason that eBay/Amazon/Things 'N' Stuff aren't doing massive clustering (if, indeed, they aren't) is that it takes quite a bit of planning and design to get something like that set up, and Amazon and eBay couldn't take the time. You have to be fast if you want to "build a brand"! Plus, to a greater or lesser extent, Google runs a single algorithm. Amazon runs a thousand of 'em, sometimes 4 or 5 a page.
"Beware by whom you are called sane."
Re:Seen it (Score:2)
I think there's a great Ask Slashdot lurking in here about how they built and manage this stuff.
Re:Damn, thats a lot of space (Score:2)
good citizens? (Score:2)
* google uses redhat
* they customise it extensively
* they have arrived at workable solutions to problems of massive parallelism in several fields, eg load-balancing, tcp/ip optimisation, efficient segmentation of a huge database and the associated routing of queries, and presumably heat dissipation too.
* in short, they have rolled their own into a system that even the
* they make enough money to run 8000 pizza boxes and buy state of the art furniture by selling this combination of technologies to corporations who want to improve the efficiency of their knowledge workers.
* they have contributed a total of, say, $3000 to redhat over the counter at Fry's.
Now I'm not sure that counts as good oss citizenship.
Overall i'm inclined to think that they're in credit just because google is so fscking good that it has replaced my bookmark file. I'd say that their public service, esp given the
It's hard to call, especially as I am a user of, rather than a contributor to, Linux and therefore benefit without giving anything back, so I'm surprised not to see it being debated here. Just _using_ Linux really doesn't deserve accolades any more. As they say in the article, it's an economic and practical decision, not an ideological one.
Damn, thats a lot of space (Score:2)
And I thought some of the SAN setups here looked impressive.
Re:Seen it (Score:2)
colocation ??!! (Score:2)
I've always understood that you place half your servers on the west coast and half on the east, so that if there's a net split (i.e., a construction worker who didn't call before he dug), you won't suffer the consequences. With all their servers in DC, how will they prepare for this?
Re:No Kudos to Google (Score:2)
Did you actually read the article? Because the guy in charge of this stuff said that they were saving money by doing it this way. Considering the amount of money Google would be out if he were just lying through his teeth as part of the Linux Zealot Conspiracy (c), I really doubt that he's making that up. But if you'd like to point out all of the Google-sized sites that you're running, maybe we could talk.
He also mentioned that using a freely-modifiable commodity OS on commodity hardware kept them free of any vendor pressure, which I imagine would be somewhat of a problem with Solaris, et al. No forced upgrades for Google!
P.S. There is no Linux Zealot Conspiracy, of course, but you wouldn't know it by reading /. :P
Caution: contents may be quarrelsome and meticulous!
Re: Multithreaded TCP/IP stack (Score:2)
About serializing: sure. But you can also tell that to the Java guys (in Java-ese, "serializing" means transforming an object's internal state into a bytestream that can be transferred over the network to some peer where, given the object's class code and the serialized data, an identical instance of the object can be created).
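The same sense of "serializing" in Python, for comparison (pickle rather than Java serialization, but the idea is identical: object state to a byte stream and back):

```python
import pickle

# Turn an object's internal state into a byte stream...
state = {"query": "linux", "page": 2}
wire = pickle.dumps(state)

# ...which a peer can turn back into an identical instance.
clone = pickle.loads(wire)
print(clone == state, clone is state)  # True False: equal value, new object
```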
Re:Electric bill (Score:2)
New York State is not at all homogeneous. NY City and Long Island have horrendously high rates, while central and western NY are quite low.
There is always a political tug of war regarding distribution of cheap hydro power from the St. Lawrence to the rest of the state, but you could always count on upstate being relatively cheap.
When I moved from upstate NY to NJ my power rates tripled.
Re:A Real Reason They Can Get Away With That (Score:2)
--JRZ
Re:Electric bill (Score:2)
You know, you raise a funny point. When relocating our company, we looked at the cost of bandwidth and electricity, knowing that it was a cost of business. But when you've got 8,000 servers, you've got to think that electricity becomes a huge issue in picking your location. You almost want to move further up North, just to cut your air conditioning bills.
Re:slashdot runs off 8 boxes? (Score:2)
Newsflash: Full Usenet archive available (Score:2)
Linux kernel patch (Score:2)
Re:A Real Reason They Can Get Away With That (Score:2)
-Restil
Re:And this is good? (Score:2)
Re:No Kudos to Google (Score:2)
for server in $serverlist; do
  scp patchNNN.tar.gz $server:
  ssh $server "gunzip patchNNN.tar.gz; tar xf patchNNN.tar; ./install-patchNNN.sh"
done
It's not that hard to automate such a thing. Those 8000 servers are NOT managed individually -- that gets to be a real big pain, real fast.
Re:Kudos to Google (Score:2)
----------------------
it's still not as 31337... (Score:2)
Re:it's still not as 31337... (Score:2)
Re:im not really clear on.. (Score:2)
Re:Have you ever noticed.. (Score:2)
Re:it's still not as 31337... (Score:2)
Although I agree it's not terribly witty, I found it slightly amusing.
Re:Where can I get the google source? (Score:2)
Found at http://www.aspseek.org. It is a deep crawler that works well; I compiled the current stable version under SUSE 7.0 and got it running together with MySQL. ASPseek is not Google, but I would say it imitates Google a little bit. You can give it a try; I guess you do not need 4 PCs. Crawling/searching on my Celeron 333 server with 160 MB RAM and an IDE HD did not stress the machine. I don't know what happens if you have lots of pages; the ASPseek people say their baby has 4 million pages indexed.
Re:Crud.... (Score:2)
The New Pentium XXI running on a -.0001mu core.
Justin Dubs
Re:a petabyte?!!?! (Score:2)
----
Re:Kudos to Google (Score:2)
I really like to hear that companies that do so much for so little, such as Google or Trolltech, are doing well. I just worry about their actual business and the talented developers they employ...
I guess they're doing OK if they added 4000 machines...
--
Pictures! (Score:3)
Re:Kudos to Google (Score:3)
I have to say it's so nice not having a giant animated "Punch the monkey for $20" at the top of the screen. With Google, you actually have to look for the ads to see if there are any. It would be nice if a few other major sites learned something from this. What would that lesson be? Giant flashing ads only annoy people and do not bring in new customers.
Re:Why? (Score:3)
Bill - aka taniwha
--
google modifications available (Score:3)
I'm curious whether or not the optimizations made by Google are readily available to the public, i.e. GNU,
"Google downloads Red Hat for free" (Score:3)
"Google downloads Red Hat for free, taking advantage of the company's open source distribution. And Linux's open source nature allowed Google to make extensive modifications to the OS to meet its own needs, for remote management, security and to boost performance."
I'm sure Red Hat is upset that they are missing out on the sale of 8000+ Linux licenses!!
Re:Electric bill (Score:3)
Re:Kudos to Google (Score:3)
Re:What about hardware maintenance (Score:3)
Really doubled or part of a cost cutting move? (Score:3)
I wonder if they really need that many servers, or whether they doubled their size in order to have a seamless transition during the move? I.e., get the new site up, running, and handling load, and then take down the old site. Maybe they will sell off the old computers instead of moving them. This could just be PR spin to say "we doubled our size." Just devil's-advocate conjecture, but they are probably moving to DC from SF to save money on space, so this is more of a cost-cutting thing than anything else.
Don't get me wrong, I love Google and use it every day, but I don't see any reason they would suddenly double their capacity.
MG (Managing Gigabytes) (Score:3)
Re:it's still not as 31337... (Score:3)
http://www.elj.com/elj-quotes/elj-quotes-1999.htm
Re:a petabyte?!!?! (Score:3)
Re:im not really clear on.. (Score:3)
Re:Why not Windows 2000? (Score:3)
Re:a petabyte?!!?! (Score:3)
> can you just imaging how much _______ (insert your choice: mp3s, pr0n, divX;), etc) you could store! damn. *drool*
A full USENET feed (including binaries) is about 250GB per day (yes, about an OC-3 saturated), and growing at 50-60% per year.
One petabyte works out to only four more years of future USENET, give or take 50%.
Scary, ain't it?
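For what it's worth, the arithmetic roughly holds up. A back-of-the-envelope check, assuming ~55%/year growth compounding continuously (my assumption, splitting the quoted 50-60%):

```python
from math import log

daily_gb = 250   # full feed with binaries, GB/day (figure from the post)
growth = 1.55    # ~55% growth per year
years = 4

# Cumulative volume: integrate daily_gb * growth**t over t in [0, years].
total_gb = daily_gb * 365 * (growth ** years - 1) / log(growth)
print(f"{total_gb / 1e6:.2f} PB")  # just about one petabyte
```

So four years of future USENET really does land close to a petabyte once you account for the growth rate; at a flat 250 GB/day it would take about eleven years.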
Compression (Score:3)
Re:im not really clear on.. (Score:3)
Yeah, 1.3 billion pages indexed is stunning. But even more stunning is the fact that the total number of "pages" (an overly broad term, I concede) on the Internet is at least 100, if not 500, times [brightplanet.com] that size. Basically Google is behind on indexing by 2 to 3 orders of magnitude.
It's true that they constantly refresh their index. But it takes them about 2 months to do it. That ain't fast no matter how you look at it. As evidence, take a look at the date on the cached CNN.com home page [google.com].
Electric bill (Score:3)
I'd just give up and get a handful of S/390s and do the same thing.
DanH
Cav Pilot's Reference Page [cavalrypilot.com]
Re:Argghh (Score:3)
Free software makes all kinds of sense when users demand it, especially when it comes to operating systems, programming languages, and "productivity" applications. But it makes zero sense for a company who has not only written the software, but has the only machine running that software, to give away the software.
Re:But..how do they finance? (Score:3)
Also, do a search for "porn". Ads.
--
And this is good? (Score:3)
That said, I'm surprised by the positive slant on this story. 8000 boxes that have to be separately administered? This is cost-effective (and environmentally sound) compared to a small number of heavy-hitter Solaris, AIX or Tru64 systems? I have to say I was a lot more impressed by hearing what cdrom.com does with a single FreeBSD system than by how many Linux boxes Google has had to cobble together.
I've got to wonder - if this were a story about 8000 W2K servers powering Hotmail, would it get the same spin?
Unsettling MOTD at my ISP.
Why? (Score:3)
You can always go with Tru64, W2K Datacenter, AIX, et al.
It would be interesting to figure out how much high-powered hardware would be required to replace those 8,000 boxen and the software to run it, and see if it comes out less or more than running the 8k separate Linux boxes.
-------
-- russ
"You want people to think logically? ACK! Turn in your UID, you traitor!"
Google architecture (Score:3)
http://www-db.stanford.edu/~backrub/google.html
Note: the document was written in 1998.
Two snippets:
6.3 Scalable Architecture
Aside from the quality of search, Google is designed to scale. It must be efficient in both space and time, and constant factors are very important when dealing with the entire Web. In implementing Google, we have seen bottlenecks in CPU, memory access, memory capacity, disk seeks, disk throughput, disk capacity, and network IO. Google has evolved to overcome a number of these bottlenecks during various operations. Google's major data structures make efficient use of available storage space. Furthermore, the crawling, indexing, and sorting operations are efficient enough to be able to build an index of a substantial portion of the web -- 24 million pages, in less than one week. We expect to be able to build an index of 100 million pages in less than a month.
9.1 Scalability of Google
We have designed Google to be scalable in the near term to a goal of 100 million web pages. We have just received disk and machines to handle roughly that amount. All of the time consuming parts of the system are parallelizable and roughly linear time. These include things like the crawlers, indexers, and sorters. We also think that most of the data structures will deal gracefully with the expansion. However, at 100 million web pages we will be very close up against all sorts of operating system limits in the common operating systems (currently we run on both Solaris and Linux). These include things like addressable memory, number of open file descriptors, network sockets and bandwidth, and many others. We believe expanding to a lot more than 100 million pages would greatly increase the complexity of our system.
Re:Further info on box specs? (Score:3)
Evidently, they shun multiprocessor boxes, use big, fast IDE drives (two per PC, one on each IDE channel), and, per last year's article [internetweek.com], use 100 Mbps links on the racks, with gigabit links between the racks. Last year's article also quotes "256 megabytes of memory and 80 gigabytes of storage", though I imagine it's closer to 512 MB (at least) and 180 GB per server now. It also says that they pack them in 1U on each side of a rack.
But, here's the kicker, "Many of the systems are based on Intel Celeron processors, the same chips in cheap consumer PCs."!
Re:a petabyte?!!?! (Score:3)
If 50% of their servers fail, then they would be slow, but still work fine.
If 90 percent of their servers failed, they would still have 800 up. It would be very slow, but might still handle the load.
If you had 1000 servers with disk array and your system failed, then ouch!
On the other hand, they probably have half a dozen burned CDs of their implementation of Linux (depending on the HW configuration), so if a server fails, they take it offline, put another one in there, load the OS preconfigured from the CD (with all the conf and stuff done already), and bring it online.
One tech can probably put 10 servers online a day.
So 30 techs can probably put up 300 servers a day.
Assuming each Linux box operates without admin intervention for 90 days, there would be 88 boxes that need to be fixed each day (about 1%), and so 9 techs could handle it.
They probably have more than that.
And since the technology is not hard to understand because it's a dual pentium PC, they don't have to call the IBM mainframe guy over. Also, they probably have a few dozen servers already configured, ready to be popped into the rack.
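The staffing estimate above, as a quick calculation (the 90-day interval and 10 repairs per tech per day are the post's assumptions, not Google's figures):

```python
servers = 8000
days_between_failures = 90     # assumed admin-free lifetime per box
repairs_per_tech_per_day = 10

failures_per_day = servers / days_between_failures
techs_needed = -(-failures_per_day // repairs_per_tech_per_day)  # ceiling
print(round(failures_per_day), int(techs_needed))  # ~89 failures/day, 9 techs
```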
Why not Windows 2000? (Score:3)
Re:Seen it (Score:4)
Re:Why do you think Google needs 8000 servers? (Score:4)
... 8000 Msft boxen is probably getting to the point where you'd need 3 shifts of McSE's full time just to reboot the damn things - kinda like the days they made computers with so many vacuum tubes that their failure rate caught up with them, and it would barely run before another tube needed replacing.
I'd bet they've already done the math (Score:4)
In any case, they'd have done it at some point along the line before the 8000th server arrived, and if they found they were making a mistake I can't see why they wouldn't have switched by now. Especially since if they thought NT would somehow be so much better they could have just removed Linux and installed NT and not have had to buy more hardware.
Sounds like Linux is working out pretty well for them.
Re: Multithreaded TCP/IP stack (Score:4)
Let's recap how a single packet is to be handled (and I probably forgot something):
you get the ethernet interrupt, you have to DMA the frame off the board, check to what protocols it belongs (if it's not IP, drop), checksum, check if you have to do any reassembly, check what protocol it is (it might not be TCP after all), check that the packet makes sense given the connection's history (i.e. sequence numbers and various other bits here and there), identify the process waiting for the packet, copy to userspace, signal process.
A multithreaded TCP/IP stack means that more than one packet can be in the pipeline at the same time. It makes no difference on a UP system, really, but on an N-processor box it can multiply your throughput by N (at least theoretically), just as a multithreaded app can increase throughput on a multiprocessor system.
Of course, to be feasible, as many parts of the stack as possible must be reentrant, or you'll have to do locking and thus (in MS-ese) "serialize".
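A toy user-space illustration of the idea (not kernel code): if the per-packet work is reentrant, several threads can drain a packet queue concurrently, with locking only around genuinely shared state:

```python
import queue
import threading

packets = queue.Queue()
for i in range(100):          # pretend these are inbound frames
    packets.put(i)

results = []
results_lock = threading.Lock()   # fine-grained lock on shared state only

def worker():
    while True:
        try:
            pkt = packets.get_nowait()
        except queue.Empty:
            return
        processed = pkt * 2        # stand-in for checksum/reassembly/demux
        with results_lock:         # "serialize" only this tiny section
            results.append(processed)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))   # 100: every packet handled exactly once
```

On a uniprocessor (or under a global interpreter lock, as in CPython) this buys nothing; the win only appears when the reentrant sections genuinely run in parallel on multiple CPUs.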
Seen it (Score:4)
can you say pr0n? (Score:4)
Ok, new poll
What do you think is stored at Google?
1. Huge search engine index
2. Pr0n
3. MP3s
4. DivX ; ) Movies
5. DivX ; ) Pr0n
6. Marketing data collected with satellites and video cameras attached to flies... just like MLB
7. Cowboyneal's transporter pattern buffer
note: I own _MOST_ of the mp3's and divx movies I have...
ROI on Linux (Score:4)
The power drain is staggering! (Score:4)
What about hardware maintenance (Score:4)
I just gotta wonder at what point they would get better overall efficiency by replacing all those little boxes with a couple of big iron mainframes.
Interesting points (Score:4)
- The number of websites is increasing exponentially, so the number of computers (or required CPU cycles) you need is also increasing exponentially. On the other hand, the price per CPU MHz also decreases exponentially (Moore's law?). That is the key to the scalability: at least the problem is not exponential in cost.
- As mentioned in this article, they have been running Celeron 500+256MB RAM+ 2x 40GB harddisks back then. When a computer fails it is easier to replace them because of the cheap hardware.
- Buy systems with as many parts as possible integrated onto the main board (NIC, etc.). It is supposedly more reliable.
- They are not running Linux because it is cheaper. I have seen headlines claiming this, including on Slashdot, but it is not true. They are not denying that they saved a lot of money because of it, but when they started Google that wasn't the issue. He mentioned that they could have gotten a good deal from Sun for Solaris. The reason was the openness of the source code, and the other reasons mentioned in the article. By the way, he mentioned that TCP stack issues were also considered when the decision was made. It looks like they are confident that they can fix problems at home if any exist.
Google wants to design all the software they run. They don't want to use third-party software because it introduces instability and makes bugs difficult to fix.
- They are not running Apache. Using Linux doesn't mean running Apache. They designed their own web server, which is the simplest possible and therefore the fastest. They don't need a complicated web server: all the computation is done in the background on the 8000 Linux servers. The web server only needs to send the query to the query server and display the results.
- Google's job is easier than people might think. Their database is not dynamic; it only gets updated once a month, and updating means replacing the old files with the new ones, which is an offline process. Compare this with an e-commerce site displaying real-time statistics and you can see that Google has an advantage that makes things easier for them.
- Let's say spidering and crawling are done in one datacenter. You need to copy those terabytes of data over to the other datacenters and then replicate them to multiple server farms in each datacenter. You have to do this fast and without any errors; you don't want to use the OS's file-system functions for it.
- They rent multi-gigabit bandwidth for off-peak hours when there is not much traffic, of course at a very cheap price. They use this bandwidth to copy data files from the west coast to the east coast. We are talking about many terabytes.
Crud.... (Score:4)
They are still NOWHERE near a googol of servers, like their name suggests... Humph...
--Volrath50
Re:Loadbalancing large websites (Score:4)
http://slashdot.org/article.pl?sid=01/04/26/033921 9 [slashdot.org]
Anandtech.com [anandtech.com] is using it.
--
Amazing (Score:5)
Re:Loadbalancing large websites (Score:5)
Re:Locking into a OS (Score:5)
Totally not the case - they've made their OS what they want, and they can change it if they want to. Don't confuse the cost of rolling out changes to 8000 machines with the cost of forcing a proprietary OS vendor to make the changes you need - you can roll out 8000 machines on a rolling basis in a week, assuming a conservative 1 hour automatic install 80 at a time (1% unavailability). You may never be able to get Sun or Microsoft to make the changes you need in an OS, if it isn't in their best interest to do so. Google's only "locked in" to RH in the sense that they can only achieve sufficient flexibility with an open source OS, and it sounds like they just went with RH because it's easier to hire admins. I bet they could run on any other flavor of Linux pretty easily, and *BSD without too much pain if they had to.
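The rolling-upgrade arithmetic in that comment, spelled out (the one-hour install and 1% batch size are the commenter's assumptions):

```python
servers = 8000
batch = 80           # 1% of the fleet offline at a time
install_hours = 1    # assumed automated install time per batch

batches = servers // batch
total_hours = batches * install_hours
print(total_hours, "hours, about", round(total_hours / 24, 1), "days")
```

That comes to 100 hours of wall-clock time, comfortably inside the claimed week even with slack for failed installs.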
Moderators, the above was only insightful if you don't care to think very hard...
Caution: contents may be quarrelsome and meticulous!
Loadbalancing large websites (Score:5)
My company is in the process of moving from one big server to several smaller ones, to allow for greater scalability; there is just a limit to how much CPU and memory you can put in a single box. Our future site will probably use Linux Virtual Server, which seems quite nice, though I haven't seen many testimonies/reviews from sites that use it. The company I work for creates online image-manipulation services, and part of the process is rendering large high-quality images. The hard part seems to be shared storage of these images (SCSI over TCP/IP seems very interesting); load balancing with static pages seems easy enough. Anyway, Google's way of using many small machines is an inspiration.
Do they give back? (Score:5)
The question that should be asked here is whether they are sharing the results of their work. I bet that they're probably lifting some of their techniques hot and fresh off of research papers, and they may be the first to actually use them in an enterprise environment.
Note that I personally believe that closed source is not necessarily a bad thing. But if Google has made radical changes to these enterprise-grade tools, it would be nice to see them trickle down into the mainstream distros. While we as home users would probably never need them, it would certainly put to rest some of the pro-Microsoft arguments against Linux as a server-grade OS.
Of course, for all I know, they could be actively working with Cox et al to incorporate their findings into the kernel and related tools.
Either way, a very impressive job done with an operating system that "is simply a fad that has been generated by the media and is destined to fall by the wayside in time." [microsoft.com]
Note that I use Windows and Linux, so I'm no bigot... (some of my best friends are Microsoft programmers!)
Re:A Real Reason They Can Get Away With That (Score:5)
Exactly. Even at ~1 MB/s per IDE drive (lots of random reads), that's 1 MB/s * 8000 machines * 2 drives/machine (yeah, some have 4, but the article doesn't say how many) = 16 GB/sec. It would take a hell of a SCSI setup to equal that bandwidth, let alone the massive number of IOs.
Further, even if the boxen only have 2 GB of memory each, that's 16 TB of memory, which you could put in one big server, but no single memory system is going to provide the throughput that 8000 SDRAM channels will.
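A quick check of those aggregate figures (the per-drive throughput and per-box memory are the commenter's assumptions, not from the article):

```python
machines = 8000
drives_per_machine = 2
mb_per_s_per_drive = 1    # ~1 MB/s of random reads per IDE drive
gb_ram_per_machine = 2

disk_gb_per_s = machines * drives_per_machine * mb_per_s_per_drive / 1000
ram_tb = machines * gb_ram_per_machine / 1000
print(disk_gb_per_s, "GB/s aggregate disk;", ram_tb, "TB total RAM")
```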
Re:Why? (Score:5)
Re:im not really clear on.. (Score:5)
Well, when was the last time you searched on Google? It has a stunning number of pages indexed. I can search for just about anything, and Google always finds more accurate hits, faster, than any other search engine. (Don't turn this into a search engine flame war, either.) They have to constantly refresh their indexes, and they have to turn around fast answers.
Yahoo even uses them for their search engine. I can't imagine being able to service Yahoo's search needs with anything less than a full-fledged data center split across two cities.
Kudos to Google (Score:5)
This is only tangentially related to the story at hand, but I would just like to compliment Google on a job done extremely well. They have successfully built the fastest search engine out there, using open methodologies and without whoring themselves out like any number of other search engines. They continue to add interesting (and [gasp!] useful) features, such as searching PDF documents and their translation engine. They have really helped the Open Directory Project along, as well.
There are successful .coms out there, but I think their business practices are so foreign to the "regular" business community that they aren't quite sure how to handle it.
BTW: Anyone else see a philosophical relationship between Google and ArsDigita?
Petabyte? Try pedobyte! :) (Score:5)
And what takes up all that size? You know it--pr0n. The storage size says it all...it's not a petabyte they've got there, but a pedobyte. Sick google bastards. :)
Re:Kudos to Google (Score:5)
Actually, I think they're being smart about it.
If the typical query returns one USENET post - maybe 2-3 kilobytes of text - why would you want to (as Deja did) spend money sending 20-30 kilobytes of HTML for the associated frames and banners and other ad support?
The user's gonna see one ad. Google's bandwidth and I/O costs are gonna explode if the HTML wrapped around each ad takes up 10 times as much space as each query's results.
By going with text-based ads and a non-frames approach, they not only make the site more user-friendly (thereby adding value), they cut their own costs by a sizable fraction.
With lower bandwidth costs and I/O requirements, Google can make money with fewer ads, not more. That's where (IMHO) Deja went wrong: the more they needed the ad revenue, the more they escalated the cost of serving the ads, in a vicious circle that consumed them.
It's also where (IMHO) Google is doing it right.
Seen it firsthand... (Score:5)
Wait, I have the Answer (Score:5)
Then use the turbines to drive generators.
Then send the power from those generators to the western united states.
Now -- follow me here -- this would be a self-sustaining system, no?
Users use google to search the web and read their embarrassing usenet posts from 1995. Power is generated. That power is funneled back to the user so that his or her computer stays on, the lights stay on, and they don't have to worry about getting stuck in an elevator during a rolling blackout.
Users are happy, nuclear opponents don't have to worry about radioactive leaks into the environment from improperly sealed cooling tanks and leaking water, and google remains up and active, chugging away ad infinitum.
Simple.
Tomorrow, I'll work on my plan for cold fusion. Maybe a couple of Guinness glasses filled with tapwater, a couple of batteries, and a beowulf cluster...
Re:Wait, I have the Answer (Score:5)
"Lisa! In this house we obey the laws of thermodynamics" -- Homer Simpson
Re:And this is good? (Score:5)
Why would 8,000 identical boxes be difficult to administer? The guys that develop the monitoring software and the install and upgrade processes are probably pretty smart cookies. But the actual maintenance of the machines could probably be handled by monkeys.
Think about it: the instructions for handling a hardware failure in one of these machines is probably:
Another box in the closet is probably labeled "Empty, pre-labeled Fed-Ex shipping boxes that are exactly the right size for our rack mounted hardware. Use to ship any badly broken machines back to our system engineer. Call first!"
A Real Reason They Can Get Away With That (Score:5)
Now imagine an e-commerce site built like that. Loss of any part of the user list or merchandise catalog is a major failure. This is why such sites are usually powered by a moderate (typical site) to huge (Amazon, eBay) database with enormous redundancy built in.
And no, Linux on IBM/390 WILL NOT help them because it is just an emulation, and disk arrays of this one huge computer will get swamped by the billions of read requests (the same way they will get swamped on Starfire or the same S390 under OS390). The entire idea of the setup is that you have a lot of independent disk channels.
Another interesting insight is that they have done some work on administering all of these machines remotely. Otherwise they would blow all their money on paying sysadmins.
missing email (Score:5)
three possible explanations:
Multi-Threading Madness (Score:5)
google's new language features (Score:5)
-Stype
Interesting detail the article didn't go into: (Score:5)
Where does Google get their money? (Score:5)
The Google site features minimal advertising. So they are most likely funded with VC money. This means that they must have a plan for making money at some point. What is it and when will it kick in?
Re:"Google downloads Red Hat for free" (Score:5)
I imagine they only download it once, then distribute via LAN. Besides, from last year's coverage, "Google actually paid for only about 50 copies of Red Hat, and those purchases were more of a goodwill gesture. "I feel like I should be nice, so when I go to Fry's I pick up a copy," Brin said."
Re:Multi-Threading Madness (Score:5)
Ads are secondary... (Score:5)
Ironic timing... (Score:5)
Re:Doesn't this seem wrong to anyone? (Score:5)
They're using 8,000 computers to accomplish a pretty amazing feat, and they're doing this instead of buying a farm of larger, faster computers. Sometimes more, smaller parts are better: you don't have one big machine that fails, and the parts are individually replaceable (say 10 or 20 machines instead of a few larger servers).
You don't build a house starting with a large block of concrete - you use bricks. Google is doing the same thing. Cut them some slack.
They're efficient too (Score:5)
And these high powered fans then blow the blisteringly hot air along a complex series of ducts which lead to facilities which:
a) generate electricity for the wall-o-lava-lamps [google.com]
b) are used to fill state-of-the-art, floating, hot-air furniture [google.com]
c) keep folks warm-n-toasty in the sauna [google.com]
d) make you hot [google.com] and thirsty [google.com]
--