What's It Like to be Google's Boss Techie?
We'd like to welcome Google Director of
Technology Craig Silverstein as our next Slashdot
interview victim... err... guest. You think you
run a big Linux server farm? Craig's is bigger.
Think your Web site gets a lot of traffic and
creates a lot of headaches? Just think what Craig
must face! Post whatever you'd like to ask Craig
below, one question per post. About 24 hours after
this runs we'll email Craig 10 of the
highest-moderated questions, and we'll post his
answers shortly after he gets them back to us.
I've wondered (Score:5, Interesting)
I'm curious -- what drives the innovation? Is it the hardware team advancing architecture to permit the software team more room to play, or is it the software team saying, "Hey, look what we got!" and the hardware team dropping the iron to implement it?
I understand there must be some level of synergy, but is it completely seamless or is one side of the equation effectively driving the other?
Leem
Network Management Tools/Technologies (Score:5, Interesting)
What technologies help to support the Google server farm? What kind of automated monitoring and trouble reporting tools are in use? Are they home brew, open-source, or COTS with some customization (scripts, etc)? And if you had to point to one area of network management and say "we could use some improvement or some better tools", what would that area be?
BTW - Google Rocks! I never use anything else anymore!
That's a good question, My question is: (Score:4, Interesting)
p.s.
Thanks for all your help with my school research
Weighting of heuristics (Score:5, Interesting)
Can you talk a bit about how those weights have changed over time? Have there been any surprising shifts?
Corporate Culture (Score:4, Interesting)
Simple question (Score:4, Interesting)
(I've heard thousands of PCs with everything in RAM, but I'd love to hear it from the horse's mouth)
Re:Simple question (Score:3, Interesting)
How much bandwidth you guys use/have?
Re:Simple question (Score:4, Interesting)
How much bandwidth you guys use/have?
Addaddendum:
How much of that BW is actually used, and how does it vary with the time of day and day of week?
What's the ratio of traffic to and from user queries compared with the traffic searching the web and do you expect to scale indefinitely as growth continues on the current path?
Statistics (Score:3, Interesting)
Re:Statistics (Score:5, Informative)
Favoring Big Guys (Score:5, Interesting)
Googlebombing (Score:3, Insightful)
Re:Favoring Big Guys (Score:5, Insightful)
So how do you find those treasure troves? And how do you decide which ones are treasure troves and which ones are the millions of "all about me" web pages? Or do you care?
Windependence Day (Score:3)
I'm not sure when the change took place (Score:5, Interesting)
Pigeon Computing (Score:4, Funny)
technetcast.ddj.com/ (Score:3, Informative)
The Technology Behind Google 2000-10-19 (1hr 13min) By Jim Reese, Chief Operations Engineer, Google. How to build an internet search engine that indexes 1-2 terabytes of data -- 200 million web pages -- and serves it up at a rate of 1000 requests/second. (Hint: Start with a farm of 10,000+ Linux servers). The technology behind Google: company overview, search parameters and results, hardware and query load balancing, Linux cluster topology, scalability, fault tolerance, and more.
http://technetcast.ddj.com/tnc_search.html?key=
Why did you choose Linux? (Score:5, Interesting)
And the Sergey says.... (Score:4, Informative)
From one interview...
Jason: What led to Google's decision to use Linux? When did that start?
Sergey: Well, Larry Page and I were in the Stanford PhD program in Computer Science. And we developed Google there. The way the computer science program worked is there was a hodgepodge of computer equipment lying around, and we would grab whatever scraps we could. We had all kinds of computers: HPs, Suns, Alphas and Intels running Linux. So, we gained a lot of experience with all of those platforms.
When we started Google, we had to make the decision of what we wanted to use. Of course we chose Linux, because it is the most cost effective solution.
PCs are not only much cheaper these days, but we can also get them very quickly, because they're such a commodity item. That's an incredible benefit. We just installed another 1,000 computers and we got that done in a few weeks. That's really hard to do with any other kind of workstation. I think that's an advantage that people don't entirely realize.
Jason: Did you view it as being better, or was cost the main reason?
Sergey: It was better in some ways. Certainly for our purposes, we felt the support was better. For example, the actual kernel authors will respond to problems pretty quickly. They are especially responsive to Google nowadays, since we're so widely used. We can have a 15 minute turnaround. You can't really beat that for support.
That was an important factor, but frankly, the cost was a bigger issue. PCs are so cheap, which is very important. Sun's Solaris is probably more stable than Linux on PCs. It's hard to determine the blame, whether it's the hardware or the operating system. But, it's a minor difference.
Jason: Then, does all of your support come from newsgroups or do you actually pay for it through Red Hat?
Sergey: We have an operations team of about ten people, which helps a lot. And other than that we check newsgroups and e-mail the authors of the code. Usually, if it's a problem we can't figure out, we go straight to the authors.
Jason: Is Linux used on desktops at Google?
Sergey: It depends. Engineering mostly runs Linux. Business development/marketing runs Windows. Actually, I use Linux with VMware running Windows. Some people have two computers, particularly some people in engineering who do UI development and need to test things out on Windows platforms. I find it better to just use VMware and have one computer.
Jason: In a technical sense, what does Linux lack? What does it not provide?
Sergey: The 64-bit file system, which I know they are working on. It's slowly coming around. I think there are still occasionally some stability issues. I'm not saying Linux is unique in that respect, but you definitely want to have reliability. There are some issues dealing with higher memory systems. If you get to 2GB, and you try to push it past that, we encounter various problems. I know we've had some trouble with the network stack when we really push it hard. In terms of having lost most connections from lots of different machines.
And from another...
How is Linux used in Google's projects? Why was Linux chosen to improve the Google search engine?
Sergey Brin: Actually, we currently run over 6,000 RedHat servers.
Linux is used everywhere...on the 6,000+ servers themselves, as well as desktop machines for all of our technical employees. We chose Linux because it offers us the price for performance ratio. It's so nice to be able to customize any part of the operating system that we like, at any time. We have a large degree of in-house Linux expertise, too.
Most of our administrative tools were developed in-house, as well.
Regression (Score:5, Interesting)
Do you think Google has become an Internet point of failure? With the competition for larger and larger indexes, is the Internet becoming centralized? Do you think this is a bad thing?
Search engine spammers (Score:5, Interesting)
Stumped (Score:5, Interesting)
Re:Stumped (Score:5, Funny)
Scientology (Score:4, Interesting)
I know, I know, only one question, but it begs to be asked: how well is your technology going to be able to scale? Considering the near-exponential growth of the internet, will PageRank be able to keep up?
Storage used (Score:5, Interesting)
how about what browser (Score:3, Interesting)
after all the wayback machine does kind of the same thing
its software
so this is my question
what browser do you use ?
regards
john '1.1alpha' jones
I'm curious... (Score:5, Interesting)
Question 1 of 2: Language of choice? (Score:4, Interesting)
question for Craig re: search languages (Score:3, Interesting)
Creative Ideas (Score:5, Interesting)
peer pressure (Score:5, Interesting)
Definitely need these in there: (Score:5, Funny)
Turn-offs?
The worst date you ever had?
Parent company with lots of bandwidth. (Score:3, Funny)
"So, interested in buying a nerdy weblog site, only slightly soiled?"
Dot com changes? (Score:5, Interesting)
Did Google keep the atmosphere as you've grown? Did they keep it while others tanked?
Specs (Score:3, Interesting)
Go on, make us jealous
so... (Score:3, Interesting)
Academic ties (Score:4, Interesting)
Re:Academic ties (Score:3, Interesting)
Approximate number of employees: 400
Ph.D.s on staff: 50+
Languages spoken: 34
Number of roller hockey players: 32
Number of offices worldwide: 12
Massage Therapists: 2
Neurosurgeons: 1
Question 2 of 2: Browser Stats (Score:3, Interesting)
Re:Question 2 of 2: Browser Stats (Score:3, Informative)
http://www.google.com/press/zeitgeist.html
Browser stats lacking at Zeitgeist (Score:3, Insightful)
Interestingly, I see "Other" has been steadily rising since it bottomed out in January, and has now surpassed Netscape 4. I would love to be able to click on that chart and see a detailed list of the percentages, and what "other" is composed of. Hopefully we'll see Mozilla get its own line on the graph soon.
It would also be nice to see a breakdown on a per-OS basis. I wonder how many people are running Internet Explorer on Linux? (Seriously, that would indicate what portion of non-IE users hack the browser tag to make web sites happy.)
Linguistics and Searching (Score:5, Interesting)
CO$ and Deep Linking (Score:5, Interesting)
How have these affected you and your job, and what are your feelings on this subject?
Logo work? (Score:3, Interesting)
My hat's off to that team!
Re:Logo work? (Score:4, Informative)
If you want to know more about the special logos (referred to as "Google doodles"), as well as see an archive of the Google doodles over the years, go here [google.com].
Nathan
Google and IP address. (Score:5, Interesting)
Question - Google's first programming contest (Score:5, Interesting)
As a market leader... (Score:5, Interesting)
I can't imagine that you haven't. It must have been a huge decision to invest in one technology, so are you satisfied with what you have?
Does size really matter? (Score:4, Interesting)
'Web Indexing companies" (Score:5, Interesting)
[forget the ethical debate about that
What I want to know is - going forwards - as more and more of these companies start up, and discover more and more unscrupulous ways of 'loading' the search engines with bogus hits/visits/data/etc.
1) Not losing ad $$ to these folks
and
2) preventing every search from returning something like www.hotgrannysex.com or www.top50.com as the 1st (or first 15) results for a search on
Forget Craig (Score:4, Funny)
Just so I'm not off-topic:
Mr. Silverstein, how does Cindy look in tight sweaters?
Drool...
Talisman
Opinions on being open (Score:4, Interesting)
Thanks,
-Dave
The future of Google (Score:5, Interesting)
I think Google absolutely rocks. It has by far the most intelligent/helpful search engine results. Thanks for the great service.
Now on to the questions: what is the Google vision / strategy for the future? Where can Google go? From a search engine perspective, what are some of the challenges that you have and improvements that can be made (perhaps speeding up crawling to make the latest content available, for example)? How are you going about solving these challenges, and when can we expect the solutions to be implemented?
On a similar note, I've noticed that Google recently announced a "Google box" that allows corporations to take advantage of the Google search algorithms and indexing. Are any more products like this being planned?
Attacks? (Score:4, Interesting)
Can Google last? (Score:5, Interesting)
Google is a great free public resource. My concern is that it has to be expensive running a resource like that. I know Google's strategy is somewhat to use the free resource as a loss leader to promote your search technology, but the key word in "loss leader" is "loss". It's a great theory as long as you are able to find people who want and need your search technology.
So my bottom line question is this: Does the web site pay for itself via the advertising? Is there a possibility that someday Google may decide the web site costs too much money to run if you get to a point where your reputation no longer needs the loss leader?
so what does it look like? (Score:5, Interesting)
It would be great if you did a documentary feature with TechTV or someone, because it's one thing to read about your facility, but it would be another to see it.
Thanks for all of the help I've gotten from Google.com, I don't think I'd still be in school without it.
Paradesign
PS, even just a photo feature on the site would be nice.
Google cache (Score:5, Interesting)
There's one problem that everyone has with the cache, however - you don't deep-nest the caching, so that following any links on a cached page will lead to the original (probably broken) site, instead of to another cached page. Is there a technical or legal reason for why it works this way? Any chance we'll see deep caching at some point?
Re:Google cache (Score:3, Informative)
Google's inescapable coolness. (Score:5, Insightful)
Possible to have too much power (Score:5, Interesting)
Not to be too "X-File'ish", but does there come a point where too much knowledge is captured in Google? A point where anything that doesn't exist in Google doesn't exist, period? Wouldn't that represent a very tempting target for a bin Laden or a John Ashcroft, to try to control how the modern world thinks?
Kind of far out there, I know, but do you guys worry about this kind of thing?
sPh
Re:Possible to have too much power (Score:3, Insightful)
Slashdot effect? (Score:5, Interesting)
mod_google (Score:5, Interesting)
When things get ugly (Score:5, Interesting)
Dealing with DoS (Score:5, Interesting)
The rest of us just suck it up with fat network pipes, but a high-profile target like google would be the holy grail of Internet vandals.
Has anyone ever poisoned your DNSes, effectively taking Google down even though the servers are up? Successfully inserted bogus WAN routing info into the Internet, again effectively bringing down Google even though the servers are fine?
What's your worst cracker/net vandal story?
Key Ingredients To Success? (Score:4, Interesting)
To what do you credit the popularity of google? Do you consider google a "success," or are you holding out for thousands of employees and billions in cash flow?
What does Craig do for fun? (Score:3, Interesting)
Personally I'm usually pretty drained after a fun day staring at the screen and typing like a monkey, and sometimes completely avoid the PC when I get home, preferring to chill with a decent book (currently Cradle to Cradle [slashdot.org]), zone-out in front of the TV, or go cycling in the beautiful Isle of Man [isleofman.com] (watch "Waking Ned Devine" for an idea of the scenery - jealous?<grin/>).
So I guess my completely-non-tech question is:
What do you do in "loafing" time (ie. loaf - To pass time at leisure; idle.), when you've left the office, "lost" the pager/Blackberry/PDA/mobile etc., and got away from it all?
Cheers,
The Web's full potential (Score:5, Interesting)
Do you expect widespread usage of RDF [w3.org]/DAML [daml.org]/OWL [w3.org]/TopicMaps [topicmaps.org] for explicit meta-data annotation of web resources, or will it be used only in small circles of specialized content providers like academia, or maybe not at all?
How will Google react? Do you plan to use meta-data provided by web resources if found, and how will you decide if it isn't just made up to get people on some bogus pr0n site (like with those <meta>-Tags today)? Will it someday render the brute-force approach of full-text-indexing obsolete?
Why not Google as a Non-Profit? (Score:5, Interesting)
Google has become such an important part of the Internet for millions of average users. With this in mind, my friends and I often joke about what would happen if (knock on wood) Google were to go out of business. I suggest that ICANN should do something useful for a change, and fund Google as an official, non-profit project for searching the net.
Although I have heard that Google turns a good profit, what exactly is preventing Google from becoming a not-for-profit organization? Couldn't Google take the extra income from licensing its search to create better search technologies and pay the employees, rather than make some shareholders rich? Wouldn't this perhaps make Google a more sustainable organization?
Google cache and copyright (Score:5, Interesting)
Staying on Top... (Score:5, Interesting)
Newsgroups (Score:5, Funny)
Internal Admin Utilities? (Score:4, Interesting)
How do you handle user accounts? Event notification?
Do you guys use "enterprise" software like Tivoli or Openview, or did you roll your own solution?
Re:Internal Admin Utilities? (Score:4, Interesting)
How do you balance the need to keep systems up-to-date against your (doubtless demanding) availability requirements? Is there enough redundancy in there that you just flip a machine out while it is updated? Presumably, however the machine is upgraded, this is automatic - you must have too many machines to do it any other way!
How do you test these updates (security patches, distribution updates, regular changes to your own software, configuration tweaks)? Do you have some kind of enormous test environment containing a copy of 50% of the main Google cache or something? For that matter, how do you do the testing itself? Do you type "Most clowns drink blue fruit juice on Mars" in the search box and just verify that you get 184 hits, and say "right, it works", or do you have a more sophisticated method of testing it (for example do you run your test system against a captive internal dataset)?
Do you expect Google to be slashdotted? (Score:5, Interesting)
What would it take to Slashdot Google? What do you do to avoid this? Have you been Slashdotted before, either from Slashdot itself or from some other link?
Testing and Deployment (Score:5, Insightful)
Google API (Score:4, Interesting)
I have one question... (Score:3, Interesting)
that comes to mind when I think of a huge server farm like Google's: can you give a rough order of magnitude (# of zeros maybe) on what your electric bill is?
Thanks very much for Google. The more I use it the more I appreciate it.
Are you guys making enough money? (Score:4, Interesting)
I wish you'd give us some banner ads or something, I feel guilty. I don't want Google to go away.
Seriously, why don't you serve banner ads?
-Dan
Google Voice Search (Score:5, Interesting)
why the ancient design technique? (Score:3, Interesting)
For a site where speed and information delivery are of the utmost importance, an archaic table-based design seems rather strange. Is there any reason you have not yet switched to a more forwards-compatible xhtml/css design? (Note that by "design" I mean more the html and css than the visual appearance of it)
For my own amusement, I've been looking at recoding the google design in CSS, and it's really not that hard.
Thanks!
Google toolbar for open source browsers? (Score:4, Insightful)
Why haven't you implemented the toolbar for open source browsers yet? Are there technical difficulties, or rather a lack of interest from Google?
Three letter agencies (Score:4, Interesting)
with the various intelligence agencies to provide
them information about searches that are of interest to them?
Are you thinking of providing SSL access to your
web pages so that these agencies will have to work
with you instead of just monitoring your network
traffic?
How do you benchmark your software? (Score:3, Interesting)
What's the back end? (Score:5, Interesting)
How do you build a cluster like a war machine, in other words?
Speech recognition (Score:5, Interesting)
Recognizing text in images and videos and indexing that would be a similar task. I know that Google Catalog Search [google.com] must be doing some OCR already, but I have no idea if this would take too many CPU cycles if applied to all images, or if there are other problems (the images themselves already get downloaded for the image search, so bandwidth should not be the problem).
Google Search Appliance - PageRank? (Score:3, Interesting)
The Google Search Appliance, however, is targeted at an office environment. Most of the documents (especially the non-html ones) in the typical office stand alone and do not have links to each other.
How has Google modified or complemented (if at all) the PageRank algorithm to make it more suited to an office environment?
I am currently pushing management at my site to purchase a Google Search Appliance, so I need an answer to this to help justify the change from our existing search application. i.e. without a good PageRank score, how does the Search Appliance order the result set in a useful way?
Regexp Support Someday? (Score:4, Interesting)
For example, let's say I'm looking for 80's brat pack member Anthony Michael Hall (not that I would do such a thing), but I can't remember his middle name. Looking for "Anthony Hall" will do me little if any good, but looking for "Anthony \w+ Hall" could do the trick nicely.
Another example is that the user can provide their own limited fuzzy searching, by searching for optional prefixes and suffixes along with the root, instead of having to get the word or phrase exactly as it's indexed.
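To make the two examples concrete, here is a minimal sketch using Python's standard re module on a handful of made-up strings. The sample text, the function names, and the "index" root are all hypothetical, chosen only for illustration, and none of this says anything about how Google indexes pages or would implement regexp search.

```python
import re

# Hypothetical sample text, made up for illustration only.
documents = [
    "Anthony Michael Hall starred in several 80's films.",
    "Anthony Hall is a different name entirely.",
    "Sir Anthony Hopkins played at the Royal Albert Hall.",
]

# \w+ stands in for exactly one unknown word between the two known names.
name_pattern = re.compile(r"Anthony (\w+) Hall")


def find_middle_names(texts):
    """Return the word found between 'Anthony' and 'Hall' in each matching text."""
    found = []
    for text in texts:
        match = name_pattern.search(text)
        if match:
            found.append(match.group(1))
    return found


# Limited fuzzy search: an optional prefix and optional suffixes around a root.
fuzzy_pattern = re.compile(r"\b(?:re)?index(?:es|ed|ing)?\b")


def fuzzy_hits(text):
    """Return every variant of the root 'index' that appears in the text."""
    return fuzzy_pattern.findall(text)


if __name__ == "__main__":
    print(find_middle_names(documents))
    print(fuzzy_hits("We reindexed the site after indexing failed; the index is fine."))
```

The first pattern only matches when a single word sits between the two names, so the plain "Anthony Hall" string is skipped. The second shows the user-supplied prefix/suffix idea without any stemming support from the search engine.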
Thanks,
John
Next big thing? (Score:5, Interesting)
What is Google doing to keep itself on top? Do you think there is a lot of room for improvement? How do you think web searching can get better?
"The Slashdot Effect" (Score:4, Interesting)
The masses of Slashdotters have slashed and dotted many an unlucky website over the years...Pushing webservers to their limit and often breaking them outright...
With Google's Massive resources, Is there any noticeable difference when a
Use of Python (Score:5, Interesting)
As an avid fan of the Python language, I am interested in exactly how Google puts it to use. Can you clue us in?
P.S. - Keep up the good work!
downtime disaster stories (Score:4, Interesting)
We've all had servers crashing on us just before a deadline. We've all had to go to the office in the middle of the night to prevent a disaster. (we've all been hacked by a script-kid, once)
Do you have any stories of disasters or difficult moments in the datacenters that kept you all up for a few nights in a row, but went by unnoticed by the public?
My question... (Score:4, Funny)
:)
Kickstart
King of the search engines (Score:5, Interesting)
Does this chain of thought keep you up at night?
I like to watch... (Score:4, Interesting)
Has Google Replaced URLs? (Score:5, Interesting)
images.google.com and PNG images (Score:4, Interesting)
When will images.google.com include PNG images in its search base? Why were the image types limited to GIF and JPEG, when most browsers could also display PNG? Now, virtually all non-text browsers support Portable Network Graphics.
Questions done. I'll take this opportunity to thank Google for groups.google.com, the searchable usenet archive. In my opinion, 15% of the total value of the internet is contained therein. Excellent!
Bandwidth monitors. (Score:4, Interesting)
Can you guys put up bandwidth graphs for the public to see? Like an MRTG graphs page showing daily Google request traffic flow, so we can see what type of overall trends in searching happen during the day.
I would love to be able to see just how massive your traffic is and what it looks like.
And let us know what tools you use to monitor all your stuff.
Why Linux (over BSD, etc)? (Score:4, Interesting)
When you were selecting the OS to run Google, why did you choose Linux? I'm partial to FreeBSD but I'm pretty sure that you evaluated it and found a) something that you didn't like or b) something about Linux that you liked better. If so, what?
Second part of this question: Do you continue to evaluate alternative operating systems?
Chris
Re:First Question! (Score:4, Interesting)
What job opportunities are there at Google, and what opportunities in the industry as a whole?