The Internet

What's It Like to be Google's Boss Techie? 671

Posted by Roblimo
from the how-many-miles-of-cable-do-you-think-they-buy-every-year? dept.
We'd like to welcome Google Director of Technology Craig Silverstein as our next Slashdot interview victim... err... guest. You think you run a big Linux server farm? Craig's is bigger. Think your Web site gets a lot of traffic and creates a lot of headaches? Just think what Craig must face! Post whatever you'd like to ask Craig below, one question per post. About 24 hours after this runs we'll email Craig 10 of the highest-moderated questions, and we'll post his answers shortly after he gets them back to us.
This discussion has been archived. No new comments can be posted.

  • I've wondered (Score:5, Interesting)

    by lblack (124294) on Thursday June 20, 2002 @12:02PM (#3736976)
    Google always seems to be early-to-market with some really highly developed software solutions, and also always seems to have the backbone to support them.

    I'm curious -- what drives the innovation? Is it the hardware team advancing architecture to permit the software team more room to play, or is it the software team saying, "Hey, look what we got!" and the hardware team dropping the iron to implement it?

    I understand there must be some level of synergy, but is it completely seamless or is one side of the equation effectively driving the other?

    Leem
    • by kaladorn (514293) on Thursday June 20, 2002 @12:49PM (#3737417) Homepage Journal
      Everyone will ask about bandwidth, incoming lines, etc. (All the network capacity and capability stuff). Here's something a little more off the beaten track:

      What technologies help to support the Google server farm? What kind of automated monitoring and trouble reporting tools are in use? Are they home brew, open-source, or COTS with some customization (scripts, etc)? And if you had to point to one area of network management and say "we could use some improvement or some better tools", what would that area be?

      BTW - Google Rocks! I never use anything else anymore!
    • by DRAGONWEEZEL (125809) on Thursday June 20, 2002 @02:04PM (#3738078) Homepage
      Is there anything on the internet that you personally couldn't find with google and if so what was it?

      p.s.
      Thanks for all your help with my school research
    • by jolshefsky (560014) on Thursday June 20, 2002 @02:08PM (#3738107) Homepage
      As the web develops, methods of matching a set of search keywords to a set of websites related to those keywords must change with it. I envision that the Google algorithms rank search hits by summing weighted factors such as overall site popularity, META tag keywords, META tag descriptions, TITLE tag contents, text contents, keywords contained in URLs, and so on.

      Can you talk a bit about how those weights have changed over time? Have there been any surprising shifts?
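
      The weighted-sum model imagined above can be sketched in a few lines. This is only an illustration of the idea in the question: the factor names, the weights, and the example pages are all hypothetical, and nothing here describes how Google actually ranks results.

```python
# Hypothetical weights for illustration only; these are assumptions,
# not Google's real ranking factors or values.
WEIGHTS = {
    "site_popularity": 0.4,
    "title_match": 0.3,
    "text_match": 0.2,
    "url_match": 0.1,
}


def score(factors):
    """Sum each factor value (0.0 to 1.0) times its weight.

    Factors missing from the input count as 0.0.
    """
    return sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)


def rank(pages):
    """Return page names sorted from highest to lowest score."""
    return sorted(pages, key=lambda name: score(pages[name]), reverse=True)


# Made-up example pages: a popular general site and a small focused one.
pages = {
    "a.example": {"site_popularity": 1.0, "text_match": 0.5},
    "b.example": {"site_popularity": 0.2, "title_match": 1.0,
                  "text_match": 1.0, "url_match": 1.0},
}

print(rank(pages))
```

      Changing the entries in WEIGHTS is the "shift over time" the question asks about: with these made-up numbers the focused page outranks the popular one, but raising the popularity weight far enough would reverse the order.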

    • Corporate Culture (Score:4, Interesting)

      by zpengo (99887) on Thursday June 20, 2002 @02:31PM (#3738275) Homepage
      As an addendum to this, what is it about the corporate culture at Google that makes it work so well while other "hip" dot coms went down the toilet? What's the magic ingredient that made Google turn out differently?
  • Simple question (Score:4, Interesting)

    by lwdupont (153781) on Thursday June 20, 2002 @12:03PM (#3736986)
    What type of machines/setup does Google use?

    (I've heard thousands of PCs with everything in RAM, but I'd love to hear it from the horse's mouth.)
  • Statistics (Score:3, Interesting)

    by suwain_2 (260792) on Thursday June 20, 2002 @12:05PM (#3737000) Journal
    A relatively simple, non-intellectual question, but I've always wondered -- just how many hits do you get, how much bandwidth do you consume, and how many servers do you have to handle the load?
  • Favoring Big Guys (Score:5, Interesting)

    by PenguinRadio (69089) on Thursday June 20, 2002 @12:06PM (#3737002) Homepage
    Does google's policy of "ranking" the sites that have hits favor the "big guys" over more specific smaller traffic websites? That is, would a story on a site like CNN get a higher ranking in google on a keyword "Gulf War" than say a site (gulfwarveterans.com) that deals 100% with the Gulf War? Do you think you are leading to the commercialization of the web (i.e. the big power players) over smaller sites?
    • Googlebombing (Score:3, Insightful)

      by HMV (44906)
      In fact, there is an opposite concern. Whether through a network of links or through coordinated googlebombing [googlebombing.com], weblogs frequently show up near the top due to the nature of reciprocal linking between the blogs. Not saying that's good or bad (sometimes a sole voice is a better expert on a topic than CNN), but it is what it is. Ranking "links" seems valid enough, but then you ask if that includes machine-generated links by someone's aggregator and the issue becomes a little more cloudy.
    • by killmenow (184444) on Thursday June 20, 2002 @01:15PM (#3737677)
      Furthermore, estimates are that search engines miss a large portion of the content available on the Internet. There must be literally millions of web pages that don't even show up in your cache because they are too small, or because nobody links to them. There may be a site out there that has all the information you could ever want about some esoteric topic, known only to the person who created it and that person's few friends, but since nobody else links to it, nobody else knows about it.

      So how do you find those treasure troves? And how do you decide which ones are treasure troves and which ones are the millions of "all about me" web pages? Or do you care?
  • by scubacuda (411898) <scubacuda@@@gmail...com> on Thursday June 20, 2002 @12:06PM (#3737005)
    What are YOU doing for Windependence Day [slashdot.org]?

  • by SquadBoy (167263) on Thursday June 20, 2002 @12:06PM (#3737006) Homepage Journal
    but I noticed a few months ago that Cisco now uses the Google engine to search the CCO. Congrats on that one. I've also noticed this new search box that Google is starting to produce. And it looks *very* cool. So my question is basically which is more important to your job the website or selling the service and the engine to people who need it?
  • by Black Aardvark House (541204) on Thursday June 20, 2002 @12:06PM (#3737008)
    Has there been any progress on the Pigeon Computing [google.com] initiative?
  • technetcast.ddj.com/ (Score:3, Informative)

    by rblackwe (240170) on Thursday June 20, 2002 @12:06PM (#3737010) Homepage
    A little old but interesting.

    The Technology Behind Google 2000-10-19 (1hr 13min) By Jim Reese, Chief Operations Engineer, Google. How to build an internet search engine that indexes 1-2 terabytes of data (200 million web pages) and serves it up at a rate of 1000 requests/second. (Hint: Start with a farm of 10,000+ Linux servers). The technology behind Google: company overview, search parameters and results, hardware and query load balancing, Linux cluster topology, scalability, fault tolerance, and more. [420]

    http://technetcast.ddj.com/tnc_search.html?key=google
  • by RinkSpringer (518787) <rink@NOSPAm.rink.nu> on Thursday June 20, 2002 @12:07PM (#3737011) Homepage Journal
    I am wondering why they chose Linux. Specifically, I wonder how they made the choice between all major OS-es (Linux, *BSD, Solaris and possibly Windows), as well as the software they use to power the site.
    • by qurob (543434) on Thursday June 20, 2002 @01:02PM (#3737562) Homepage

      From one interview...

      Jason: What led to Google's decision to use Linux? When did that start?

      Sergey: Well, Larry Page and I were in the Stanford PhD program in Computer Science. And we developed Google there. The way the computer science program worked is there was a hodgepodge of computer equipment lying around, and we would grab whatever scraps we could. We had all kinds of computers: HPs, Suns, Alphas and Intels running Linux. So, we gained a lot of experience with all of those platforms.

      When we started Google, we had to make the decision of what we wanted to use. Of course we chose Linux, because it is the most cost effective solution.

      PCs are not only much cheaper these days, but we can also get them very quickly, because they're such a commodity item. That's an incredible benefit. We just installed another 1,000 computers and we got that done in a few weeks. That's really hard to do with any other kind of workstation. I think that's an advantage that people don't entirely realize.

      Jason: Did you view it as being better, or was cost the main reason?

      Sergey: It was better in some ways. Certainly for our purposes, we felt the support was better. For example, the actual kernel authors will respond to problems pretty quickly. They are especially responsive to Google nowadays, since we're so widely used. We can have a 15 minute turnaround. You can't really beat that for support.

      That was an important factor, but frankly, the cost was a bigger issue. PCs are so cheap, which is very important. Sun's Solaris is probably more stable than Linux on PCs. It's hard to determine the blame, whether it's the hardware or the operating system. But, it's a minor difference.

      Jason: Then, does all of your support come from newsgroups or do you actually pay for it through Red Hat?

      Sergey: We have an operations team of about ten people, which helps a lot. And other than that we check newsgroups and e-mail the authors of the code. Usually, if it's a problem we can't figure out, we go straight to the authors.

      Jason: Is Linux used on desktops at Google?

      Sergey: It depends. Engineering mostly runs Linux. Business development/marketing runs Windows. Actually, I use Linux with VMware running Windows. Some people have two computers, particularly some people in engineering who do UI development and need to test things out on Windows platforms. I find it better to just use VMware and have one computer.

      Jason: In a technical sense, what does Linux lack? What does it not provide?

      Sergey: The 64-bit file system, which I know they are working on. It's slowly coming around. I think there are still occasionally some stability issues. I'm not saying Linux is unique in that respect, but you definitely want to have reliability. There are some issues dealing with higher memory systems. If you get to 2GB, and you try to push it past that, we encounter various problems. I know we've had some trouble with the network stack when we really push it hard. In terms of having lost most connections from lots of different machines.


      And from another...

      How is Linux used in Google's projects? Why was Linux chosen to improve the Google search engine?

      Sergey Brin: Actually, we currently run over 6,000 RedHat servers.

      Linux is used everywhere...on the 6,000+ servers themselves, as well as desktop machines for all of our technical employees. We chose Linux because it offers us the price for performance ratio. It's so nice to be able to customize any part of the operating system that we like, at any time. We have a large degree of in-house Linux expertise, too.

      Most of our administrative tools were developed in-house, as well.
  • Regression (Score:5, Interesting)

    by Have Blue (616) on Thursday June 20, 2002 @12:07PM (#3737014) Homepage
    The Internet is always described as a distributed system with no single point of failure. Google, however, has quickly become by far the most popular method of locating information. "Surfing" has been killed by modern search technology; it's so much easier to look through Google than the Web itself. If Google were down, I'm sure the Internet would be far less useful.

    Do you think Google has become an Internet point of failure? With the competition for larger and larger indexes, is the Internet becoming centralized? Do you think this is a bad thing?
  • by I Want GNU! (556631) on Thursday June 20, 2002 @12:07PM (#3737016) Homepage
    What are you doing to stop the new generation of more sophisticated search engine spammers: spammers that use advanced software such as WebPosition Pro, spammers that feed fake pages to the Google crawler, spammers that make bogus link pages to their own sites? Doesn't this new level of sophistication on their part mean that in large part Google must emphasize human website reviewers, such as those provided by the Open Directory Project [dmoz.org], to a greater degree?
  • Stumped (Score:5, Interesting)

    by Bios_Hakr (68586) <xptical.gmail@com> on Thursday June 20, 2002 @12:07PM (#3737018) Homepage
    As a new network configuration guy, I am often stumped by a problem. I usually turn to google first, and my supervisor second. What has been the biggest problem that you have dealt with that will stand out in your mind years from now? As the "Head Techie", where did you turn, and what was the eventual resolution?
  • Scientology (Score:4, Interesting)

    by ender81b (520454) <billdNO@SPAMinebraska.com> on Thursday June 20, 2002 @12:08PM (#3737027) Homepage Journal
    Does google plan on releasing more products like the Google Search Appliance [google.com] in the near future - specifically those that are geared more towards the consumer level rather than business market? I would, personally, love to have some sort of google search engine on my machine to rummage through all the stuff I have. Does google plan on expanding into this market or will you remain focused on the web?

    I know, I know, only one question, but it begs to be asked: how well is your technology going to be able to scale? Considering the near-exponential growth of the internet, will PageRank be able to keep up?
  • Storage used (Score:5, Interesting)

    by Steffan (126616) on Thursday June 20, 2002 @12:08PM (#3737029)
    I understand that Google was using large numbers of IDE drives in lieu of more expensive but individually faster SCSI devices. What prompted the decision, and how have the concerns of reliability and performance been mitigated? What special technology, if any, was used to implement such a system?
    • by johnjones (14274)
      everyone is asking about hardware and to be honest it's not what makes google good
      after all the wayback machine does kind of the same thing

      it's software

      so this is my question

      what browser do you use ?

      regards

      john '1.1alpha' jones
  • I'm curious... (Score:5, Interesting)

    by rgoer (521471) on Thursday June 20, 2002 @12:08PM (#3737030)
    ...as to what exactly Google does with the concepts it receives through the various Google-tech contests held. Have these ideas been made good use of? Do we see any of this in the Google we use every day? What about the ones that didn't win, do we see any of them?
  • by FortKnox (169099) on Thursday June 20, 2002 @12:08PM (#3737031) Homepage Journal
    What's the google language of choice for web page building? I'd assume speed is the most important, so what language makes google so fast?
  • Creative Ideas (Score:5, Interesting)

    by Domasi (318366) on Thursday June 20, 2002 @12:08PM (#3737035) Homepage
    Is there anything new that Google is working on that is not currently displayed in your labs [google.com] section? If so could you explain it to us?


  • peer pressure (Score:5, Interesting)

    by seanw (45548) on Thursday June 20, 2002 @12:08PM (#3737039)
    As Google got more popular and eventually reached the status it holds today, did you feel any pressure (either internally or from outside the organization) to switch from a Linux-based cluster to a proprietary solution (Windows comes to mind, but there are others)? Were you (or others at Google) affected by any of the FUD that is put out, and did it affect your perception of Linux as a viable solution?
  • by dimer0 (461593) on Thursday June 20, 2002 @12:09PM (#3737047)
    What are your biggest turn-ons?

    Turn-offs?

    The worst date you ever had?

  • by AntipodesTroll (552543) on Thursday June 20, 2002 @12:09PM (#3737049) Homepage
    I wonder if Taco is gonna chime in with the question:

    "So, interested in buying a nerdy weblog site, only slightly soiled?"
  • Dot com changes? (Score:5, Interesting)

    by Telastyn (206146) on Thursday June 20, 2002 @12:10PM (#3737052)
    Last I heard Google was still the stereotypical "startup" type company; promoting morale over bureaucracy as long as the work got done. Hockey, pool, the Grateful Dead's ex-chef (iirc?), and tons of other perks.

    Has Google kept that atmosphere as you've grown? Did you keep it while others tanked?
  • Specs (Score:3, Interesting)

    by DeadBugs (546475) on Thursday June 20, 2002 @12:10PM (#3737054) Homepage
    I would be curious about the general overall specs of the hardware and software running google. In particular: standard cpu? Linux version\distributor? clustering? database? Total memory? Total storage? etc.

    Go on, make us jealous
  • so... (Score:3, Interesting)

    by RogueProtoKol (577894) on Thursday June 20, 2002 @12:10PM (#3737056) Homepage
    ... what linux distribution does the world's largest server farm use?
  • Academic ties (Score:4, Interesting)

    by dallen (11400) on Thursday June 20, 2002 @12:10PM (#3737057) Homepage Journal
    It seems that Google's great success is partly due to research coming out of the academic world. How many google employees have advanced degrees, and can they publish non-proprietary research after they join Google? How do you see the interplay between high-tech and Academia?
    • Re:Academic ties (Score:3, Interesting)

      by ender81b (520454)
      from g00gles site:
      Approximate number of employees: 400
      Ph.D.s on staff: 50+
      Languages spoken: 34
      Number of roller hockey players: 32
      Number of offices worldwide: 12
      Massage Therapists: 2
      Neurosurgeons: 1
  • by FortKnox (169099) on Thursday June 20, 2002 @12:11PM (#3737063) Homepage Journal
    Since sites like slashdot don't like to give out their statistics, I'd like to ask, what percent of users use what web browser? Also, what percent of users use what OS?
    • Josh, you can check the Zeitgeist to get the info on browser stats for a year span, same goes for OS-

      http://www.google.com/press/zeitgeist.html
    • They have a nice graph, but no scale. I suppose you could do some careful pixel analysis of the graph to generate percentages, but it's a shame they don't list them.

      Interestingly, I see "Other" has been steadily rising since it bottomed out in January, and has now surpassed Netscape 4. I would love to be able to click on that chart and see a detailed list of the percentages, and what "other" is composed of. Hopefully we'll see Mozilla get its own line on the graph soon.

      It would also be nice to see a breakdown on a per-OS basis. I wonder how many people are running Internet Explorer on Linux? (Seriously, that would indicate what portion of non-IE users hack the browser tag to make web sites happy.)
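
      A rough way to estimate that from server logs is to look for User-Agent strings that claim Internet Explorer while also reporting Linux. The sketch below is only an assumption about how one might count them: the substring checks are crude, the sample strings are made up for illustration, and a browser that spoofs its operating system as well would not be caught.

```python
def looks_spoofed(user_agent):
    """Flag a User-Agent that claims MSIE but also reports Linux."""
    ua = user_agent.lower()
    return "msie" in ua and "linux" in ua


def spoofed_share(user_agents):
    """Fraction of Linux user agents that claim to be Internet Explorer."""
    linux = [ua for ua in user_agents if "linux" in ua.lower()]
    if not linux:
        return 0.0
    return sum(looks_spoofed(ua) for ua in linux) / len(linux)


# Illustrative, made-up log entries (not real Google or Slashdot data).
sample = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020529",
    "Mozilla/4.0 (compatible; MSIE 5.5; Linux)",
]

print(spoofed_share(sample))
```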
  • by mshomphe (106567) on Thursday June 20, 2002 @12:11PM (#3737064) Homepage Journal
    Does Google use any natural language processing (when dealing with web pages, queries, etc.)? Are you planning on doing more with NLP in the future?
  • CO$ and Deep Linking (Score:5, Interesting)

    by Xaoswolf (524554) <Xaoswolf@nOSPAM.gmail.com> on Thursday June 20, 2002 @12:11PM (#3737070) Homepage Journal
    There have recently been several cases where people have sued because of the act of deep linking, or in the case of the Church of Scientology and Xenu.net, linking to information on other people's pages that someone claims a copyright to.

    How have these affected you and your job, and what are your feelings on this subject?

  • Logo work? (Score:3, Interesting)

    by Xafloc (48004) on Thursday June 20, 2002 @12:12PM (#3737075) Homepage
    I have but one question... Who is the mastermind behind all the "special" logo changes that Google experiences throughout the year?

    My hat's off to that team!

  • by Anonymous Coward on Thursday June 20, 2002 @12:13PM (#3737083)
    Why in this day and age does google continue to penalize sites that are virtual hosted? With ip addresses becoming harder to get/justify every day why does google discount the relevance of links that don't come from a unique ip address. Please don't just deny it, I think the Internet community deserves an explanation.
  • by PK_ERTW (538588) on Thursday June 20, 2002 @12:13PM (#3737089)
    Google recently ran its "first annual programming contest," with a winner receiving $10,000. Many slashdotters suspect this was simply a way to recruit new talent. So, was finding new people one of the initial goals for this project, and have you hired any new programmers as a direct result of it? What other goals (PR, generation of new ideas, etc.) were there?
  • by Marx_Mrvelous (532372) on Thursday June 20, 2002 @12:14PM (#3737093) Homepage
    It's well known that you use Linux in your mega clusters. I was wondering if you have ever been approached by Microsoft, Sun, or HP in an effort to switch to their proprietary OSes.

    I can't imagine that you haven't. It must have been a huge decision to invest in one technology, so are you satisfied with what you have?
  • by scubacuda (411898) <scubacuda@@@gmail...com> on Thursday June 20, 2002 @12:15PM (#3737102)
    How do you feel about Alltheweb.com having a bigger index [slashdot.org]?

  • by RembrandtX (240864) on Thursday June 20, 2002 @12:15PM (#3737105) Homepage Journal
    Recently, the English division of our company [black and decker] hired 'HyperMedia Trafficing' or some other similarly named company to get them 'more exposure' in the search engines.

    [forget the ethical debate about that .. or why no one bothered to ask me what to do.]

    What I want to know is, going forward, as more and more of these companies start up and discover more and more unscrupulous ways of 'loading' the search engines with bogus hits/visits/data/etc., how does Google plan to make sure they are:

    1) Not losing ad $$ to these folks
    and
    2) preventing every search from returning something like www.hotgrannysex.com or www.top50.com as the 1st (or first 15) results for a search on .. well .. pretty much anything.

  • by Talisman (39902) on Thursday June 20, 2002 @12:16PM (#3737110) Homepage
    No offense to Mr. Silverstein, but I'm much more interested in Cindy [google.com]! Beautiful, highly successful nerds are terribly rare!

    Just so I'm not off-topic:

    Mr. Silverstein, how does Cindy look in tight sweaters?

    Drool...

    Talisman
  • by SuperguyA1 (90398) on Thursday June 20, 2002 @12:16PM (#3737112) Homepage
    One of the most impressive things about Google to me is how easily you seem to have embraced an open model. I realize the outward view of a company can be quite different from the internal view. How easy is it actually to make decisions such as opening APIs? If it's easy, can you give some advice on how one might convince their boss?

    Thanks,
    -Dave
  • The future of Google (Score:5, Interesting)

    by glh (14273) on Thursday June 20, 2002 @12:16PM (#3737115) Homepage Journal
    Hi Craig!

    I think Google absolutely rocks. It has by far the most intelligent/helpful search engine results. Thanks for the great service.

    Now onto the questions- what is the Google vision / strategy for the future? Where can Google go? From a search engine perspective, what are some of the challenges that you have and improvements that can be made (perhaps speeding up crawling to make the latest content available, for example)? How are you going about solving these challenges, and when can we expect them to be implemented?

    On a similar note, I've noticed that Google recently announced a "google box" that allows corporations to take advantage of the google search algorithms and indexing. Are any more products like this being planned?
  • Attacks? (Score:4, Interesting)

    by Fnagaton (580019) on Thursday June 20, 2002 @12:18PM (#3737132) Homepage Journal
    I have a number of web servers, some Unix, some Windows, and the number of attempted attacks each day from different IPs must run to about one hundred. It is mostly people trying to execute commands or using malformed URLs to exploit some known past security hole. My question is, how many attempted attacks each day do the Google servers get?
  • Can Google last? (Score:5, Interesting)

    Google is a great free public resource. My concern is that it has to be expensive running a resource like that. I know Google's strategy is somewhat to use the free resource as a loss leader to promote your search technology, but the key word in "loss leader" is "loss". It's a great theory as long as you are able to find people who want and need your search technology.

    So my bottom line question is this: Does the web site pay for itself via the advertising? Is there a possibility that someday Google may decide the web site costs too much money to run if you get to a point where your reputation no longer needs the loss leader?

  • by paradesign (561561) on Thursday June 20, 2002 @12:18PM (#3737134) Homepage
    I know the programming contest winner gets a tour of your facility, but I think I speak for all of us when I say, I wanna see it too!

    It would be great if you did a documentary feature with TechTv or someone, because it's one thing to read about your facility, but it would be another to see it.

    Thanks for all of the help I've gotten from Google.com, I don't think I'd still be in school without it.

    Paradesign

    PS, even just a photo feature on the site would be nice.

  • Google cache (Score:5, Interesting)

    by Greenrider (451799) on Thursday June 20, 2002 @12:18PM (#3737136)
    Anyone who has ever needed a piece of information that was on a broken page will agree that the Google page cache is perhaps one of the most underrated and useful parts of your search engine.

    There's one problem that everyone has with the cache, however - you don't deep-nest the caching, so that following any links on a cached page will lead to the original (probably broken) site, instead of to another cached page. Is there a technical or legal reason for why it works this way? Any chance we'll see deep caching at some point?
    • Re:Google cache (Score:3, Informative)

      ...you don't deep-nest the caching, so that following any links on a cached page will lead to the original (probably broken) site, instead of to another cached page.
      Check out the Google Toolbar [google.com] (for IE only, alas), which adds a "Cached Snapshot of Page" item to the right click menu. Very, very cool.
  • by rob_from_ca (118788) on Thursday June 20, 2002 @12:21PM (#3737169) Homepage
    How do you avoid business pressures to make short-sighted solutions, and consistently make good, common sense ideas work instead of adopting ones from marketing sources? Not only does Google have the best search engine technology, but you consistently do the "right" thing. Clean, quick homepage, text only well-identified ads, interesting research projects, etc...This is the way many search engines start, but they all went the way of the "dark" side instead of adopting the "right" solution. In my jobs, it's been very difficult to execute and justify good engineering (or just common sense) under pressure from the people who control the money. Any advice for driving through well-thought-out decisions instead of adopting the "management fad of the month"?
  • by sphealey (2855) on Thursday June 20, 2002 @12:22PM (#3737171)
    In one of Robert Heinlein's novels (don't have the reference at hand), the main character is told to sit down in front of what we would think of today as (WWW + Google) and "learn whatever she can about everything". After a few weeks of coming up with some useful stuff, she finally asks the system: 'who controls this database?', and it replies 'not programmed with that information'. The next morning an assassination team tries to kill her.

    Not to be too "X-File'ish", but does there come a point where too much knowledge is captured in Google? A point where anything that doesn't exist in Google doesn't exist, period? Wouldn't that represent a very tempting target for a bin Laden or a John Ashcroft, to try to control how the modern world thinks?

    Kind of far out there, I know, but do you guys worry about this kind of thing?

    sPh

  • Slashdot effect? (Score:5, Interesting)

    by Lumpish Scholar (17107) on Thursday June 20, 2002 @12:22PM (#3737174) Homepage Journal
    Many sites, when referenced by Slashdot, crumble under the load. Can you folks see any difference, either to your "main" servers (www.google.com) or your cache servers?
  • mod_google (Score:5, Interesting)

    by TwP (149780) on Thursday June 20, 2002 @12:26PM (#3737213) Homepage
    Just curious when mod_google is going to be released for the apache webserver. It would be nice to have the power of Google indexing available to those of us without significant IT budgets (i.e. wife won't let me "buy another #$*@! computer").
  • When things get ugly (Score:5, Interesting)

    by timdorr (213400) on Thursday June 20, 2002 @12:28PM (#3737227) Homepage
    What's the worst thing ever to happen to the google server farm? (Besides the pigeons gnawing on cables)
  • Dealing with DoS (Score:5, Interesting)

    by Wanker (17907) on Thursday June 20, 2002 @12:29PM (#3737237)
    How does google deal with denial of service attacks, particularly distributed ones?

    The rest of us just suck it up with fat network pipes, but a high-profile target like google would be the holy grail of Internet vandals.

    Has anyone ever poisoned your DNSes, effectively taking Google down even though the servers are up? Successfully inserted bogus WAN routing info into the Internet, again effectively bringing down Google even though the servers are fine?

    What's your worst cracker/net vandal story?
  • by timeOday (582209) on Thursday June 20, 2002 @12:30PM (#3737251)
    Conventional wisdom holds that marketing and strategy are the keys to success in business, and that technical excellence is a relatively minor factor. Yet google seems to have come out of nowhere to dominate an already crowded market for search engines - without Superbowl ads, a mascot, or (unless I'm mistaken) an IPO.

    To what do you credit the popularity of google? Do you consider google a "success," or are you holding out for thousands of employees and billions in cash flow?

  • by ManxStef (469602) on Thursday June 20, 2002 @12:32PM (#3737260) Homepage

    Personally I'm usually pretty drained after a fun day staring at the screen and typing like a monkey, and sometimes completely avoid the PC when I get home, preferring to chill with a decent book (currently Cradle to Cradle [slashdot.org]), zone-out in front of the TV, or go cycling in the beautiful Isle of Man [isleofman.com] (watch "Waking Ned Devine" for an idea of the scenery - jealous?<grin/>).

    So I guess my completely-non-tech question is:

    What do you do in "loafing" time (ie. loaf - To pass time at leisure; idle.), when you've left the office, "lost" the pager/Blackberry/PDA/mobile etc., and got away from it all?

    Cheers,
  • by __past__ (542467) on Thursday June 20, 2002 @12:33PM (#3737276)
    What do you think about the Semantic Web [w3.org] initiative driven by the W3C and others?

    Do you expect widespread usage of RDF [w3.org]/DAML [daml.org]/OWL [w3.org]/TopicMaps [topicmaps.org] for explicit meta-data annotation of web resources, or will it be used only in small circles of specialized content providers like academia, or maybe not at all?

    How will Google react? Do you plan to use meta-data provided by web resources if found, and how will you decide if it isn't just made up to get people on some bogus pr0n site (like with those <meta>-Tags today)? Will it someday render the brute-force approach of full-text-indexing obsolete?

  • by mr_don't (311416) on Thursday June 20, 2002 @12:34PM (#3737282)
    ...or, what if google gets hit by a bus?

    Google has become such an important part of the Internet for millions of average users. With this in mind, my friends and I often joke about what would happen if (knock on wood) Google were to go out of business. I suggest that ICANN should do something useful for a change, and fund Google as an official, non-profit project for searching the net.

    Although I have heard that Google turns a good profit, what exactly is preventing Google from becoming a not-for-profit organization? Couldn't Google take the extra income from licensing its search to create better search technologies and pay the employees, rather than make some shareholders rich? Wouldn't this perhaps make Google a more sustainable organization?


  • by dargaud (518470) <slashdot2NO@SPAMgdargaud.net> on Thursday June 20, 2002 @12:37PM (#3737319) Homepage
    The google cache treads on the muddy waters of copyright. Do you feel like you are going to run into trouble because of it? Some conflicting opinions about it are:
    • It serves copyrighted pages without the author's consent
    • It serves pages without the original site's knowledge
    • It's very useful
    • If a page is on the web, it can be archived/cached...
  • Staying on Top... (Score:5, Interesting)

    by Dr. Molf (586917) on Thursday June 20, 2002 @12:38PM (#3737321) Homepage
    Google is an incredibly popular and effective website. I'm curious about the amount of pressure you have to expand in order to "stay competitive" or "aptly serve consumer's needs". Is there any kind of a push to go the way of yahoo or amazon and try and include EVERYTHING on that simple page? As things evolve, do you really see Google staying the top engine in 3 to 5 years?
  • Newsgroups (Score:5, Funny)

    by scott1853 (194884) on Thursday June 20, 2002 @12:38PM (#3737330)
    I've made some really stupid posts to the newsgroups in the past and I used my real name. Can you delete them for me?
  • by duffbeer703 (177751) on Thursday June 20, 2002 @12:41PM (#3737350)
    How do you guys manage thousands of servers spread throughout multiple datacenters?

    How do you handle user accounts? Event notification?

    Do you guys use "enterprise" software like Tivoli or Openview, or did you roll your own solution?
    • by James Youngman (3732) <jay&gnu,org> on Thursday June 20, 2002 @01:39PM (#3737875) Homepage
      A previous poster (duffbeer703) asked
      How do you guys manage thousands of servers spread throughout multiple datacenters?

      How do you handle user accounts? Event notification?

      Do you guys use "enterprise" software like Tivoli or Openview, or did you roll your own solution?

      ... to which I would add,

      How do you balance the need to keep systems up-to-date against your (doubtless demanding) availability requirements? Is there enough redundancy in there that you just flip a machine out while it is updated? Presumably, however the machine is upgraded, this is automatic - you must have too many machines to do it any other way!

      How do you test these updates (security patches, distribution updates, regular changes to your own software, configuration tweaks)? Do you have some kind of enormous test environment containing a copy of 50% of the main Google cache or something? For that matter, how do you do the testing itself? Do you type "Most clowns drink blue fruit juice on Mars" in the search box and just verify that you get 184 hits, and say "right, it works", or do you have a more sophisticated method of testing it (for example do you run your test system against a captive internal dataset)?

  • by Saint Aardvark (159009) on Thursday June 20, 2002 @12:43PM (#3737367) Homepage Journal
    Dang, torn between modding up questions and submitting one of my own...

    What would it take to Slashdot Google? What do you do to avoid this? Have you been Slashdotted before, either from Slashdot itself or from some other link?

  • by mo (2873) on Thursday June 20, 2002 @12:46PM (#3737387)
    How can you possibly test bugfixes/changes that need to get deployed to thousands of machines? Furthermore, how in the heck do you deploy the changes once they're tested? I understand you probably can't describe the exact process, but perhaps you can enlighten us on some principles learned on the subject of CM on such a massive scale.
  • Google API (Score:4, Interesting)

    by __past__ (542467) on Thursday June 20, 2002 @12:46PM (#3737390)
    After the introduction of the Google API [google.com], some people, especially from the REST [ebuilt.com] camp, criticized [xml.com] the use of SOAP [w3.org], claiming it just adds superfluous bloat and is generally "unwebby". What do you think about this?
  • by Helmholtz Coil (581131) on Thursday June 20, 2002 @12:48PM (#3737413) Journal

    that comes to mind when I think of a huge server farm like Google's: can you give a rough order of magnitude (# of zeros maybe) on what your electric bill is?

    Thanks very much for Google. The more I use it the more I appreciate it.

  • by Control-Z (321144) on Thursday June 20, 2002 @12:50PM (#3737431)
    Are you guys making enough money?

    I wish you'd give us some banner ads or something, I feel guilty. I don't want Google to go away. :)

    Seriously, why don't you serve banner ads?

    -Dan
  • Google Voice Search (Score:5, Interesting)

    by NeoYoda (584952) on Thursday June 20, 2002 @12:50PM (#3737434)
    There has been much debate about what the practical purpose for Google Voice search [google.com] might be, could you fill us in? Is it really for use in cars?
  • by Ravagin (100668) on Thursday June 20, 2002 @12:52PM (#3737449)
    Ahoy love the google, it's the only engine I trust these days. Nevertheless....

    For a site where speed and information delivery are of the utmost importance, an archaic table-based design seems rather strange. Is there any reason you have not yet switched to a more forwards-compatible xhtml/css design? (Note that by "design" I mean more the html and css than the visual appearance of it)

    For my own amusement, I've been looking at recoding the google design in CSS, and it's really not that hard.

    Thanks!
  • by apol (94049) on Thursday June 20, 2002 @12:55PM (#3737491)
    I've become addicted to the Google toolbar [google.com]. It only works with IE, which I use at work since I am forced to use Windows there. Now with Mozilla 1.0 and my constant wish to minimise the usage of Microsoft products, I am faced with the dilemma of keeping IE or losing the Google toolbar.

    Why haven't you implemented the toolbar for open source browsers yet? Are there technical difficulties, or rather a lack of interest from Google?

  • by Anonymous Coward on Thursday June 20, 2002 @12:58PM (#3737521)
    Can you tell us anything about how you are working with the various intelligence agencies to provide them information about searches that are of interest to them? Are you thinking of providing SSL access to your web pages so that these agencies will have to work with you instead of just monitoring your network traffic?
  • by Nijika (525558) on Thursday June 20, 2002 @01:00PM (#3737537) Homepage Journal
    How does Google benchmark software? E.g., how do you benchmark Apache, SQL, your CGIs etc...
  • What's the back end? (Score:5, Interesting)

    by Second_Derivative (257815) on Thursday June 20, 2002 @01:03PM (#3737563)
    I don't remember what HTTPd they're running but it sure as hell isn't Apache. Someone said that they get 1k hits per SECOND; what do you use to shape that insane amount of traffic? What is the '/search' page coded in? What databases are used to index a terabyte of data? How do those 10,000 nodes find the data they need so quickly? What sort of interlinks are used?

    How do you build a cluster like a war machine, in other words? ;)
  • Speech recognition (Score:5, Interesting)

    by harmonica (29841) on Thursday June 20, 2002 @01:09PM (#3737626)
    Are there plans to index audio files (and the audio tracks of video files) so that these could be searched as well? I would guess that existing speech recognition packages could be reused for this purpose so that development would not be too complicated.

    Recognizing text in images and videos and indexing that would be a similar task. I know that Google Catalog Search [google.com] must be doing some OCR already, but I have no idea if this would take too many CPU cycles if applied to all images, or if there are other problems (the images themselves already get downloaded for the image search, so bandwidth should not be the problem).
  • by nanobug (446693) on Thursday June 20, 2002 @01:13PM (#3737666)
    Google's PageRank technology works very well on the web with lots of pages pointing to lots of other pages.

    The Google Search Appliance, however, is targeted at an office environment. Most of the documents (especially the non-html ones) in the typical office stand alone and do not have links to each other.

    How has Google modified or complemented (if at all) the PageRank algorithm to make it more suited to an office environment?

    I am currently pushing management at my site to purchase a Google Search Appliance, so I need an answer to this to help justify the change from our existing search application. i.e. without a good PageRank score, how does the Search Appliance order the result set in a useful way?
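The concern above can be illustrated with a minimal sketch. It assumes the simplified textbook form of PageRank (power iteration with a damping factor of 0.85), not whatever Google or the Search Appliance actually runs; the `pagerank` function and both document sets are hypothetical. It shows that a set of documents with no links between them ends up with identical scores, so link analysis alone gives no ordering.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified textbook PageRank by power iteration.

    links maps each document to the list of documents it links to.
    The damping value is the commonly quoted default, assumed here.
    """
    docs = list(links)
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        # Rank held by documents with no outgoing links is spread evenly.
        dangling = sum(rank[d] for d in docs if not links[d])
        new_rank = {d: (1.0 - damping) / n + damping * dangling / n for d in docs}
        for d in docs:
            for target in links[d]:
                new_rank[target] += damping * rank[d] / len(links[d])
        rank = new_rank
    return rank


# A small linked "web" versus stand-alone office documents (made-up names).
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
office = {"memo.doc": [], "budget.xls": [], "plan.pdf": []}

web_rank = pagerank(web)
office_rank = pagerank(office)
```

In the linked set the scores differ from page to page; in the unlinked set every document keeps the same score, which is why some other signal would be needed to order results there.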
  • by ReadParse (38517) <johnNO@SPAMfunnycow.com> on Thursday June 20, 2002 @01:17PM (#3737699) Homepage
    A big part of Google's strength is in the supported search syntax, most notably that you can search for phrases instead of just keywords, that you can filter OUT certain phrases or keywords, and that you can search for content on specific sites, or NOT on specific sites. The next step for me and probably a lot of other Unix/Perl types is regular expression support.

    For example, let's say I'm looking for 80's brat pack member Anthony Michael Hall (not that I would do such a thing), but I can't remember his middle name. Looking for "Anthony Hall" will do me little if any good, but looking for "Anthony \w+ Hall" could do the trick nicely.

    Another example is that the user can provide their own limited fuzzy searching, by searching for optional prefixes and suffixes along with the root, instead of having to get the word or phrase exactly as it's indexed.

    Thanks,
    John
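The two patterns described in the post above can be tried locally with Python's `re` module. This is only a sketch of the matching behaviour: the sample snippets, the `matching` helper, and the prefix/suffix list are made up for illustration, and nothing here reflects how a search engine would evaluate such a query against its index.

```python
import re

# Hypothetical snippets standing in for indexed page text.
snippets = [
    "Anthony Michael Hall starred in The Breakfast Club.",
    "Anthony Hall Road is closed for repairs.",
    "Sir Anthony Hopkins visited the Royal Albert Hall.",
]

# Exactly one unknown word between the first and last name.
middle_name = re.compile(r"\bAnthony \w+ Hall\b")

# Optional prefixes and suffixes around a root, as a crude user-supplied fuzzy match.
fuzzy_index = re.compile(r"\b(?:re|un)?index(?:es|ed|ing)?\b", re.IGNORECASE)


def matching(pattern, texts):
    """Return the texts in which the pattern occurs at least once."""
    return [t for t in texts if pattern.search(t)]


hits = matching(middle_name, snippets)
```

Only the first snippet matches the name pattern, since the other two lack a single word sitting between "Anthony" and "Hall".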
  • Next big thing? (Score:5, Interesting)

    by byee (221083) on Thursday June 20, 2002 @01:20PM (#3737719)
    A few years ago Google came along with their new ranking algorithm and blew away all other competition. Now it's the only search engine I, and the vast majority of the people I know, use.


    What is Google doing to keep itself on top? Do you think there is a lot of room for improvement? How do you think web searching can get better?

  • by Zzootnik (179922) on Thursday June 20, 2002 @01:22PM (#3737736)
    So I'm truly surprised no one has asked this one yet, as it's the first thing that popped into my head...

    The masses of Slashdotters have slashed and dotted many an unlucky website over the years...Pushing webservers to their limit and often breaking them outright...

    With Google's Massive resources, Is there any noticeable difference when a /. story gets posted and people go stampeding to google to find out more? Or is that happening right now? (I'd hate to think of myself as part of a huge herd of individually acting DDOS'ers, but unfortunately, that's about what it ends up being...)

  • Use of Python (Score:5, Interesting)

    by BobRoss (63028) on Thursday June 20, 2002 @01:49PM (#3737955)
    I have heard that Google uses Python extensively to manage its data, grab new data, etc.

    As an avid fan of the Python language, I am interested in exactly how Google puts it to use. Can you clue us in?

    P.S. - Keep up the good work!
  • by fons (190526) on Thursday June 20, 2002 @01:54PM (#3737981) Homepage

    We've all had servers crashing on us just before a deadline. We've all had to go to the office in the middle of the night to prevent a disaster. (We've all been hacked by a script-kid, once.)

    Do you have any stories of disasters or difficult moments in the datacenters that kept you all up for a few nights in a row, but went by unnoticed by the public?
  • by Kickstart70 (531316) on Thursday June 20, 2002 @01:55PM (#3737992) Homepage
    What's the root password?

    :)

    Kickstart
  • by mikosullivan (320993) <miko&idocs,com> on Thursday June 20, 2002 @02:25PM (#3738242)
    • CUI was king of the search engines. WebCrawler took them down.
    • WebCrawler was king of the search engines. AltaVista took them down.
    • AltaVista was king of the search engines. Google took them down.
    • Google is king of the search engines.

    Does this chain of thought keep you up at night?

  • I like to watch... (Score:4, Interesting)

    by odbodbo (585329) on Thursday June 20, 2002 @02:55PM (#3738474)
    I used to kill hours watching the search requests scroll by on Metacrawler's Metaspy [metaspy.com] page back when people still used Metacrawler. Any chance we could have something like that on Google? I would *almost* even pay to subscribe to a site where I could watch uncensored Google search requests go by.
  • by billnapier (33763) <napier@poboxBOYSEN.com minus berry> on Thursday June 20, 2002 @03:05PM (#3738523) Homepage
    With the success and popularity of Google, I find myself using URLs for places less and less and just entering names into Google to find places (they are almost always on the first page...). Do you think that you have almost replaced the URL?
  • by Rock (16836) on Thursday June 20, 2002 @03:33PM (#3738713) Homepage Journal
    Craig,

    When will images.google.com include PNG images in its search base? Why were the image types limited to GIF and JPEG, when most browsers could also display PNG? Now, virtually all non-text browsers support Portable Network Graphics.

    Questions done. I'll take this opportunity to thank Google for groups.google.com, the searchable usenet archive. In my opinion, 15% of the total value of the internet is contained therein. Excellent!
  • Bandwidth monitors. (Score:4, Interesting)

    by _ph1ux_ (216706) on Thursday June 20, 2002 @03:53PM (#3738898)
    I was gunna mod... but I have a re-quest-ion.

    Can you guys put up bandwidth graphs for the public to see? Like an MRTG graphs page showing daily Google request traffic flow, so we can see what type of overall trends in searching happen during the day.

    I would love to be able to see just how massive your traffic is and what it looks like.

    and let us know what tools you use to monitor all your stuff.
  • by cjsnell (5825) on Thursday June 20, 2002 @04:16PM (#3739051) Journal

    When you were selecting the OS to run Google, why did you choose Linux? I'm partial to FreeBSD but I'm pretty sure that you evaluated it and found something a) that you didn't like or b) something about Linux that you liked better. If so, what?

    Second part of this question: Do you continue to evaluate alternative operating systems?

    Chris

