The Google Search Server 178
An anonymous reader submitted a reasonably in-depth review of the Google search appliance. The guys from AnandTech put it through its paces and included a variety of pictures and comments on one of those Google products most of us will probably never play with.
Neat insides (Score:5, Insightful)
Re:Google ate my server (Score:2, Insightful)
Sounds like it wasn't much of an admin tool if it required no authorization. Any employee could have done what Google did, just not as quickly.
OS (Score:1, Insightful)
Re:Google ate my server (Score:3, Insightful)
Re:Review? & capacity (Score:3, Insightful)
Depending on how you have configured things, it may also go ahead and read your banner ads and such. If you haven't explicitly told your crawler to stay within someurl.com, it will go ahead and index the links that go to outside sites as well.
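To make the "stay within someurl.com" point concrete, here is a rough sketch of the kind of scope check a crawler could apply before following a link. The function name `in_scope` and the sample links are made up for illustration; this is not how the Google appliance is actually configured.

```python
from urllib.parse import urljoin, urlparse


def in_scope(link, base_url, allowed_host="someurl.com"):
    """Return True if link resolves to allowed_host or one of its subdomains.

    in_scope and allowed_host are hypothetical names for this sketch.
    """
    # Resolve relative links against the page they were found on.
    host = urlparse(urljoin(base_url, link)).hostname or ""
    return host == allowed_host or host.endswith("." + allowed_host)


# Hypothetical links found on a page of the site.
links = [
    "/reviews/1",
    "http://ads.example.net/banner?id=7",
    "http://www.someurl.com/news",
]
kept = [l for l in links if in_scope(l, "http://someurl.com/index.html")]
print(kept)
```

The banner-ad link on the outside host is dropped, while the relative link and the www subdomain link are kept.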
The solution presented in the article is a very common one when you simply want to index a subset of site content. Another common method, for crawl systems that support scripting (like Plumtree's Ripfire or Verity), is to explicitly parse out the URLs you are looking for and to handle things like pagination.
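For the scripted approach, something along these lines is what I mean by parsing out the URLs explicitly. It is only a sketch in plain Python, not Ripfire or Verity script, and the URL shape (`/show.html?i=<id>&p=<page>`) is an assumption invented for the example.

```python
import re
from html.parser import HTMLParser

# Hypothetical URL shape: /show.html?i=<article id>&p=<page number>
ARTICLE_RE = re.compile(r"^/show\.html\?i=(\d+)(?:&p=(\d+))?$")


class ArticleLinks(HTMLParser):
    """Collect (article_id, page_number) pairs from matching <a href> links."""

    def __init__(self):
        super().__init__()
        self.pages = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = ARTICLE_RE.match(href)
        if match:
            # A link without a page parameter is treated as page 1.
            self.pages.add((int(match.group(1)), int(match.group(2) or 1)))


def article_pages(html):
    """Return the sorted article pages linked from an HTML string."""
    parser = ArticleLinks()
    parser.feed(html)
    return sorted(parser.pages)
```

Anything that does not match the pattern (ads, navigation, outside links) is simply ignored, and multi-page articles show up as one entry per page.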
The former is preferred, as it can easily be adapted to work with other search engines without rewriting custom scripts. I would not be surprised if AnandTech now detects when GoogleBot is crawling their site and presents GoogleBot, as well as other search bots, with the same page that their appliance sees.
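The bot detection I am guessing at would be a simple User-Agent check, roughly like the sketch below. The token list and the template names are assumptions for illustration, not anything AnandTech has confirmed.

```python
# Illustrative, non-exhaustive list of substrings seen in crawler User-Agents.
BOT_TOKENS = ("googlebot", "slurp")


def is_search_bot(user_agent):
    """Return True if the User-Agent header looks like a known search crawler."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BOT_TOKENS)


def choose_template(user_agent):
    """Pick the page to serve; both template names are hypothetical."""
    return "crawler_index.html" if is_search_bot(user_agent) else "default.html"
```

With this, every crawler on the list gets the same stripped-down index page, and regular browsers get the normal one.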
Re:It looks like the OS is WINDOWS (Score:2, Insightful)
Google on Time4ink.com (Score:2, Insightful)