The Google Search Server
An anonymous reader submitted a reasonably indepth review of the Google search appliance. The guys from anandtech put it through it's paces, and included a variety of pictures and comments on one of those Google products most of us will probably never play with.
Neat insides (Score:5, Insightful)
Re:Neat insides (Score:5, Informative)
The tools google provides (very easy binary updates, strong web control panel, for example) turn the relatively common task into a dead-simple, point-and-click configuration.
They even provide a decent interface for skinning the search pages, and while it's not perfect, it's certainly adequate for even the best looking sites on the internet.
Re:Neat insides (Score:3, Interesting)
When looking at the google appliances, I thought it was really cool how it learns your specific terms and acronyms and it will do the "Did you mean correctspellingword?" like google does.
Pretty slick from what I gather. I have no direct experience except for google proper.
Re:Neat insides (Score:3, Funny)
Quit complaining, it's not like this was being called an indepth review.....oh, wait.
Re:Neat insides (Score:1, Funny)
You can do this yourself; try searching the Anandtech site. It's quick, and the results look like Google results.
Re:Neat insides (Score:3, Funny)
I must say I'm disappointed that this [anandtech.com] is what Google passes off as a flux capacitor.
Google is Dead anyway (Score:3, Funny)
Re:Google is Dead anyway (Score:5, Funny)
- Steve Ballmer
"Whether you like it or not, history is on our side. We will bury you."
- Nikita Khrushchev
Did Ballmer take off his shoe and start banging on the podium while he talked?
Re:Google is Dead anyway (Score:5, Funny)
This should clearly tell you that Google is already undead, and keeps rising again. He has already killed them before. Don't worry!
Re:Google is Dead anyway (Score:2, Funny)
AnandTech not very search optimization savvy (Score:5, Informative)
Their solution was to create a list of URLs for the appliance to crawl. If they had to do that for the search appliance, there is no way that googlebot, msnbot, or yahoo slurp is going to be able to properly index their site.
Your publicly accessible URLs need to be managed and canonicalized through judicious use of robots.txt, 301 redirects, site-wide linking, and just plain thinking out the layout of your site.
Re:AnandTech not very search optimization savvy (Score:3, Funny)
Re:AnandTech not very search optimization savvy (Score:3, Informative)
A word to the wise: don't let the Mini crawl your entire site without keeping a close eye on it.
The same could be said of any search engine, or any automated process for that matter. We use ht://Dig and the issues are the same, except ht://Dig can be run locally on the server, saving bandwidth (and speeding up the indexing process) by indexing locally and re-writing urls for static files, through apache for dynamic, it's free, and you aren't limited to 100000 docum
Re:AnandTech not very search optimization savvy (Score:2)
Unfortunately feature holes like this are why the thing hasn't taken off. If
Re:AnandTech not very search optimization savvy (Score:3, Interesting)
Was this a review? (Score:2, Informative)
I gotta say, I was looking for benchmarks, usability scores, maybe some test scenarios. Even better, compare this to other products available out there.
It looked promising at the start, but when you get to the last page it leaves you wondering if they forgot the hyperlinks for the rest of the article!!
Re:Was this a review? (Score:2, Funny)
This revolutionary interface will fire off your search responses as accurately as a plastic chair bouncing around the room.
Re:Was this a review? (Score:2)
subcontractors (Score:1, Funny)
I was disappointed to see GigaByte didn't use MegaByte to make some subcomponent.
Re:subcontractors (Score:3, Funny)
Maybe he was too busy trying to take over Mainframe?
Re:subcontractors (Score:2)
Wow! That's a pretty obscure Reboot reference. I had totally forgotten about that show . . .
Oh come on (Score:4, Funny)
Second, it was a Google Mini.
Third, they didn't "put it through its paces" at all.
Lousy article, misleading
Good, but... (Score:5, Interesting)
While this is an interesting article, it really isn't much of a review of the Google Mini. All they do is take it apart, take pictures, and tell you that they set it up after a little bit of trouble. There is nothing about how well it actually works. No benchmarks. No comparisons. They just say that it worked well and leave it at that. Anandtech has had more in-depth reviews of mice before.
It is more information than I have seen anywhere else though.
Re:Good, but... (Score:3, Interesting)
The terms & conditions probably forbid reverse engineering and/or disassembly of the appliance.
It would have been veeerrry easy to rip out the HDD and mount it on a Linux box to check out its internals....
They must have thought of that. As they've already ruined the warranty (by opening the box), it was probably the EULA or something like that that made them stop short of reviewing contents of the hard disks.
Free Google T-Shirt (Score:5, Funny)
It's "its"! (Score:5, Informative)
It's really easy: It's "his", hers", and "its". Even a flower [angryflower.com] knows!
--cycling through grammar Nazi mode. Please wait.
Re:It's "its"! (Score:3, Funny)
Make that "It's "his", her", and "its".
*sigh*
--completed grammar Nazi mode. Resuming normal operation.
Re:It's "its"! (Score:2)
Should've used single quotes there in the first place, and confused everyone cos on computers they're drawn the same as apostrophes
J.
Re:It's "its"! (Score:4, Informative)
Well, that is what someone told me anyway. English is not my primary language, if the above is not correct then please don't shoot me.
Re:It's "its"! (Score:2)
Re:It's "its"! (Score:4, Informative)
and use its' when it's possesive
john's coming to get johns' hat
Don't listen to this guy. He has lied to you twice. 1) Its' is never valid. 2) The example with John is just so wrong it hurts. "John is coming to get John's hat." You use 's for possessive; s' is for possessive plural, like this: "Slashdotters tend to live in their parents' basement."
Re:It's "its"! (Score:2)
Unless there's just the one big basement?
Re:It's "its"! (Score:2)
In your example, you want to emphasise that it is, as opposed to isn't.
Also, "Indeeed, it is" isn't a proper sentence; there's no verb.
How about "It's true that you can replace 'it is' with 'it's'"?
Re:It's "its"! (Score:2)
Oh, wait, I followed the wrong rule in my "proof."
At least the whole "I before E" thing has a little rhyme that usually works.
where's the raid? (Score:5, Interesting)
Re:where's the raid? (Score:5, Funny)
Here. [gmail.com]
Re:where's the raid? (Score:5, Informative)
What you're really buying here is closed-source software, wrapped in the hardware that turns it into an "appliance". Assume $2,000 of that $3,000 pays for the software.
By specifying the hardware in this way, and by keeping the BIOS and root passwords to themselves, Google greatly simplify their support role.
This is common practice: an IBM HMC (Hardware Management Console) is a 1U PC with a custom Linux distribution and the management software preinstalled. You don't get the root password; you just use the software as delivered.
Re:where's the raid? (Score:2)
Now that's just plain silly. A basic x86 1U server [pogolinux.com] runs around $1100 with two hard drives configured in software RAID1, which works wonderfully other than not allowing hotswap and preventing boot if the first drive is the one that fails. For another $150 or so you can add a hardware RAID card to fix both of those things and get slightly better performance.
There is absolutely NO excuse for not running a raid on any modern server. Drives are the most
Re:where's the raid? (Score:2)
And what if a RAM chip goes faulty?
What if a capacitor on the motherboard starts leaking?
Just get two of the damn things, place them in separate data centers, and round robin them if search is a critical feature.
Re:where's the raid? (Score:2)
Re:where's the raid? (Score:2)
I think the point is that Google doesn't want you replacing parts yourself. If you can deal with sending the device back to Google for servicing, then you can deal with reindexing.
Re:where's the raid? (Score:2)
Chris
Re:where's the raid? (Score:2)
Try searching the site for "google mini" (Score:5, Funny)
Review? & capacity (Score:2)
If, by "reasonably indepth review", you mean lots of pretty pictures and a narrative about opening the box and the case, then sure.
Rather than calling this a review, perhaps it could be re-titled "One man's demonstration of the Google search appliance."
That said, I'm a little concerned about how many URLs it can handle... 100,000? According to TFA, 40,000 documents overloaded this thing.
The article did not address how th
Re:Review? & capacity (Score:2)
My reading of TFA was that the Mini was encumbered with an arbitrary limit of 40,000 documents.
That is, if you want to index >40,000, Google wants more money from you. It's purely to do with software licensing.
Re:Review? & capacity (Score:2)
But if each article is 3 pages long on average, that's 120,000 documents/URLs right there.
Re:Review? & capacity (Score:2)
The appliance can index 100,000 at the lowest licencing level. Even if you only have 40,000 documents, you need to keep an eye on the crawler, and make some changes if it starts counting pages twice (printable/alternate versions, or multiple pages of single documents perhaps).
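One way to see how much double counting is going on is to collapse alternate versions of a page onto a single URL before counting. The sketch below is only an illustration: the `print` and `page` query parameters are assumed names for "printable" and "page N of an article" variants, not anything taken from the Mini or from AnandTech's site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical query parameters that only select an alternate view
# (printable version, page N of a multi-page article).
IGNORED_PARAMS = {"print", "page"}


def canonical_url(url):
    """Collapse alternate views of one document onto a single URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # make the parameter order irrelevant
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/",
         urlencode(kept), "")
    )


def count_documents(urls):
    """Count distinct documents after canonicalization."""
    return len({canonical_url(u) for u in urls})
```

If the raw URL count is far above `count_documents(...)`, the crawler is probably spending its license on duplicates.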
RTFA more closely (Score:2)
Re:RTFC more closely (Score:2)
I guess TFA being from the you-know-for-the-kids-dept explains it pretty well.
Re:Review? & capacity (Score:3, Interesting)
All this info can also be gotten from http://www.google.com/enterprise/ [google.com], which is exactly 1 (one) click away from Google's index page.
Re:Review? & capacity (Score:3, Insightful)
Depending on how you have configured things it may also go ahead and read your banner ads and such as well. If you haven't explicitly told your crawler to stay within someurl.com then it will go ahead and index the links that go to outside sites as well.
The solution that was presented in the
Google ate my server (Score:5, Interesting)
We had a couple times when the appliance locked up and had to be rebooted. That was probably the most distressing as it had to be on 24x7 to support our organization and I wasn't looking forward to the help desk calls.
More amusing, though, was the way it crawled content. Google works like any other crawler - it goes around and clicks hyperlinks. Unfortunately it's not too bright, not paying attention to the text of the hyperlink, like if it said "delete" or something like that.
Unfortunately I had a poorly secured application that Google was able to sneak into via another link I wasn't aware of. It held the custom links for each of our departments to display a personalized set of links on the home page. Unfortunately it went through the admin tool and clicked every delete link it could find. I was paged the next morning and was fairly unhappy. My fault, though.
The irony is that the budget money evaporated and we aren't getting it after all.
Re:Google ate my server (Score:2, Insightful)
Sounds like it wasn't much of an admin tool if it required no authorization...any employee could have done what Google did, just not as quickly.
Re:Google ate my server (Score:3, Insightful)
Re:Google ate my server (Score:2)
(I'm not the AC who first posted)
Re:Google ate my server (Score:2)
Don't use GET to modify application state! (Score:5, Informative)
Universal Resource Identifiers -- Axioms of Web Architecture : Identity, State and GET [w3.org]
In HTTP, GET must not have side effects.
In HTTP, anything which does not have side-effects should use GET
If somebody visited your site with a pre-fetching tool like the Google Web Accelerator, you would also find the "delete" links being clicked automatically like this. Change those deletes to use POST instead.
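To make the GET/POST point concrete, here is a minimal sketch of a request dispatcher that refuses to delete anything on a GET. The `/links/delete/<id>` path, the `handle` function, and the in-memory `store` are all made up for illustration; a real application would also need authentication, which this does not show.

```python
def handle(method, path, store):
    """Dispatch one request; return (status_code, message).

    `store` is a dict of link id -> label standing in for the database.
    """
    prefix = "/links/delete/"
    if path.startswith(prefix):
        # Deleting changes state, so only POST is accepted. A crawler or
        # prefetcher that follows the link with GET gets a 405 instead.
        if method != "POST":
            return 405, "Method Not Allowed"
        try:
            link_id = int(path[len(prefix):])
        except ValueError:
            return 400, "Bad Request"
        if store.pop(link_id, None) is None:
            return 404, "Not Found"
        return 200, "deleted"
    if method == "GET" and path == "/links":
        # Reading is safe: no side effects.
        return 200, ", ".join(store[k] for k in sorted(store))
    return 404, "Not Found"
```

With this shape, a crawler walking every hyperlink can at worst collect a pile of 405 responses.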
Re:Google ate my server (Score:2)
Thank god for backups..
Re:Google ate my server (Score:2)
Robots.txt has the protective power of a big red Don't Push button on a public street. Heck, I keep an eye on anyone that comes to my datacenter, in case their eyes start to fixate on the EPO button...
Re:Google ate my server (Score:2)
The HTTP spec says that a GET should be safe, i.e. it should not change data. This is why "delete" hyperlinks should at least have an "are you sure" page with a posting form before actually deleting anything. Just a hint for your next project!
interesting review. (Score:2)
I was certainly looking forward to some overclocking and Linux installing. I mean, I'm sure they voided whatever agreement they had with Google just by opening the case up, so why not go all out and give us the review we really want to read?
I didn't even realize the review was over until I realized there was no "next" button on that last page.
Sweet (Score:2)
Save $3000 with site:anandtech.com (Score:2, Informative)
Re:Save $3000 with site:anandtech.com (Score:2)
Re:Save $3000 with site:anandtech.com (Score:2)
The ads on the Google page are put there by Google, not Anandtech. I assume Anand will have search-targeted ads on their own results page soon enough.
From TFA (Score:5, Funny)
Right.. Only unthreaded screws can be opened by a regular screwdriver.
Re:From TFA (Score:2)
Where are the pigeons? (Score:2, Funny)
Re:Where are the pigeons? (Score:2, Funny)
Re:Where are the pigeons? (Score:2, Informative)
According to Google, they do use pigeons [google.com].
For those who're interested... (Score:5, Informative)
If you want the specs:
Dual Xeon 2.6GHz
12GB RAM
4 250GB HD's in RAID(something) with a hot-swap spare.
Never tried taking off the cover though, since we want to keep the warranty.
All of the money you pay is a license for the software on the box, the system itself is effectively free, so once the 2 year warranty expires, you've effectively got a nice powerful linux box for free. You can keep running the software, but without any support.
As for performance, this thing works great. We have about 250,000 pages that it can index, both public and private (and it can do searches cleverly, checking username/password to see if you should have access to certain results), and we've had nothing but positive responses from our users. The results come up quickly, they're the results people want, and the results that management thinks should be at the top are at the top.
Re:For those who're interested... (Score:2)
Re:For those who're interested... (Score:5, Informative)
After BIOS and before web-interface? (Score:2, Interesting)
Re:After BIOS and before web-interface? (Score:4, Interesting)
It does end up at a login prompt, but you're not given any usernames or passwords to access it.
product review: the yellow GSA (Score:4, Informative)
The GSA will blindly search all web servers in your domain. When setting-up the GSA, you give it an initial page from which to start crawling and baseline domains. For example:
Initial page: http://www.slashdot.org/ [slashdot.org]
Domain(s):
The leading dot on the first domain entry says to search all hosts in the domain.
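For anyone unsure what the leading dot buys you, here is a rough sketch of that kind of host matching. This is not the GSA's code; the pattern values are placeholders, and treating the bare domain as matching a leading-dot pattern is my assumption.

```python
from urllib.parse import urlsplit


def host_matches(host, pattern):
    """Match a host against one domain entry.

    A pattern with a leading dot covers every host in that domain
    (assumed here to include the bare domain itself); a pattern
    without one matches that exact host only.
    """
    host = host.lower()
    pattern = pattern.lower()
    if pattern.startswith("."):
        return host == pattern[1:] or host.endswith(pattern)
    return host == pattern


def url_in_scope(url, patterns):
    """True if the URL's host matches any of the domain entries."""
    host = urlsplit(url).hostname or ""
    return any(host_matches(host, p) for p in patterns)
```

So with placeholder entries `.example.edu` and `www.example.org`, every host under example.edu is crawled, but only the one web server at example.org.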
Problem: GSA does not provide very good status of where or what it is searching. It only has a dashboard light to say it is crawling. No details.
Problem: We found that the GSA would get caught in an endless loop if it encountered a user website controlled by a database. It would endlessly follow the next and previous links to find every database entry.
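A "visited" set alone does not help with this, because every next/previous link is a URL the crawler has never seen. What helps is a page budget and a depth limit. The sketch below shows the idea on a fake site; `crawl`, `fake_links`, and the `/record/N` URLs are invented for the example and say nothing about how the GSA works internally.

```python
from collections import deque


def crawl(start, get_links, max_pages=1000, max_depth=10):
    """Breadth-first crawl that stops at a page budget and a depth limit.

    `get_links(url)` returns the links found on a page.
    """
    seen = {start}
    queue = deque([(start, 0)])
    fetched = []
    while queue and len(fetched) < max_pages:
        url, depth = queue.popleft()
        fetched.append(url)
        if depth >= max_depth:
            continue  # fetch the page but do not follow its links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched


def fake_links(url):
    """A database-driven page: every record links to its neighbours."""
    n = int(url.rsplit("/", 1)[1])
    return [f"/record/{n + 1}", f"/record/{n - 1}"]
```

Without the two limits, `crawl("/record/0", fake_links)` would never run out of new records to fetch.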
Our university library subscribes to a number of electronic databases, such as EBSCO, PsycINFO, etc. The GSA indexed every possible look-up.
Our eval license was limited to 1.5 million pages. Some of these databases contain hundreds of thousands of pages. Solution: Those setting up their own web server must employ proper robots.txt files or risk having their entire server blocked from indexing.
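For people who have never written one, a robots.txt is just a list of path prefixes that well-behaved crawlers skip. Below is a small sketch that checks a rule set with Python's standard `urllib.robotparser`; the `/admin/` and `/lookup/` paths and the `example-crawler` agent name are placeholders, not anything from our setup.

```python
from urllib.robotparser import RobotFileParser

# Example rules: keep every crawler out of the admin tool and the
# database look-up pages. The paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /lookup/
"""


def build_parser(robots_text):
    """Parse robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed(parser, url, agent="example-crawler"):
    """True if the rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)
```

Keep in mind this only affects crawlers that choose to honor the file; it is not access control.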
Why some places won't buy this (Score:5, Interesting)
I work for a large TLA govt agency. I've begged our people to get something like this. I know, from working with our folks and doing my own digging, that we have a wealth of knowledge tucked away, here and there, on local group shares and out-of-the-way internal web sites. And yet our internal search function is ludicrously bad. It works off "key words" that are simply a manually maintained (I think) list of useless, often off-the-mark descriptions of approved sites of general interest. Special-interest pages are not indexed in this way. The crawler, if you want to call it that, is terrible at doing its job. Enter a string of text and get a hit on a known, universally accessible web page containing that exact string? Not a chance. I test it occasionally and find that it remains as ridiculous as ever, with a level of functionality that would have been technologically uninteresting the better part of a decade ago but is, in this day, infuriating to users.
The reason for all this is that if our intranet were automatically crawled, well indexed, and truly searchable, people would be able to find things. People in Work Area A would be able to see how they might be impacted by something going on in Work Area B. Horrors! That would mean that management would lose much of their ability to keep employees selectively in the dark.
All this came to a head a number of years ago. At that time, our intranet content was maintained by IT. Anybody that wanted a site (literally anybody) could just get their first-line manager to approve the request and they'd get server space and some help setting up a page or two. The exchange of information that started happening was highly disruptive, so a "Communications and Liaison" office was set up that wrenched control of the intranet from IT and required (what seems to be essentially political) approval of the business case for anything that went online. No web sites unless the Communications gods approved.
Nowadays, the employees of one division are only vaguely aware that other divisions exist or have web sites. Each individual fiefdom is protected from the ravages of communications that don't strictly follow the org chart lines. I guess the executives in charge are happy in their insulated little worlds.
If you're going to sell an effective intranet search tool, you're going to have to face the fact that lots of large organization leaders (and you find the same attitudes in both the public and the private sector) would recoil in horror at the thought of having their intranet be effectively searchable. It's too threatening.
Re:Why some places won't buy this (Score:2, Interesting)
I set up a search for our intranet at my govt agency (one part of a larger cabinet agency) many years ago. For some reason I never understood, the one guy who controls the intranet site decided that the search link s
swish-e as a Google Mini alternative (Score:3, Informative)
In
Curious... (Score:2, Interesting)
Nice review (Score:2, Interesting)
One thing I wonder is that Google can probably use the included modem to download private company data which the ser
GPL? (Score:2)
Google Mini Support / Install. (Score:2, Interesting)
look beyond brand name for better alternatives (Score:2)
Google on Time4ink.com (Score:2, Insightful)
Re:Google on Time4ink.com (Score:2)
In the case of your application I would say it was a good call.
In the case of more content-rich sites that may have varied types of articles as well as the desire to have a more integrated look and feel, the appliance is more necessary.
There are also many intranets that have tons of content that is not available to the Net at large; however, the people who manage and use these networks would still like to be able to search the content they have on their internal
Re:Google on Time4ink.com (Score:2)
Say you are a lawyer and have 10 years' worth of electronic versions of communications on 50 different computers. You can buy a Google appliance, configure it to index every one of those computers (you have to network share the drives in some way), and the "cache" link also works as a sort of backup. You don't want any jackass on the web searching that stuff.
Carpeting (Score:3, Funny)
I loathe the appliance (Score:2)
It locked up for me way too many times even though Google cites this as rare. I wasted way too much time on support for a device which should not need this level of babysitting.
When my contract ends, I'm switching to Nutch.
Benchmarked: Google Appliance != Performance (Score:2, Informative)
Re:GPl compliance (Score:3, Informative)
Re:I tested it.... (Score:3, Interesting)
At anandtech's website,
to test the ability of their google search server,
I searched for the title of that article.
You would think it would point me to the article;
it did not.
Re:I tested it.... (Score:2)
Re:it's (Score:2)
Re:Better than Google (Score:2)
Re:It looks like the OS is WINDOWS (Score:2)
If you are.....I don't know how to respond.
Re:It looks like the OS is WINDOWS (Score:2)
Re:It looks like the OS is WINDOWS (Score:2, Insightful)
Re:How long before... (Score:2)