Google Index Doubles

Posted by samzenpus on Thursday November 11, 2004 @07:00AM from the even-more dept.

geekfiend writes "Today Google updated their website to indicate over eight billion pages crawled, cached and indexed. They've also added an entry to their blog explaining that they still have tons of work to do."

This discussion has been archived. No new comments can be posted.

324 Comments

More pages v.s more relevant pages (Score:5, Insightful)

by xiando ( 770382 ) writes: on Thursday November 11, 2004 @07:06AM (#10785970) Homepage Journal

Personally I find that the lack of relevant pages is the biggest problem with search engines, not the lack of pages with information. It seems I always find what I'm looking for eventually; what I need improved is the time I spend looking through spam-bomb pages before I find a page with the correct information.

These spam-pages seem to be increasing; I mean those pages with just a bunch of keywords or the output of some search system.

Does this affect how fresh their index will be? (Score:4, Insightful)

by Jugalator ( 259273 ) writes: on Thursday November 11, 2004 @07:07AM (#10785973) Journal

I wonder if it'll take longer to index twice as many pages? Or if they, along with this change, improved their spider and/or added hardware. Otherwise I'm not sure this change is for the better, unless you like to search for really obscure topics.

Google makes minor change to website - news at 11! (Score:3, Insightful)

by Sanity ( 1431 ) writes: on Thursday November 11, 2004 @07:08AM (#10785986) Homepage Journal

Does every minor Google or Apple related thing deserve a slashdot story? Can slashdot create a "Fanboy" section for insignificant stories advocating Google (with their software patent) and Apple (with their iTunes DRM)? That way I could filter them out more easily.

Quality - not quantity (Score:3, Insightful)

by seanyboy ( 587819 ) writes: on Thursday November 11, 2004 @07:10AM (#10785991)

Google needs to stop obsessing about the number of indexed pages, and start concentrating on the quality. Since pagerank was switched off, 2 out of 5 searches now seem to be jammed with pages full of nothing but random words and adverts. It's even more galling when the adverts are Google Ads. Much as I love Google, they're becoming increasingly less effective as a tool.

Re:This is news ? (Score:3, Insightful)

by dotmike ( 829740 ) writes: on Thursday November 11, 2004 @07:18AM (#10786018)

Yeah, but it'd be news if the sun set twice in one night or rose twice as bright.

It's more the exponential increase in the size of the index rather than the piecemeal addition.

Makes you wonder... (Score:5, Insightful)

by manmanic ( 662850 ) writes: on Thursday November 11, 2004 @07:21AM (#10786029)

Does this mean that I've been missing a huge amount of important information until now? I'd just assumed that Google covered the entire relevant web but now it seems to cover the whole same amount again. My Google alerts [googlealert.com] also seem to have started producing a lot more results, which suggests that a lot of these new pages are rated quite highly. Who knows how much more quality content on the web we're just not seeing?

Re:Quality - not quantity (Score:2, Insightful)

by Onionesque ( 455220 ) writes: <spammie@pobox.com> on Thursday November 11, 2004 @07:21AM (#10786032) Homepage

To paraphrase Churchill, Google is the worst system devised by the wit of man, except for all the others. Where else would you go? Yahoo? Hey, how about AltaVista?
The problems faced by Google in their battle against the scumbags who would game the system are faced by every other search engine. Google, IMHO, handles them better.

Re:What is new about this. (Score:3, Insightful)

by slavemowgli ( 585321 ) writes: on Thursday November 11, 2004 @07:23AM (#10786041) Homepage

I don't quite believe that Google would've limited themselves that way (using 32 bit identifiers for documents) - that would've been incredibly short-sighted.
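
For scale, here is a quick back-of-the-envelope check of the 32-bit idea. It is only arithmetic, not a claim about how Google actually numbers its documents; the eight-billion figure is the round number from the story, and `bits_needed` is a made-up helper name.

```python
# Hypothetical helper: smallest identifier width (in bits) that can give
# each of `n_documents` a distinct unsigned integer ID.
def bits_needed(n_documents: int) -> int:
    # IDs run from 0 to n_documents - 1, so size the largest one.
    return max(1, (n_documents - 1).bit_length())

MAX_32BIT_IDS = 2 ** 32           # number of distinct 32-bit values
STORY_INDEX_SIZE = 8_000_000_000  # "over eight billion pages", rounded down

print(MAX_32BIT_IDS)                     # 4294967296
print(bits_needed(STORY_INDEX_SIZE))     # 33
print(STORY_INDEX_SIZE > MAX_32BIT_IDS)  # True
```

So whatever the old scheme was, an index of eight billion pages cannot be addressed with plain 32-bit document IDs.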

Re:More pages v.s more relevant pages (Score:5, Insightful)

by Kithraya ( 34530 ) writes: on Thursday November 11, 2004 @07:51AM (#10786142)

I'm especially irritated by the increasing number of highly-ranked pages that are nothing more than another search engine's results. If Google could find some way to identify and remove these from my result set, Google's usefulness to me would increase 10 times over.

Does this mean...? (Score:4, Insightful)

by jimicus ( 737525 ) writes: on Thursday November 11, 2004 @07:51AM (#10786147)

Does this mean twice as many pages with "Search for 'printer problem linux' on Kelkoo"?

Re:What? (Score:2, Insightful)

by poohsuntzu ( 753886 ) writes: on Thursday November 11, 2004 @07:58AM (#10786173) Homepage

It isn't about having a better search engine, so much as it is knowing how to use it. If you are looking for information on a recipe for oriental rice using asian spice, how would you search?

Bad search example:

oriental rice recipe asian spice

Good search example:

recipe+"oriental rice"+spice

See the difference? Google tries its best to get rid of the spam pages, but it won't ever combat them all. Half of the work has to be done by you, understanding the best way to describe to the search engine what it is you want to do. The better you explain it, the better it can search for you.
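
To make the difference between the two queries concrete, here is a toy matcher. It is a rough sketch and certainly not how Google evaluates queries: the function names and both sample page texts are invented, and the only rule assumed is that a quoted phrase must appear as consecutive words while unquoted terms may appear anywhere.

```python
import re
import shlex

def parse_query(query: str) -> list[list[str]]:
    """Split a query into units; a quoted phrase stays together as one unit."""
    units = shlex.split(query.replace("+", " "))
    return [unit.lower().split() for unit in units]

def words_of(text: str) -> list[str]:
    """Lowercased words of a page, punctuation dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_run(words: list[str], run: list[str]) -> bool:
    """True if `run` occurs in `words` as consecutive words."""
    n = len(run)
    return any(words[i:i + n] == run for i in range(len(words) - n + 1))

def matches(page_text: str, query: str) -> bool:
    """True if every unit of the query is found in the page."""
    words = words_of(page_text)
    return all(contains_run(words, unit) for unit in parse_query(query))

# Invented sample pages: a keyword-stuffed page and a genuine recipe.
spam = "oriental rugs cheap rice cookers asian recipe spice racks"
real = "A recipe for oriental rice with a five-spice blend"

print(matches(spam, "oriental rice recipe asian spice"))  # True
print(matches(spam, 'recipe+"oriental rice"+spice'))      # False
print(matches(real, 'recipe+"oriental rice"+spice'))      # True
```

The keyword-stuffed page satisfies the loose query because every word shows up somewhere, but it fails the phrase query because "oriental" and "rice" never sit next to each other.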

Re:Google thieves my bandwidth (Score:2, Insightful)

by Rakshasa Taisab ( 244699 ) writes: on Thursday November 11, 2004 @08:03AM (#10786182) Homepage

You can rant all you want, but Google still has a fair use right to your images. They are reduced resolution images and therefore legal for non-commercial use.

Not to mention robots.txt, but that is so obvious it shouldn't need to be mentioned.

Re:Google domination. (Score:3, Insightful)

by Mostly a lurker ( 634878 ) writes: on Thursday November 11, 2004 @08:06AM (#10786190)

the masses who use the "default" (MSN?) aren't bothering to answer
I think it is more that many users of IE just do not twig that their failed page access resulted in an automatic query to MSN.
In reality, most users make occasional deliberate queries to Google and more frequent accidental queries to MSN.

So, to sum up... (Score:5, Insightful)

by kahei ( 466208 ) writes: on Thursday November 11, 2004 @08:15AM (#10786225) Homepage

I am feeding this troll because there are people who really _do_ think like that and I wish I could yell at them to their faces :)

You put content in a place where it is publicly accessible. You explicitly and proactively made that content available to everyone, including 'the average surfer' and googlebots. You took no steps to make it available only to the select few of whom you approve.

Now you are all cross and bothered because average surfers / googlebots have read / copied your content, such as it is.

The solution is to drown yourself in a bucket. I have a bucket.

Proximity search will help (Score:3, Insightful)

by Sai Babu ( 827212 ) writes: on Thursday November 11, 2004 @08:19AM (#10786239) Homepage

This is why I've been begging the Google folks to implement a NEAR [pandia.com] operator!

Here is an example MSN search: http://search.msn.com/results.aspx?FORM=SMCRT&q=fish%20NEAR%20ahi%20NEAR%20recipe [msn.com]

Re:What is new about this. (Score:3, Insightful)

by Jugalator ( 259273 ) writes: on Thursday November 11, 2004 @08:24AM (#10786257) Journal

Wow, Microsoft must have fixed it...
It now no longer shows microsoft.com as top hit.

Haha, I guess the joke reached MS headquarters. :-P

Re:More pages v.s more relevant pages (Score:5, Insightful)

by PsychoSlashDot ( 207849 ) writes: on Thursday November 11, 2004 @09:03AM (#10786413)

What I've read on the Google help pages seems to indicate that they don't index punctuation or capitalization. When you search for something, your string is looked for within an existing index, and appropriate reference materials are shown. Including punctuation wouldn't result in any hits within their index, meaning no results.

Now, obviously, it is theoretically possible to do just about anything. But in this case, with the architecture they have in place, doing what you're asking would require a full-text search through their multi-TB dataset, which I suspect is highly impractical.

My point is that, as I understand it, Google has coded a number of shortcut tricks which allow reasonable search times, and full-text string-exact searching would prevent them from using those shortcuts, resulting in search times they don't seem to think are reasonable.
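
A tiny sketch of the kind of index described above, assuming (as the help pages seem to suggest) that case and punctuation are thrown away when pages are indexed. This is a toy inverted index, not Google's architecture; the page names and contents are invented.

```python
import re
from collections import defaultdict

def normalize(text: str) -> list[str]:
    """Lowercase and keep only runs of letters and digits (punctuation is dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each normalized token to the set of page names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in pages.items():
        for token in normalize(text):
            index[token].add(name)
    return index

def lookup(index: dict[str, set[str]], raw_term: str) -> set[str]:
    """Look up the term exactly as typed, with no normalization."""
    return set(index.get(raw_term, set()))

# Invented sample pages.
pages = {
    "a.html": "Use printf() in C++, not cout!",
    "b.html": "C is older than C++.",
}
index = build_index(pages)

print(sorted(index["c"]))         # ['a.html', 'b.html']
print(lookup(index, "C++"))       # set()
print(lookup(index, "printf()"))  # set()
```

Once "C" and "C++" have collapsed into the same token, the index alone can no longer tell them apart, so an exact query would mean going back to the raw page text.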

Re:Read more carefully. (Score:5, Insightful)

by Mant ( 578427 ) writes: on Thursday November 11, 2004 @11:18AM (#10787695) Homepage

Robots.txt isn't something that only applies to Google; it is supposed to be honoured by all search engines, and follows the Robots Exclusion Standard. So, when you claim these are Google's arbitrary rules, you are in fact wrong. They are neither Google's nor arbitrary (at least no more than any web standard).

So your clue is not so much of a clue, as robots.txt doesn't fit your description.

As for why you should know about it, you are putting up a web site, it is part of running a web site. You might as well complain why you need to know about HTML, CSS or registering a domain name. Quite what coming from the UK has to do with it (something I also do), I have no idea.

"I simply do not want the average surfer to be able to visit my site, I am not interested in serving my pages to them, they simply would not appreciate or understand what it is I am showing."

Then a publicly accessible website is the wrong place. It is not your personal space, and it isn't private. You made it available to the world; nobody made you. To turn around and complain when (some of) the world visits it is hypocrisy.

It's like putting up posters around a town, then running around complaining all these people are looking at them, won't appreciate them, and you don't want them to. It also comes across as condescending and arrogant, which probably explains the nastiness of some of the responses.

You opted in when you put up the publicly accessible website. If all search engines had to be opt in, nobody could find anything on the web, and it would lose a lot of its utility. You're assumed to want them crawling because the vast majority of people do; they want their site to be found. If you don't though, no problem, just use the standards for stopping searches, or password protect the site. No scandal at all, just hysterics.
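
For anyone who has never seen one, this is roughly what opting out looks like. It is a sketch only: the `/private/` path and the example.com URLs are invented, and Python's standard `urllib.robotparser` stands in for a well-behaved crawler checking the rules before it fetches anything.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content. The /private/ path is invented for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The image crawler is shut out of the whole site.
print(parser.can_fetch("Googlebot-Image", "http://example.com/photos/cat.jpg"))  # False
# Every other crawler may fetch public pages but not the private directory.
print(parser.can_fetch("SomeBot", "http://example.com/photos/cat.jpg"))          # True
print(parser.can_fetch("SomeBot", "http://example.com/private/notes.html"))      # False
```

The file only asks crawlers to stay away. Anything that truly must stay private still needs a password in front of it.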

Showing the low res thumbnail of your image isn't violating your copyright either. The only legitimate claim you have is the amount of time it took to remove something from the cache.

The "thieves" accusation is even more ridiculous. If you put something up on the web people can see for free, you can't complain. There are options if you want to protect it. Google doesn't claim your work as theirs (which would be 'stealing' or at least copyright violation); they help people find your public web site.

If you don't want a public website but made one, whose fault is that? If you are going to run a website and can't be bothered to find out how to do it properly, you can't blame Google.

Re:More pages v.s more relevant pages (Score:3, Insightful)

by PMuse ( 320639 ) writes: on Thursday November 11, 2004 @01:38PM (#10789424)

How about a NEAR operator? Sure, AND OR NOT are nice, but my results would be a lot more relevant if I could eliminate results where the search terms appeared a thousand words apart.
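
A minimal sketch of what such a filter could look like, assuming the page text is already in hand. The `max_gap` of 10 words is an arbitrary choice, both sample texts are invented, and none of this reflects how MSN or any other engine actually implements NEAR.

```python
import re

def positions(words: list[str], term: str) -> list[int]:
    """Indexes at which `term` occurs in the word list."""
    return [i for i, w in enumerate(words) if w == term]

def near(text: str, term_a: str, term_b: str, max_gap: int = 10) -> bool:
    """True if some occurrence of term_a is within max_gap words of term_b."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = positions(words, term_a.lower())
    pos_b = positions(words, term_b.lower())
    return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)

# Both invented pages contain both terms, so a plain AND would match each.
close = "Sear the ahi quickly; this fish recipe takes ten minutes."
far = "ahi " + "filler " * 1000 + "recipe"

print(near(close, "ahi", "recipe"))  # True
print(near(far, "ahi", "recipe"))    # False
```

A real engine would presumably work from word positions stored in its index rather than rescanning each page, but the test itself is this simple.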

Re:Searching LiveJournal.com (Score:2, Insightful)

by cavemanf16 ( 303184 ) writes: on Thursday November 11, 2004 @02:17PM (#10789894) Homepage Journal

MSN's "msnbot" has been crawling/spidering my webserver (which runs Geeklog and is just another blog of my random crap) pretty extensively for weeks now. (Like 5 times a day, it seems.) Searching on Google for my site's name now reveals more results from my site, but not a lot of those circle-jerk style search results pages that are just trying to generate some ad revenues. However, using the beta.search.msn.com site DOES yield a lot more random crap (mostly blogs and personal webservers) that somehow generated some kind of link to my site because of the title of one of my articles, someone linking to my site in one of their blog posts, etc.

I have a feeling MSN's new search site is gonna be mostly blogs and advertisements, not relevant information. I think it's good Google has indexed more pages, but I still believe their algorithm will continue to provide more USEFUL results than MSN. (BTW, the googlebot doesn't hit my site too frequently which tells me Google's bot understands that my site isn't updated too frequently, nor is it linked to from other important sites)
