Google Index Doubles 324

Posted by samzenpus on Thursday November 11, 2004 @07:00AM from the even-more dept.

geekfiend writes "Today Google updated their website to indicate over eight billion pages crawled, cached and indexed. They've also added an entry to their blog explaining that they still have tons of work to do."

This discussion has been archived. No new comments can be posted.

Google Index Doubles

Load All Comments

Search 324 Comments Log In/Create an Account

Comments Filter:

Comment removed (Score:4, Funny)

by account_deleted ( 4530225 ) writes: on Thursday November 11, 2004 @07:04AM (#10785963)

Comment removed based on user account deletion

Share
twitter facebook
- slashdotting (Score:4, Funny)
  
  by Zork the Almighty ( 599344 ) writes: on Thursday November 11, 2004 @07:15AM (#10786007) Journal
  
  In case of slashdotting use this mirror [google.ca].
  
  Parent Share
  twitter facebook
  - Re:slashdotting (Score:3, Funny)
    
    by juglugs ( 652924 ) writes:
    
    No, no, no... Use this [alltooflat.com] Mirror...
  - Re:slashdotting (Score:2, Redundant)
    
    by flewp ( 458359 ) writes:
    
    You can always try the google cache [216.239.39.104] just in case too!
- Re:Image Search (Score:3, Interesting)
  
  by BoldAC ( 735721 ) writes:
  
  While waiting for the update to their image search, everybody should optimize their web pages... google-style.
  
  For those of you that don't believe that having keywords in your URLS... just use google's own story, for example.
  
  http://www.google.com/googleblog/2004/11/googles -i ndex-nearly-doubles.html
  
  "Google Index Nearly Doubles" is in the url and the first header. Look at how they do thinks... and your google traffic will increase.
More pages v.s more relevant pages (Score:5, Insightful)

by xiando ( 770382 ) writes: on Thursday November 11, 2004 @07:06AM (#10785970) Homepage Journal

Personally I find that the lack of relevant pages if the biggest problem with search engines, not the lack of pages with information. It seems I always find what I'm looking for eventually, what I need improved is the time I spend looking though spam-bomb pages before I find a page with the correct information.

These spam-pages seem to be increasing; I mean those pages with just a buch of keywords or the output of some search system.

Share
twitter facebook
- Re:More pages v.s more relevant pages (Score:5, Insightful)
  
  by Kithraya ( 34530 ) writes: on Thursday November 11, 2004 @07:51AM (#10786142)
  
  I'm especially irritated by the increasing number of highly-ranked pages that are nothing more than another search engine's results. If Google could find some way to identify and remove these from my result set, Google's usefulness to me would increase 10 times over.
  
  Parent Share
  twitter facebook
  - Re:More pages v.s more relevant pages (Score:3, Interesting)
    
    by metlin ( 258108 ) * writes:
    
    Google has a problem with this because some of those searches are actually useful.
    
    For instance, when I search for something technical, I often run into search results from DBLP, arXiv, CiteSeer and the like -- although these are really search results within themselves, they're immensely useful to me.
    
    Since we both effectively have a conflict of interest - Google would need to figure out a way to strike a balance.
    - Re:More pages v.s more relevant pages (Score:2, Interesting)
      
      by corrie ( 111769 ) writes:
      
      However, results from places like Starware Search are not useful, and elevates my blood pressure with all the attempts at spamming me.
      
      Just because I use Firefox and Adblock doesn't mean I now want to visit all possible spam sites in existence.
      
      I don't care if Starware and friends make their money from advertising or not. The point is that Google is ALREADY a search engine, and a pretty good one at that. What is the point of returning results from another search engine, especially if the other one does not
- Re:What? (Score:2, Insightful)
  
  by poohsuntzu ( 753886 ) writes:
  
  It isn't about having a better search engine, so much as it is knowing how to use it. If you are looking for information on a recipe for oriental rice using asian spice, how would you search?
  
  Bad search example:
  
  oriental rice recipe asian spice
  
  Good search example:
  
  recipe+"oriental rice"+spice
  
  See the difference? google tries its best to get rid of the spam pages, but it won't ever combat them all. Half of the work has to be done with you understanding the best way to describe to the search engin
  - Re:What? (Score:3, Interesting)
    
    by LiquidCoooled ( 634315 ) writes:
    
    I see the difference...
    
    Search terms: oriental rice recipe asian spice
    Search Results: Results 1 - 10 of about 254,000 for oriental rice recipe asian spice . (0.40 seconds)
    Search Effectiveness: REASONABLE. good list of relivent items matched.
    
    Search terms: recipe+"oriental rice"+spice
    Search Results: Your search - recipe+"oriental rice"+spice - did not match any documents.
    Search Effectiveness: UTTER SHITE
    
    The user wants SIMPLICITY. If google cannot give decent results for simple search criteria, then peopl
    - Re:What? (Score:5, Informative)
      
      by jez9999 ( 618189 ) writes: on Thursday November 11, 2004 @08:25AM (#10786261) Homepage Journal
      
      Erm, that's only because of the bizarre plus signs the grandparent poster put in - try this [google.com]. Note to grandparent: Just about any modern search engine assumes words not prefixed by anything are to be included in the Boolean search query. No need for +.
      
      Parent Share
      twitter facebook
  - - try +the (Score:2, Informative)
      
      by leuk_he ( 194174 ) writes:
      
      Yes there is, try to search for
      
      The Doctor
      
      vs
      
      +the doctor
    - Re:What? (Score:2)
      
      by gus goose ( 306978 ) writes:
      
      Hmmm... actually:
      http://www.google.com/help/refinesearch .html
      
      There IS a + operator, and you are modded "informative"....
      
      gus
- Proximity search will help (Score:3, Insightful)
  
  by Sai Babu ( 827212 ) writes:
  
  This is why I've been begging google folks to implement NEAR [pandia.com] operator!
  
  Here is an example msn search: http://search.msn.com/results.aspx?FORM=SMCRT&q=fi sh%20NEAR%20ahi%20NEAR%20recipe [msn.com]
- Re:More pages v.s more relevant pages (Score:3, Informative)
  
  by __aahlyu4518 ( 74832 ) writes:
  
  Personally I find that the lack of relevant pages if the biggest problem with search engines, not the lack of pages with information.
  
  Actually.... information IS relevant data. If it's not relevant to what you want, then it is just data...
- Re:More pages v.s more relevant pages (Score:5, Interesting)
  
  by jez9999 ( 618189 ) writes: on Thursday November 11, 2004 @08:30AM (#10786287) Homepage Journal
  
  One thing that would really help me sometimes would be if Google allowed you to do an 'exact match' search. No, I don't mean enclosing something in double quotes, that still ignores capitalization, whitespace, and most non-letter characters. I'd like to be able to search for pages that have the EXACT string '#windows EFNET', for example, or '/usr/bin/' or whatever. '/Usr/biN' wouldn't match, and nor would '#windows^^EFNET' (where ^ is equal to a space :-) ).
  
  I sent an e-mail to Google about this and the guy who replied didn't seem to think it was possible... anyone know if it is?
  
  Parent Share
  twitter facebook
  - Re:More pages v.s more relevant pages (Score:5, Insightful)
    
    by PsychoSlashDot ( 207849 ) writes: on Thursday November 11, 2004 @09:03AM (#10786413)
    
    What I've read on the Google help pages seems to indicate that they don't index punctuation or capitalization. When you search for something, your string is looked for within an existing index, and appropriate reference materials are shown. Including punctuation wouldn't result in any hits within their index, meaning no results.
    
    Now, obviously, it is theoretically possible to do just about anything. But in this case, with the architecture they have in place, anyone ever doing what you're asking would require a full-text search through their multi-TB dataset, which I suspect is highly impractical.
    
    My point is that as I understand it, Google has coded a number of shortcut tricks which allow reasonable search times, and full-text string-exact searching would prevent them from using those shortcuts, resulting in search times they don't seem to think is reasonable.
    
    Parent Share
    twitter facebook
    - Re:More pages v.s more relevant pages (Score:4, Interesting)
      
      by Erasmus Darwin ( 183180 ) writes: on Thursday November 11, 2004 @09:39AM (#10786617)
      
      "But in this case, with the architecture they have in place, anyone ever doing what you're asking would require a full-text search through their multi-TB dataset, which I suspect is highly impractical."
      Actually, they could cut that down considerably. For example, say we were doing an exact search for '#windows EFNET' as in the original example. The first thing they could do is start with a traditional search on "#windows EFNET" [google.com]. At that point, they've cut their multi-TB dataset down to just a few megs or less of likely matches (in this case, only 10 pages matched). Then they could do a full-text check on each result, looking for an exact match and discarding all the rest.
      
      Parent Share
      twitter facebook
  - Re:More pages v.s more relevant pages (Score:3, Insightful)
    
    by PMuse ( 320639 ) writes:
    
    How about a NEAR operator? Sure, AND OR NOT are nice, but my results would be a lot more relevant if I could eliminate results where the search terms appeared a thousand words apart.
- Re:More pages v.s more relevant pages (Score:2)
  
  by MoobY ( 207480 ) writes:
  
  The same goes for duplicate information. I don't want 200 versions of wikipedia listed when I'm looking for a specific article, nor 200 times the same man page when I'm researching something different of a unix command besides the man page of a command.
- - - Re:More pages v.s more relevant pages (Score:2)
      
      by fishbot ( 301821 ) writes:
      
      It's what we'd get if we printed everything linked to by Google :)
    - Re:More pages v.s more relevant pages (Score:2)
      
      by maxwell demon ( 590494 ) writes:
      
      The stuff you find in /usr/lib :-)
I'm all alone (Score:4, Funny)

by tcdk ( 173945 ) writes: on Thursday November 11, 2004 @07:06AM (#10785971) Homepage Journal

8 billion pages and not a single link to my blog [google.com].

Can't figure of I should just shoot my self or maybe just open a subscription to /.

Share
twitter facebook
- Re:I'm all alone (Score:4, Funny)
  
  by Zork the Almighty ( 599344 ) writes: on Thursday November 11, 2004 @07:11AM (#10785995) Journal
  
  If you shoot yourself, will your blog readers know ? I mean, it's kindof like the tree in the forest thing.
  
  Parent Share
  twitter facebook
- Re:I'm all alone (Score:3, Informative)
  
  by tadmas ( 770287 ) writes:
  
  8 billion pages and not a single link to my blog.
  
  Perhaps you should just tell them where it is [google.com].
- Re:I'm all alone (Score:2)
  
  by Temkin ( 112574 ) writes:
  
  You're not alone... They don't index my site either. :P
  - Re:I'm all alone (Score:2)
    
    by tcdk ( 173945 ) writes:
    
    My site is actually index'ed, they just don't index anybody, who links to me.
Do this affect how fresh their index will be? (Score:4, Insightful)

by Jugalator ( 259273 ) writes: on Thursday November 11, 2004 @07:07AM (#10785973) Journal

I wonder if it'll take longer to index twice as many pages? Or if they, along with this change, improved their spider and/or added hardware. Otherwise I'm not sure this change is for the better, unless you like to search for really obscure topics.

Share
twitter facebook
- Re:Do this affect how fresh their index will be? (Score:2)
  
  by andres32a ( 448314 ) writes:
  
  Actually no. Better search results means fewer necessary searches, which in turn will make the entire process most time effective. And anyway, you can`t just stop indexing webpages just because it might take longer to index them. You just need to improve on hardware or the technology itself.
  - Re:Do this affect how fresh their index will be? (Score:2)
    
    by Jugalator ( 259273 ) writes:
    
    Better search results means fewer necessary searches, which in turn will make the entire process most time effective.
    
    Search results? Are you talking about a person searching? I was mostly concerned about how quickly Google can update their complete index now that it doubled in size. I understand for my part it might get better, as long as the index is kept up-to-date.
    
    And anyway, you can`t just stop indexing webpages just because it might take longer to index them. You just need to improve on hardware or
What is new about this. (Score:4, Interesting)

by hanssprudel ( 323035 ) writes: on Thursday November 11, 2004 @07:07AM (#10785977)

What the article does not point out is why this something important. For just about forever google's store has been coverging on 2**32 documents. Some people have speculated that Google simply could not update their 100,000+ servers with a new system that allowed more. Apparently they have now done the necessary architecture changes to allow for identifying documents by 64 bit (or more identifiers) and back in the business of making their search for comprehensive.

Good timing to conincide with MSN attempt to start a new searchengine too!

Share
twitter facebook
- Re:What is new about this. (Score:4, Interesting)
  
  by Jugalator ( 259273 ) writes: on Thursday November 11, 2004 @07:16AM (#10786010) Journal
  
  Good timing to conincide with MSN attempt to start a new searchengine too!
  
  Yes, they'd better fight back, as they now have a serious competitor in MSN.
  It's giving very accurate results [msn.com].
  
  Doesn't anyone find it strange that Google gave the same top result there a while back?
  
  MSN must be using a very similar algorithm.
  
  Maybe a bit too similar...?
  
  *tinfoil hat on*
  
  Parent Share
  twitter facebook
  - Re:What is new about this. (Score:2)
    
    by Zork the Almighty ( 599344 ) writes:
    
    Ha! Google itself is #4 on MSN's results.
  - Re:What is new about this. (Score:2)
    
    by pchan- ( 118053 ) writes:
    
    heh, bedope.com. i haven't seen that site since Be Inc went under. the were the site to introduce the most numerically advanced version of linux, ever! [bedope.com]
    
    "You'll note that other versions of Linux are languishing at version 6.3 or even 2.2 - only Be Dope Linux Version 27.1 with AVN (Advanced Version Numbering) brings you a version of Linux numbered at 27.1".
  - Re:What is new about this. (Score:3, Insightful)
    
    by Jugalator ( 259273 ) writes:
    
    Wow, Microsoft must have fixed it...
    It now no longer shows microsoft.com as top hit.
    
    Haha, I guess the joke reached MS headquarters. :-P
    - Re:What is new about this. (Score:2)
      
      by BetterThanCaesar ( 625636 ) writes:
      
      Or maybe all the other pages mentioning "more evil than satan himself" got higher rank anyway. The same happened to the corresponding Google query.
      
      Wikipedia/Google bomb [wikipedia.org]:
      However, the first Google bomb mentioned in the popular press may have occurred accidentally in 1999, when users discovered that the query "more evil than Satan " returned Microsoft's home page. Now, it returns links to several news articles on the discovery.
      As you see on the MSN search page, the same is happening here. I doubt they've
  - Still works without the quotation marks .. (Score:2)
    
    by RedLaggedTeut ( 216304 ) writes:
    
    Well, it seems Microsoft has dropped to rank 5 in spiritual ranking [msn.com], should I sell my stock?
- Re:What is new about this. (Score:3, Insightful)
  
  by slavemowgli ( 585321 ) writes:
  
  I don't quite believe that Google would've limited themselves that way (using 32 bit identifiers for documents) - that would've been incredibly short-sighted.
  - Re:What is new about this. (Score:3, Interesting)
    
    by bighoov ( 605325 ) writes:
    
    Probably not short sighted, but rather an space and cpu efficiency issue. Space - If you have 64-bit doc ids, even if you index 2^48 documents you're still wasting 16 bits per stemmed word per document. CPU - dealing with 64-bit integers on 32-bit hardware usually involves multiple loads, and decreases what can fit in the hardware data caches.
- Re:What is new about this. (Score:4, Interesting)
  
  by Anonymous Coward writes: on Thursday November 11, 2004 @07:34AM (#10786082)
  
  For just about forever google's store has been coverging on 2**32 documents. Some people have speculated that Google simply could not update their 100,000+ servers with a new system that allowed more. Apparently they have now done the necessary architecture changes to allow for identifying documents by 64 bit (or more identifiers) and back in the business of making their search for comprehensive.
  As someone who routinely follows these things, I couldn't agree more with your statement. My company operates a number of sites, and over the past 6 months, we've seen an obvious trend. Sites with, say, 5000+ pages, which used to be entirely indexed in Google, gradually had pages lost from Google. A search for site:somesite.com would return 5000 results 6 months ago. 3 or 4 months ago, the same search gave maybe 1000 results. This month maybe 500 or 600. We were definitely of the opinion that Google's index was "maxxed out" and was dropping large portions of indexed sites in favor of attempting to index new sites.
  
  Now after seeing this story, I did a search and found literally all 5000+ pages are indexed once again. This is a huge step forward for webmasters everywhere. If your site had been slowly edged out of Google's index it's most likely back in its entirety now.
  
  Thanks G.
  
  Parent Share
  twitter facebook
- great but where are the .txt and directories? (Score:3, Informative)
  
  by js7a ( 579872 ) writes:
  
  Google won't be within reach of the pinnacle until they index .txt files, directory listings, and anonymous ftp sites.
  - Re:great but where are the .txt and directories? (Score:3, Informative)
    
    by geminidomino ( 614729 ) * writes:
    
    One out of 3 [google.com] ain't a bad start. Add a few more keywords to narrow down the google-crawling.
  - Re:great but where are the .txt and directories? (Score:2)
    
    by Lehk228 ( 705449 ) writes:
    
    they *Do* index directory listings just search for "index of"
no update on the images (Score:3, Informative)

by bvdbos ( 724595 ) writes: on Thursday November 11, 2004 @07:07AM (#10785980)

Unfortunately they didn't update [slashdot.org] the image-search [google.com] yet.

Share
twitter facebook
Google makes minor change to website - news at 11! (Score:3, Insightful)

by Sanity ( 1431 ) writes: on Thursday November 11, 2004 @07:08AM (#10785986) Homepage Journal

Does every minor Google or Apple related thing deserve a slashdot story? Can slashdot create a "Fanboy" section for insignificant stories advocating Google (with their software patent) and Apple (with their iTunes DRM)? That way I could filter them out more easily.

Share
twitter facebook
- Re:Google makes minor change to website - news at (Score:2)
  
  by timdorr ( 213400 ) * writes:
  
  Maybe it's just me, but I'd call the doubling of information available for me to search a pretty significant improvement. Especially when the last update was only a 1b increase ("only" is a relative term, of course...).
  - Re:Google makes minor change to website - news at (Score:2)
    
    by Zork the Almighty ( 599344 ) writes:
    
    The extra 3 billion pages are probably link farms.
- Re:Google makes minor change to website - news at (Score:2, Funny)
  
  by dotmike ( 829740 ) writes:
  
  At the same time, can Slashdot create a "Curmudgeon" section for those who like to gripe about the less than monumental significance of some story topics?
- Turn them off then. (Score:2)
  
  by aug24 ( 38229 ) writes:
  
  You can customise your page to only have stories in your interests, and Google is one of the story types.
  
  I'm moderating at the mo, and I'd have moderated you 'muppet', but I thought I'd be useful instead ;-)
  
  J.
  - Re:Turn them off then. (Score:2)
    
    by aug24 ( 38229 ) writes:
    
    Actually, no, it appears I'm the muppet. Google is one of the stories you have to have. I apologise!
    
    J.
- Re:Google makes minor change to website - news at (Score:2)
  
  by shrykk ( 747039 ) writes:
  
  You can block Apple stories in your user preferences page.
Quality - not quantity (Score:3, Insightful)

by seanyboy ( 587819 ) writes: on Thursday November 11, 2004 @07:10AM (#10785991)

Google needs to stop obsessing about the number of indexed pages, and start concentrating on the quality. Since pagerank was switched off, 2 out of 5 searches now seem to be jammed with pages full of nothing but random words and adverts. It's even more galling when the adverts are Google Ads. Much as I love Google, they're becoming increasingly less effective as a tool.

Share
twitter facebook
- Re:Quality - not quantity (Score:3, Funny)
  
  by Ingolfke ( 515826 ) writes:
  
  I agree search engines are so 1990. I rely exlusively on word of mouth to find websites. If Firefox would add a button to the toolbar that said 'Cool Sites', maybe with an icon of a pair of glasses, and have the button link to a webpage with links to the latest cool sites on the net, that would certainly be the end of Google and their 8 billion pages. Pah!
  - Re:Quality - not quantity (Score:2)
    
    by Beolach ( 518512 ) writes:
    
    That was actually how Yahoo! got started. A few of college drop-outs started making a webpage linking to their favorite sites... and their friends started going to it, and their friends' friends, and their friends' friends' friends... and then somebody offered to pay them to advertise on the site. And we ended up with this [yahoo.com].
    - Re:Quality - not quantity (Score:2)
      
      by grazzy ( 56382 ) writes:
      
      they also made the "cool pages" the parent is talking about.. which always has sucked bad :)
- Re:Quality - not quantity (Score:2, Insightful)
  
  by Onionesque ( 455220 ) writes:
  
  To paraphrase Churchill, Google is the worst system devised by the wit of man, except for all the others. Where else would you go? Yahoo? Hey, how about AltaVista?
  The problems faced by Google in their battle against the scumbags who would game the system are faced by every other search engine. Google, IMHO, handles them better.
  - - Re:Quality - not quantity (Score:2)
      
      by WhiteDragon ( 4556 ) writes:
      
      at the bottom of every search results page, there is a link that says, "Dissatisfied? Help us improve". I've clicked on it once or twice, when encountering a particularly spammed keyword and they have fixed it!
- Re:Quality - not quantity (Score:4, Informative)
  
  by dabadab ( 126782 ) writes: on Thursday November 11, 2004 @07:49AM (#10786134)
  
  "[i]Since pagerank was switched off[/i]"
  
  Since when is Pagerank switched off?
  
  Parent Share
  twitter facebook
  - Re:Quality - not quantity (Score:4, Interesting)
    
    by seanyboy ( 587819 ) writes: on Thursday November 11, 2004 @07:58AM (#10786169)
    
    My bad. I'd skimmed a few things on the web, and assumed that it had been switched off. Looks instead as though Google have changed how it works. See PageRank is dead [zawodny.com]. I need to investigate further.
    
    Parent Share
    twitter facebook
And I for one welcome... (Score:2, Funny)

by mu22le ( 766735 ) writes:

No, wait, they are our internet search overlords since, like, 1999?

Mhm to anonymous coward or not to anonymous coward?
Will moderators smack my karma below zero?
Nonsense. (Score:2, Funny)

by MadFarmAnimalz ( 460972 ) writes:

over eight billion pages crawled

You don't just go from 4 billion to 8 billion overnight.

They are probably just crawling the same 4 billion twice.
- - Re:RTFA (Score:2)
    
    by Tim C ( 15259 ) writes:
    
    It was a joke.
Makes you wonder... (Score:5, Insightful)

by manmanic ( 662850 ) writes: on Thursday November 11, 2004 @07:21AM (#10786029)

Does this mean that I've been missing a huge amount of important information until now? I'd just assumed that Google covered the entire relevant web but now it seems to cover the whole same amount again. My Google alerts [googlealert.com] also seem to have started producing a lot more results which suggest that a lot of these new pages are rated quite highly. Who knows how much more quality content on the web we're just not seeing?

Share
twitter facebook
- Re:Makes you wonder... (Score:5, Interesting)
  
  by jlar ( 584848 ) writes: on Thursday November 11, 2004 @07:29AM (#10786061)
  
  "Does this mean that I've been missing a huge amount of important information until now?"
  
  Maybe the steep increase is due to all the new file formats they are indexing now. That might be useful for some people (although I sometimes find it kind of annoying that a search returns MS-Word documents).
  
  Parent Share
  twitter facebook
  - Re:Makes you wonder... (Score:3, Informative)
    
    by RedWizzard ( 192002 ) writes:
    
    Maybe the steep increase is due to all the new file formats they are indexing now.
    
    The steep increase is probably due to an architecture change. Google has, for a long time, been indexing around 4 billion pages. That implies that they have been giving each page a 32 bit unique identifier, and had exhausted that id space. It would be a lot of work for them to seamlessly upgrade all their software to support a larger id, and it has taken them a long time to do so. Now that they have the large jump in page
- Advertising a deficiency (Score:2)
  
  by fleener ( 140714 ) writes:
  
  When a search engine announces it has increased its index of pages, it advertises a deficiency....
  "Oh, if you just added several billion pages, were you giving me crap before? How many more billions of pages are you not indexing right now?"
  Google's announcement merely gives its users reason to question the size and comprehensiveness of Google's index.
Google needs your cookie badly (Score:2, Informative)

by Anonymous Coward writes:

Until today you could save your google settings [google-watch.org] without loosing your privacy [google-watch.org]. You can still save those settings but google refuses to use them when you block their cookie. In my case I get 10 search results although I like to receive 100. Seems that they are making many dollars on a user's cookie, and now they are a public company my privacy is less important than "stock holders' interests".
- Re:Google needs your cookie badly (Score:3, Informative)
  
  by Anonymous Coward writes:
  
  You can still save those settings but google refuses to use them when you block their cookie. In my case I get 10 search results although I like to receive 100.
  Create a keyword bookmark [mozilla.org] with the URL
  http://www.google.com/search?q=%s&num=100 [google.com]
  
  Give it the keyword 100, then type 100 search_term in the address bar to use it.
Google domination. (Score:2, Informative)

by Anonymous Coward writes:

Local tabloid Aftonbladet is running a poll on search engine use:

Google (81.4 %)
Yahoo (2.2 %)
MSN (3.8 %)
Other (11.4 %)
Don't know (1.2 %)

61730 votes so far.

I'm a little surprised, either the masses who use the "default" (MSN?) aren't bothering to answer, or google is simply very very dominant and those "default using masses" do not exist [in this country].
- Re:Google domination. (Score:3, Insightful)
  
  by Mostly a lurker ( 634878 ) writes:
  
  the masses who use the "default" (MSN?) aren't bothering to answer
  I think it is more that many users of IE just do not twig that their failed page access resulted in an automatic query to MSN.
  In reality, most users make occasional deliberate queries to Google and more frequent accidental queries to MSN.
If I kept eating so much spam... (Score:2, Funny)

by dos_dude ( 521098 ) writes:

... my weight would probably double, too.
Microsoft (Score:4, Interesting)

by Cookeisparanoid ( 178680 ) writes: on Thursday November 11, 2004 @07:38AM (#10786099) Homepage

A lot of people have been asking what the point of the artical is, why does it matter, well possibly because Microsoft announced the launch of their search engine http://news.bbc.co.uk/1/hi/technology/4000015.stm and are claiming more pages index than google (5 billion) so google have responded by effectivly doubling their pages indexed.

Share
twitter facebook
8 billions.... (Score:2, Funny)

by DrYak ( 748999 ) writes:

Of which 80% is V1AGR@ advertising,
and 19% is pr0n.
There's debate if the remaining 1% contains pirated music and movie or plans for DIY nukes.
Mine is bigger than yours!!! (Score:5, Informative)

by ayjay29 ( 144994 ) writes: on Thursday November 11, 2004 @07:46AM (#10786124)

From BBC News here [bbc.co.uk].

In a statement Microsoft said its search engine returned results from five billion web pages - more than any other search engine.

But this quickly won a response from Google which announced that its index has now grown to more than 8 billion pages.

Prior to the Microsoft announcement, Google was only indexing 4,285,199,774 web pages.

Steve Ballmer is soon to announce that his daddy is one hundrad years old, and kan kick your daddy's ass...

Share
twitter facebook
Grrrrr (Score:4, Funny)

by squoozer ( 730327 ) writes: on Thursday November 11, 2004 @07:47AM (#10786128)

Now it's going to be even harder to get my name in the top spot. Why was I cursed with the surname Smith!

Share
twitter facebook
Searching LiveJournal.com (Score:5, Informative)

by hackrobat ( 467625 ) writes: <manish...jethani@@@gmail...com> on Thursday November 11, 2004 @07:49AM (#10786135) Homepage

Looks like they've added a gazillion LiveJournal [livejournal.com] pages to their index. I used to have a Google search box on my LJ that didn't throw up relevant results until last week or so. Now it works perfectly, just like builtin search (like what you see in MT and WordPress).

Share
twitter facebook
Doubled? Wait a minute... (Score:5, Funny)

by 't is DjiM ( 801555 ) writes: on Thursday November 11, 2004 @07:50AM (#10786140)

From 4 to 8 billion pages... I guess they just indexed the google cache...

Share
twitter facebook
Competing with Microsoft's 5bn? (Score:5, Informative)

by Richard W.M. Jones ( 591125 ) writes: <rich&annexia,org> on Thursday November 11, 2004 @07:51AM (#10786143) Homepage

On the same day that this story hits the BBC [bbc.co.uk]. In that story Microsoft claim that they have 5 billion pages indexed, more than the 4.2 billion pages indexed (at that point) by Google. The BBC have just updated the story with the 8bn figure.
I smell competition!
Rich.

Share
twitter facebook
Does this mean...? (Score:4, Insightful)

by jimicus ( 737525 ) writes: on Thursday November 11, 2004 @07:51AM (#10786147)

Does this mean twice as many pages with "Search for 'printer problem linux' on Kelkoo"?

Share
twitter facebook
- Re:Does this mean...? (Score:2)
  
  by mikael ( 484 ) writes:
  
  Probably in the same way that Daylight Savings Time gives you an extra hour of sunlight each day.
meta-no-archive (Score:3, Interesting)

by Anonymous Coward writes: on Thursday November 11, 2004 @07:54AM (#10786154)

apparently my sites will never get a good ranking on google because I don't want the search engine to cache the site. So I'm using meta no-archive tags. That's the only thing I can figure out why the sites rank so poorly on google, when they come up in the top 10-20 hits on yahoo and other search engines. The keywords for the searches are valid, the sites are relevant to the keyword searches, yet the sites don't show in the top 100-300 on google.

I've avoided all the usual spam type of tags (auto refreshing, hidden text, cloaking, etc.) and the sites are legitimate and on the up and up, and yet the only page or two that google is spidering are the one or few that appear to be without the no-archive tags and possibly the revisit/expire tags.

Is google's policy, allows us to cache your site, or get penalized? Anyone else run into a similar problem or can shed some light on this? The only other thing I can think of is the robots text file, that keeps googlebot, and then other spiders through a *, from entering images directories. The spiders, including googlebot, aren't restricted from entering any other directories, they are given free reign.

Anyone else with problems with no-cache, no archive, tight revisit/expire times, or similar non-spam tags that result in penalties in google ranking?

I've been using google exclusively for a few years now. But the poor page ranking of sites on my server got me wondering about other sites that may be relevant to my own searches which may be exluded or penalized by google. So I've started using Yahoo search again, as much as I hate Yahoo (what they do with advertising to Yahoo groups and Yahoo mail is a shame). It appears that Yahoo is including better results because other sites show up with higher ranking that actually are relevant. So I've learned that Google isn't as perfect as I thought it was, which was disappointing in itself. It was easy using one search site. Now I have to use two to make sure I'm getting good results. Anyone know if there is a plugin for Firefox with both Google and Yahoo search boxes on the toolbar?

Share
twitter facebook
- - Re:meta-no-archive (Score:3, Informative)
    
    by justMichael ( 606509 ) writes:
    
    I'm sure I've seen some way of doing a sort of backwards search on a page, that will show all the pages in Google that link to it.
    The search you are looking for looks like this: link:slashdot.org [google.com]
Closer to re-adding entries (Score:2)

by Gambit Thirty-Two ( 4665 ) writes:

I regularly watch where my nickname, full name, parents names, etc come up in google. I've noticed in the past couple of months, my hits have DRASTICALLY reduced. They just disapeared from the database. But over the past 2 days, I've gotten notifications (thanks google alerts) about new pages being indexed and voila! They come up in a search again.
Just tried the beta of the new MSN Search (Score:4, Funny)

by Mostly a lurker ( 634878 ) writes: on Thursday November 11, 2004 @08:29AM (#10786283)

I received this response:

This site is temporarily unavailable, please check back soon.
Didn't get the results you expected? Help us improve.

It is not clear to me how I can help them improve. Suggest they switch their servers to Linux?

Share
twitter facebook
I know why it has doubled... (Score:3, Interesting)

by jmcmunn ( 307798 ) writes: on Thursday November 11, 2004 @09:23AM (#10786531)

Because every blogger in the universe has added at least 3 pages since the last index. I fail to see how it is significant to me that there are now 8 billion mostly worthless sites out there. The number of actually useful sites has not gone up considerably.

Share
twitter facebook
- Re:This is news ? (Score:2, Funny)
  
  by Manip ( 656104 ) writes:
  
  Yes because we at /. love Google..
  
  Google is a constant source of information and a geeks friend - if the index has doubled so has our supply of information. Information rules!
  - Re:This is news ? (Score:2)
    
    by Zork the Almighty ( 599344 ) writes:
    
    In related news, the sun has set for today and will rise again tomorrow. The web is growing. Google is indexing it. It isn't news, it's a factoid.
    - Re:This is news ? (Score:3, Insightful)
      
      by dotmike ( 829740 ) writes:
      
      Yeah, but it'd be news if the sun set twice in one night or rose twice as bright.
      
      It's more the exponential increase in the size of the index rather than the piecemeal addition.
- - Re:This is news ? (Score:3, Interesting)
    
    by PerpetualMotion ( 550623 ) writes:
    
    A bigger index does not equal better search results, however, with the press this will generate, it will equal profits.
    - Re:This is news ? (Score:3, Interesting)
      
      by Ford Prefect ( 8777 ) writes:
      
      A bigger index does not equal better search results, however, with the press this will generate, it will equal profits.
      
      It would be terribly easy to get trillions of pages indexed. For instance, a site I've been working on has a public calendar system, with results fished out of a database. There are very few actual events in it at the moment, but with the 'Previous' and 'Next' links it'll run from 1970 to 2038. A naïve web-crawler would index every single month for every single year, but Google would
- - Geeks who understand marketing (Score:2)
    
    by Mostly a lurker ( 634878 ) writes:
    
    What Google has going for them is that they combine technical know how with marketing smarts. I still use Google as my primary search engine because it produces better results. Google understands though that, in the market at large, they need to play the numbers game. Fine they say. Within hours of the Microsoft announcement, out comes this.
    Frankly, I love it any time someone can best Microsoft. The next big thing may well be consumers putting their data on servers provided by the likes of Google, Mi
- Re:Google Schmoogle (Score:2, Funny)
  
  by seanyboy ( 587819 ) writes:
  
  They already have. [google.com]
- Re:Google thieves my bandwidth (Score:5, Informative)
  
  by Anonymous Coward writes: on Thursday November 11, 2004 @07:35AM (#10786089)
  
  Google respects the robots.txt file. Use it.
  
  Parent Share
  twitter facebook
- robots.txt (Score:4, Informative)
  
  by ReKleSS ( 749007 ) writes: <rekless@NOspAm.fastmail.fm> on Thursday November 11, 2004 @07:51AM (#10786148)
  
  Yes, this is probably a troll, but anyway... I take it you've never heard of the robots.txt file? You sound like you might want to read up on it. It's designed to help control the spidering of your pages for whatever reason, particularly cases like yours or situations where a spider would get confused and end up doing something stupid (recursive stuff, etc).
  -ReK
  
  Parent Share
  twitter facebook
- Re:Google thieves my bandwidth (Score:2, Insightful)
  
  by Rakshasa Taisab ( 244699 ) writes:
  
  You can rant all you want, but Google still has a fair use right to your images. They are reduced resolution images and therefor legal for non-commercial use.
  
  Not to mention robot.txt, but that is so obvious it shouldn't need to be mention.
- Re:Google thieves my bandwidth (Score:5, Informative)
  
  by jvj24601 ( 178471 ) writes: on Thursday November 11, 2004 @08:09AM (#10786196)
  
  Well, if you know that Google is indexing your site and "stealing" your bandwidth, then you must have looked at the server logs, right? You'd see the name of the search bot is googlebot. Search for it [google.com], and you'll find that the first relevant link [google.com] explains how to prevent googlebot from accessing your site.
  
  The logs would probably also show failed attempts to find the file /robots.txt. Similar info is gained from searching on that term [google.com] as well.
  
  Parent Share
  twitter facebook
- So, to sum up... (Score:5, Insightful)
  
  by kahei ( 466208 ) writes: on Thursday November 11, 2004 @08:15AM (#10786225) Homepage
  
  I am feeding this troll because there are people who really _do_ think like that and I wish I could yell at them to their faces :)
  
  You put content in a place where it is publically accessible. You explicitly and proactively made that content available to everyone, including 'the average surfer' and googlebots. You took no steps to make it available only to the select few of whom you approve.
  
  Now you are all cross and bothered because average surfers / googlebots have read / copied your content, such as it is.
  
  The solution is to drown yourself in a bucket. I have a bucket.
  
  Parent Share
  twitter facebook
  - - Re:Read more carefully. (Score:5, Insightful)
      
      by Mant ( 578427 ) writes: on Thursday November 11, 2004 @11:18AM (#10787695) Homepage
      
      Robots.txt isn't some thing that only applies to Google, it is (supposed) to be honoured by all search engines, and uses the Robots Exclusion Standard. So, when you claim these are Google's arbitary rules, you are in fact wrong. They are neither Google's nor arbitary (at least no more than any web standard).
      
      So your clue, not so much of clue, as robots.txt doesn't fit your description.
      
      As for why you should know about it, you are putting up a web site, it is part of running a web site. You might as well complain why you need to know about HTML, CSS or registering a domain name. Quite what coming from the UK has to do with it (something I also do), I have no idea.
      
      "I simply do not want the average surfer to be able to visit my site, I am not interested in serving my pages to them, they simply would not appreciate or understand what it is I am showing."
      
      Then a publicly accessable webiste is the wrong place. It is not your personal space, and it isn't private. You made it available to the world, nobody made you. To turn around and complain when (some of) the world visits it is hypocracy.
      
      It's like putting up posters around a town, then running around complaining all these people are looking at them, won't appreciate them, and you don't want them too. It's also comes across as condescending and arrogant, which probably explains the nastiness of some of the responses.
      
      You opted in when you put up the publicly accessable website. If all search engines had to be opt in, nobody could find anything on the web, and it would use a lot of its utility. Your assumed to want them crawling becuase the vast majority of people do, they want their site to be found. If you don't though, no problem, just use the standards for stopping searches, or password protect the site. No scandal at all, just hysterics.
      
      Showing the low res thumbnail of your image isn't violating your copyright either. The only legitimate claim you have is the amount of time it took to remove something from the cache.
      
      The "thieves" accusation is even more ridiculous. If you put something up on the web people can see for free, you can't complain. There are options if you want to protect it. Google doesn't claim you work as theirs (which would be 'stealing' or at least copyright violation), they help people find you public web site.
      
      If you don't want a public website but made one, whose fault is that? If you are going to run a website and can't be bothered to find out how to do it properly, you can't blame Google.
      
      Parent Share
      twitter facebook
- Re:Ofcourse ... (Score:2)
  
  by loyalsonofrutgers ( 736778 ) writes:
  
  You have a mirror of the internet? That must be the one that George Bush uses.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Comment removed (Score:4, Funny)

slashdotting (Score:4, Funny)

Re:slashdotting (Score:3, Funny)

Re:slashdotting (Score:2, Redundant)

Re:Image Search (Score:3, Interesting)

More pages v.s more relevant pages (Score:5, Insightful)

Re:More pages v.s more relevant pages (Score:5, Insightful)

Re:More pages v.s more relevant pages (Score:3, Interesting)

Re:More pages v.s more relevant pages (Score:2, Interesting)

Re:What? (Score:2, Insightful)

Re:What? (Score:3, Interesting)

Re:What? (Score:5, Informative)

try +the (Score:2, Informative)

Re:What? (Score:2)

Proximity search will help (Score:3, Insightful)

Re:More pages v.s more relevant pages (Score:3, Informative)

Re:More pages v.s more relevant pages (Score:5, Interesting)

Re:More pages v.s more relevant pages (Score:5, Insightful)

Re:More pages v.s more relevant pages (Score:4, Interesting)

Re:More pages v.s more relevant pages (Score:3, Insightful)

Re:More pages v.s more relevant pages (Score:2)

Re:More pages v.s more relevant pages (Score:2)

Re:More pages v.s more relevant pages (Score:2)

I'm all alone (Score:4, Funny)

Re:I'm all alone (Score:4, Funny)

Re:I'm all alone (Score:3, Informative)

Re:I'm all alone (Score:2)

Re:I'm all alone (Score:2)

Do this affect how fresh their index will be? (Score:4, Insightful)

Re:Do this affect how fresh their index will be? (Score:2)

Re:Do this affect how fresh their index will be? (Score:2)

What is new about this. (Score:4, Interesting)

Re:What is new about this. (Score:4, Interesting)

Re:What is new about this. (Score:2)

Re:What is new about this. (Score:2)

Re:What is new about this. (Score:3, Insightful)

Re:What is new about this. (Score:2)

Still works without the quotation marks .. (Score:2)

Re:What is new about this. (Score:3, Insightful)

Re:What is new about this. (Score:3, Interesting)

Re:What is new about this. (Score:4, Interesting)

great but where are the .txt and directories? (Score:3, Informative)

Re:great but where are the .txt and directories? (Score:3, Informative)

Re:great but where are the .txt and directories? (Score:2)

no update on the images (Score:3, Informative)

Google makes minor change to website - news at 11! (Score:3, Insightful)

Re:Google makes minor change to website - news at (Score:2)

Re:Google makes minor change to website - news at (Score:2)

Re:Google makes minor change to website - news at (Score:2, Funny)

Turn them off then. (Score:2)

Re:Turn them off then. (Score:2)

Re:Google makes minor change to website - news at (Score:2)

Quality - not quantity (Score:3, Insightful)

Re:Quality - not quantity (Score:3, Funny)

Re:Quality - not quantity (Score:2)

Re:Quality - not quantity (Score:2)

Re:Quality - not quantity (Score:2, Insightful)

Re:Quality - not quantity (Score:2)

Re:Quality - not quantity (Score:4, Informative)

Re:Quality - not quantity (Score:4, Interesting)

And I for one welcome... (Score:2, Funny)

Nonsense. (Score:2, Funny)

Re:RTFA (Score:2)

Makes you wonder... (Score:5, Insightful)

Re:Makes you wonder... (Score:5, Interesting)

Re:Makes you wonder... (Score:3, Informative)

Advertising a deficiency (Score:2)

Google needs your cookie badly (Score:2, Informative)

Re:Google needs your cookie badly (Score:3, Informative)

Google domination. (Score:2, Informative)

Re:Google domination. (Score:3, Insightful)

If I kept eating so much spam... (Score:2, Funny)

Microsoft (Score:4, Interesting)

8 billions.... (Score:2, Funny)

Mine is bigger than yours!!! (Score:5, Informative)

Grrrrr (Score:4, Funny)

Searching LiveJournal.com (Score:5, Informative)

Doubled? Wait a minute... (Score:5, Funny)

Competing with Microsoft's 5bn? (Score:5, Informative)

Does this mean...? (Score:4, Insightful)