Google URL Index Hits 1 Trillion
mytrip points out news that Google's index of unique URLs has reached a milestone: one trillion. Google's blog provides some more information, noting,
"The first Google index in 1998 already had 26 million pages, and by 2000 the Google index reached the one billion mark. Over the last eight years, we've seen a lot of big numbers about how much content is really out there. To keep up with this volume of information, our systems have come a long way since the first set of web data Google processed to answer queries. Back then, we did everything in batches: one workstation could compute the PageRank graph on 26 million pages in a couple of hours, and that set of pages would be used as Google's index for a fixed period of time. Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day."
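The batch computation the quote describes can be sketched as a standard power-iteration PageRank. This is a minimal textbook illustration, not Google's actual implementation, and the tiny three-page link graph is made up.

```python
# Minimal power-iteration PageRank sketch (a textbook illustration,
# not Google's implementation). `graph` maps each page to its outlinks.

def pagerank(graph, damping=0.85, iterations=50):
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share, plus rank flowing in via links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, links in graph.items():
            targets = links if links else pages  # dangling pages spread evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# A made-up three-page link graph: a links to b and c, b to c, c to a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

On 26 million pages this kind of batch job plausibly takes hours on one workstation, which matches the quote; running it continuously over the whole web-link graph is an entirely different engineering problem.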
Re:Amazing (Score:5, Informative)
I couldn't agree more.
Many of the clients I support are constantly asking me "Is there a program that does this?" or "Can you find me a program to do this?", etc.
I used to be able to just use Google to help me get started, but these days the top search results are all those bloody link farms peddling "free" software; even when you add the word "review" you get link farms that offer no reviews.
Re:Amazing (Score:5, Informative)
Many of the clients I support are constantly asking me "Is there a program that does this?" or "Can you find me a program to do this?", etc.
I used to be able to just use Google to help me get started, but these days the top search results are all those bloody link farms peddling "free" software
Have you tried SourceForge [sourceforge.net]? That's what it's there for, you know.
Re:How long till.. (Score:5, Informative)
... and cut out Experts-Exchange.com from your search results since their pages don't actually return the information you think they do.
Perhaps you should try scrolling to the bottom of the page... :)
Re:How long till.. (Score:5, Informative)
It took me a while to realize it, but if you scroll clear to the bottom of an Experts-Exchange post, you'll find the comments unhidden and relevant.
Re:How long till.. (Score:5, Informative)
...and cut out Experts-Exchange.com from your search results since their pages don't actually return the information you think they do.
If you block cookies from experts-exchange.com you can actually see the answers on any E-E page: after your first visit, it normally sets a cookie to hide the answers on your next visit, which is how they get Google to index their pages in the first place. With their cookies blocked, you can see the answers; you just have to scroll seven-eighths of the way down the page, past all the fake "Please sign up to see this result" boxes. :)
(First AC post in years... tee hee.)
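The cookie-blocking trick amounts to making every request look like a first visit. A rough sketch, assuming Python's standard urllib (the URL and browser user-agent string below are placeholders, not anything from the site):

```python
# Sketch: build requests with no cookie jar attached, so nothing
# persists between fetches and every visit looks like the first one.
# The URL and the browser user-agent string are just placeholders.
import urllib.request

def cookieless_request(url):
    return urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})

req = cookieless_request("http://example.com/")
# urllib.request.urlopen(req) would then fetch the page as a
# first-time visitor; any Set-Cookie header is simply forgotten.
```

This works because urllib's default opener has no HTTPCookieProcessor, so the "hide results on return visits" cookie never comes back on the next request.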
Re:Try "Live" search (Score:0, Informative)
The first result for 'getfirefox' is http://www.getfirefox.net/ [getfirefox.net] which seems to be correct to me.
The first result for 'Linux' is http://www.linux.org/ [linux.org] not some Microsoft website.
The first result for 'Open Office' is http://www.openoffice.org/ [openoffice.org] and not some Microsoft website.
You lie?
Re:Some numbers (Score:3, Informative)
You mean Googlewhacking [wikipedia.org], except not nearly as hard?
Re:How long till.. (Score:4, Informative)
Actually, if you go to the cached version of those pages, you can see all the answers. You can also just use Googlebot's user agent via the User Agent Switcher [mozilla.org].
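The same user-agent trick works outside the browser too. A minimal sketch, assuming Python's urllib; the target URL is a placeholder, and the UA string is Googlebot's well-known public identifier:

```python
# Sketch of the user-agent trick: present Googlebot's well-known UA
# string so the server serves the version it shows the crawler.
# The target URL below is a placeholder.
import urllib.request

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

def googlebot_request(url):
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})

req = googlebot_request("http://example.com/some-question")
# urllib.request.urlopen(req) would fetch the page as Googlebot sees it.
```

Sites that show crawlers one page and visitors another are cloaking, which is exactly why the cached copy on Google also shows the full answers.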
Re:No concern for the foreign readers? (Score:3, Informative)
No one in the UK really uses the long-scale system. For example, traditional UK billions are _never_ used in government budgets, and no one points out that the "American" billion is being used. A billion is just 1E9 here, like just about everywhere else.
I guess some older people may be confused (what's new ;)), but I'll wager a large proportion of the younger UK population don't even know what a traditional English billion is. I'm 30, and I've never used 1E12 as a billion, or even been taught that it could be.
Dynamic pages pollute count (Score:5, Informative)
Google tries to detect dynamic pages by looking for ampersands and equals signs, as well as at the content of the page, but it is really quite easy to fool.
e.g.: http://somesite.com/itemlist.php?listmode=1&category=beds&orderby=7 [somesite.com]
when 'rewritten' shows up as
http://somesite.com/items/1/beds/7.html
So a billion web pages (and I know of a few thousand sites like this) could really be just a few hundred thousand dynamic pages. Not that the pages don't have relevant information; some of the stuff can be redundant, though. For instance, when the spider crawls across "Records per page = 10" > "Records per page = 20" > "Records per page = 30" etc., or when lazy programmers don't use cookies and databases to store state but instead concatenate the user's selections onto the URL. Thank god for that GET limit [boutell.com]. People need to use POST!
If someone knows how to stop this message board from creating links out of false URLs, please let me know.
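The rewrite in the example above can be sketched in a few lines; somesite.com and the parameter names are the poster's hypothetical, and a real site would do this server-side with mod_rewrite or similar:

```python
# Sketch of the URL "rewriting" from the example above: the query-string
# URL and its static-looking form name the same dynamic page.
# somesite.com and the parameter names are hypothetical.
from urllib.parse import urlsplit, parse_qs

def rewrite(url):
    parts = urlsplit(url)
    qs = parse_qs(parts.query)
    listmode = qs["listmode"][0]
    category = qs["category"][0]
    orderby = qs["orderby"][0]
    return f"http://{parts.netloc}/items/{listmode}/{category}/{orderby}.html"

dynamic = "http://somesite.com/itemlist.php?listmode=1&category=beds&orderby=7"
static_looking = rewrite(dynamic)  # http://somesite.com/items/1/beds/7.html
```

Every distinct combination of listmode, category, and orderby values yields another "unique" URL for the same underlying script, which is exactly how one dynamic page can inflate a unique-URL count.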