Google URL Index Hits 1 Trillion

Posted by Soulskill on Saturday July 26, 2008 @12:03AM from the orders-of-magnitude dept.

mytrip points out news that Google's index of unique URLs has reached a milestone: one trillion. Google's blog provides some more information, noting, "The first Google index in 1998 already had 26 million pages, and by 2000 the Google index reached the one billion mark. Over the last eight years, we've seen a lot of big numbers about how much content is really out there. To keep up with this volume of information, our systems have come a long way since the first set of web data Google processed to answer queries. Back then, we did everything in batches: one workstation could compute the PageRank graph on 26 million pages in a couple of hours, and that set of pages would be used as Google's index for a fixed period of time. Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day."
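The blog post mentions computing PageRank over a 26-million-page link graph in batch. The sketch below shows the textbook power-iteration version of that computation. It assumes the common formulation: a damping factor of 0.85, a fixed number of iterations, and pages with no outlinks spreading their rank evenly over all pages. The dict-of-lists graph and the function name are hypothetical, and nothing here describes how Google's production system actually works.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration (illustrative sketch).

    links maps each page to the list of pages it links to; this
    dict-of-lists representation is an assumption for the example.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # every page receives the "random jump" share
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            outlinks = links.get(page, [])
            if outlinks:
                # split this page's rank evenly among the pages it links to
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # dangling page (no outlinks): spread its rank over every page
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a"], "c": ["b"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

In this toy graph "b" ends up with the highest score because both "a" and "c" link to it, while "c", which nothing links to, keeps only the random-jump share.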

Screenshot. (Score:5, Funny)

by Shaitan Apistos ( 1104613 ) writes: on Saturday July 26, 2008 @12:05AM (#24345505)

Or it didn't happen.

- Re: (Score:2)
  
  by halcyon1234 ( 834388 ) writes:
  
  And in a shocking victory for the trolls, the one trillionth page was a mirror of goatse.
- - Re:Odd (Score:5, Funny)
    
    by Anonymous Coward writes: on Saturday July 26, 2008 @12:29AM (#24345649)
    
    So unless there is a screenshot showing the 1,000,000,000,000 site count, Google's index didn't reach that milestone? Even if it now shows 1,000,000,000,001?
    The 1,000,000,000,000th page had only one word on it:
    "woosh"
    
- - Re: (Score:3, Funny)
    
    by Anonymous Coward writes:
    
    Seriously, you just want to get into his/her pants...
    - Re:Screenshot. (Score:5, Funny)
      
      by Shaitan Apistos ( 1104613 ) writes: on Saturday July 26, 2008 @02:52AM (#24346195)
      
      That can be arranged.
      
How long till.. (Score:5, Funny)

by loconet ( 415875 ) writes: on Saturday July 26, 2008 @12:09AM (#24345527)

Once the index reaches a google (or rather a googol), the universe explodes.

- Re:How long till.. (Score:5, Funny)
  
  by txoof ( 553270 ) writes: on Saturday July 26, 2008 @12:14AM (#24345551)
  
  Is that the modern equivalent of the Mayan calendar running out of days?
  
- Re: (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  Won't happen since the universe's max integer is significantly smaller than a googol or a 'google'.
- Re:How long till.. (Score:5, Insightful)
  
  by rho ( 6063 ) writes: on Saturday July 26, 2008 @01:38AM (#24345917)
  
  I'm more interested in when Google starts returning relevant results to my queries.
  I can't believe that I'm the only one that finds Google's quality of service somewhat below par. I guess they're better than randomly stabbing in the dark, and there certainly isn't any alternative that's obviously better, but Google sure isn't everything they think they are.
  I know--stop trying to compete with Wikipedia and cut out Experts-Exchange.com from your search results since their pages don't actually return the information you think they do.
  
  - Re:How long till.. (Score:5, Informative)
    
    by onedotzero ( 926558 ) writes: on Saturday July 26, 2008 @02:05AM (#24346013)
    
    ... and cut out Experts-Exchange.com from your search results since their pages don't actually return the information you think they do.
    Perhaps you should try scrolling to the bottom of the page... :)
    
  - Re:How long till.. (Score:5, Informative)
    
    by cdrudge ( 68377 ) writes: on Saturday July 26, 2008 @02:05AM (#24346015)
    
    It took me a while to realize it, but if you scroll clear to the bottom of an Experts-Exchange post, you'll find the comments unhidden and relevant.
    
    - - Re:How long till.. (Score:4, Informative)
        
        by Eddi3 ( 1046882 ) writes: on Saturday July 26, 2008 @04:33AM (#24346487)
        
        Actually, if you go to the cached version of those pages, you can see all the answers. You can also just use the Googlebot's user agent via the User Agent Switcher [mozilla.org].
        
  - Re:How long till.. (Score:5, Informative)
    
    by Anonymous Coward writes: on Saturday July 26, 2008 @02:07AM (#24346021)
    
    ...and cut out Experts-Exchange.com from your search results since their pages don't actually return the information you think they do.
    If you block cookies from experts-exchange.com you can actually see the answers on any e-e page - after you visit the first time, it normally sets a cookie to not show results next visit, which is how they get Google to index their pages anyway. With cookies from them blocked, you can then see the answers - you just have to scroll 7/8s of the way down the page past all the fake "Please sign up to see this result" boxes.
    (First AC post in years... tee hee. :)
    
    - Re: (Score:2)
      
      by fyoder ( 857358 ) * writes:
      
      If you block cookies from experts-exchange.com you can actually see the answers on any e-e page
      I don't know if Americans would be allowed to do that. Sounds like a potential violation of the DMCA.
  - Re:How long till.. (Score:5, Interesting)
    
    by blahplusplus ( 757119 ) writes: on Saturday July 26, 2008 @02:22AM (#24346097)
    
    "I'm more interested in when Google starts returning relevant results to my queries.
    I can't believe that I'm the only one that finds Google's quality of service somewhat below par."
    You're not the only one, but for the most part it is better than most other search engines out there. The real problem is spammers and paid advertising; I think spammers have really made search frustrating for a lot of companies. And ad companies pay other people to promote their sites for them (digg, slashdot, etc). I've noticed an increase in spam-vertised websites in search results for a lot of things.
    Personally I think the idea of sharding and search being more specific for what you're looking for is needed. I'd like to see a google with 'tags' and a delicious interface, things like educational institutions and universities get lumped into their own search engine space for instance, this would help narrow down what one is looking for, although it would take time and feedback to design something well for other areas. The fact is that search results get diluted as you put more and more stuff online (numbers and geometric scale).
    For fun, I've noticed StumbleUpon and del.icio.us are not bad alternatives when looking for new and interesting sites without having to use search.
    
    - Re: (Score:2)
      
      by peragrin ( 659227 ) writes:
      
      you do know you can use google to search specific websites, right? So if you're looking through a specific domain you can add that to your search query with the site: operator.
      The other trick is to add search terms until you have gotten a smaller list of results.
    - Re: (Score:2, Interesting)
      
      by MPAB ( 1074440 ) writes:
      
      Try searching for a given product and the word "review" or something alike. You'll get endless pages of stores with no review whatsoever that must be scrolled away till you find a real technology site that has actually reviewed the product.
  - Re: (Score:2)
    
    by Mike89 ( 1006497 ) writes:
    
    If they cut out all the stupid "Find it cheapest here!!" style domains, I'd find it much more useful.
  - Comment removed (Score:4, Interesting)
    
    by account_deleted ( 4530225 ) writes: on Saturday July 26, 2008 @06:12AM (#24346801)
    
    Comment removed based on user account deletion
    
  - Re: (Score:2, Interesting)
    
    by DiarmuidBourke ( 910868 ) writes:
    
    I'm more interested in when Google starts returning relevant results to my queries.
    I can't believe that I'm the only one that finds Google's quality of service somewhat below par. I guess they're better than randomly stabbing in the dark, and there certainly isn't any alternative that's obviously better, but Google sure isn't everything they think they are.
    I find this larger index rather unsettling, as I feel my search results are becoming more irrelevant, mostly for the following reason.
    Finding 1 page in a billion-page index is easier than finding 1 page in a trillion-page index.
    Has the relevance of the results kept up with the growing index size? Or does the growing index cause more irrelevant pages to appear higher in the search results?
    In my opinion the next big advancement that is needed on the web is better auto-categorisation of conten
  - - Re: (Score:2)
      
      by kv9 ( 697238 ) writes:
      
      Since we're on the subject and people @ google are reading this: please add a checkbox that excludes Wikipedia results.
      just add -wikipedia at the end of your query.
      - Re: (Score:2)
        
        by kv9 ( 697238 ) writes:
        
        Then you exclude all pages which include the word wikipedia (e.g. other sites which mention it and the likes). I want to exclude all the pages from wikipedia.
        -inurl:wikipedia ?
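The difference kv9 is drawing, between excluding pages that mention a word and excluding pages hosted on a site, can be modelled on a made-up list of results. The result data and helper names below are hypothetical, and the sketch only imitates the two behaviours locally; it is not how Google evaluates a `-wikipedia` term, `-inurl:wikipedia`, or a negated `site:` filter.

```python
from urllib.parse import urlparse

# Hypothetical search results: (url, page text)
RESULTS = [
    ("https://en.wikipedia.org/wiki/Googol", "From Wikipedia, the free encyclopedia. A googol is 10^100."),
    ("https://example.com/blog/googol", "I read about the googol on Wikipedia."),
    ("https://example.org/math", "Large numbers such as the googol."),
]


def exclude_word(results, word):
    """Like a -word query term: drop every page whose text mentions the word."""
    return [(url, text) for url, text in results if word.lower() not in text.lower()]


def exclude_domain(results, domain):
    """Like excluding a site: drop only pages hosted on the domain or its subdomains."""
    kept = []
    for url, text in results:
        host = urlparse(url).hostname or ""
        if host != domain and not host.endswith("." + domain):
            kept.append((url, text))
    return kept


if __name__ == "__main__":
    print([url for url, _ in exclude_word(RESULTS, "wikipedia")])
    print([url for url, _ in exclude_domain(RESULTS, "wikipedia.org")])
```

Excluding the word also throws away the blog post that merely mentions Wikipedia, which is exactly kv9's complaint; the domain filter keeps it.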
- Re: (Score:2, Funny)
  
  by bliip ( 1104227 ) writes:
  
  Slightly less catastrophic is the result of typing "google" into google: http://uk.youtube.com/watch?v=Fet0SCt7uGg [youtube.com]
Amazing (Score:3, Interesting)

by SoupIsGoodFood_42 ( 521389 ) writes: on Saturday July 26, 2008 @12:13AM (#24345541)

As someone who is partially engineering/analytically minded (but not a great programmer) it amazes me how Google has managed to index so much data, yet at the same time, serve up results in a fraction of a second to so many people.

- Re:Amazing (Score:5, Insightful)
  
  by timmarhy ( 659436 ) writes: on Saturday July 26, 2008 @12:20AM (#24345581)
  
  i wish they would work on weeding out the crap. anything you google now is infested with cheesy search sites that list other websites and try to plaster you with ads. they contribute zero to the web.
  
  - Re:Amazing (Score:5, Informative)
    
    by Freaky Spook ( 811861 ) writes: on Saturday July 26, 2008 @12:39AM (#24345691)
    
    I couldn't agree more.
    Many of the clients I support are constantly asking me "Is there a program that does this? or Can you find me a program to do this" etc etc.
    I used to be able to just use google to help me get started but these days the top level searches are all those bloody link farms peddling "free" software, even when typing in the word review you come up with link farms that offer no reviews.
    
    - Re:Amazing (Score:5, Informative)
      
      by arotenbe ( 1203922 ) writes: on Saturday July 26, 2008 @01:27AM (#24345877)
      
      Many of the clients I support are constantly asking me "Is there a program that does this? or Can you find me a program to do this" etc etc.
      I used to be able to just use google to help me get started but these days the top level searches are all those bloody link farms peddling "free" software
      Have you tried SourceForge [sourceforge.net]? That's what it's there for, you know.
      
    - Re: (Score:3, Interesting)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
    - Re: (Score:3, Insightful)
      
      by cheater512 ( 783349 ) writes:
      
      If you design your queries well enough, then you don't see any of the crap.
  - Re: (Score:3, Interesting)
    
    by SoupIsGoodFood_42 ( 521389 ) writes:
    
    Yeah, that's a problem. I'm sure they'll work it out. I don't find it to be a problem most of the time though, just on certain searches in certain places. They have a real spam problem if you search for info on pharmaceuticals in their groups search last time I checked (about a month ago). The problem wasn't the Usenet groups, but their own special groups, and the worst thing is you can't filter out their groups and just search Usenet ones.
    I tried to contact them about it and discovered that they could also
    - - Re: (Score:2)
        
        by SoupIsGoodFood_42 ( 521389 ) writes:
        
        Groups search for diazepam [google.com] Not one legit result on the whole page, and none of them from traditional Usenet. Doing an advanced search and restricting to alt.drugs or some specific group does work better, but you need to know what groups are useful in the first place for that to work.
  - Try "Live" search (Score:3, Interesting)
    
    by symbolset ( 646467 ) * writes:
    
    And you'll be back faster than a Google search result. Weeding out the crap?
    Just for a sample, try this one: getfirefox [live.com]. If the first link on that search goes to a Mozilla mirror you will win one Internet. Try Linux [live.com]. Hey, this is fun. Spoiler: the first link there is always "www.Microsoft.com/Windows : Special Offers from Windows Vista® w/ the Purchase of Select Laptops." The first time I tried this I was looking for Open Office and wound up misdirected to a members only site where you had
    - Re:Try "Live" search (Score:5, Funny)
      
      by pagaboy ( 1029878 ) writes: on Saturday July 26, 2008 @04:12AM (#24346431)
      
      Turns out Live.com's market share for today has tripled due to Slashdot users clicking on the above links...
      
    - Mod parent down - completely false (Score:2)
      
      by ricotest ( 807136 ) writes:
      
      All three of his examples go directly to the most official site for Firefox, Linux and OpenOffice respectively. Nice try though.
      That said, Google's results are still generally better.
    - Re: (Score:2)
      
      by RzUpAnmsCwrds ( 262647 ) writes:
      
      I don't know what you're trying to prove. For me, the first link for "getfirefox" is the MoFo/Google getfirefox.net. The first link for "Linux" is linux.com. And the first link for "Open Office" is OpenOffice.org.
      Now if you want to argue that the "sponsored links" are crap, I can't help but agree. But Google has the exact same problem.
    - Re: (Score:2)
      
      by Taxman415a ( 863020 ) writes:
      
      Now I remember why I had never bothered to use their search. I expected tricks like that and there they are. It's funny to note they didn't buy/reserve that misleading "Linux" ad that points to Windows for the two word search term ubuntu linux but they did for the one word ubuntu which has more meanings.
      It's also funny to note how much more the interface looks like google's than anything MS has done before. It's so obvious they are admitting they don't know how to innovate anything themselves.
    - - Re: (Score:2)
        
        by symbolset ( 646467 ) writes:
        
        The "tell" for this lie is that people will try it. Did you think of that, Mr astroturf man? And what do you think they will find besides proof? You shouldn't submit this one for credit if you want to keep your job.
      - Re: (Score:2)
        
        by cammoblammo ( 774120 ) writes:
        
        You lie?
        Perhaps, perhaps not. I've experienced odd things like this in the past.
        A few years ago I tried to get to the FSF site by typing the URL into the address bar in Firefox or Phoenix or whatever it was then. For some reason it took me to the Microsoft site.
        It turned out I'd mistyped the URL. In that situation the browser automatically does a Google 'I feel lucky' search on the term you type, apparently hoping to get to where you hoped to go.
        In my case, I mistyped the '://' after the protocol signifier. Conseque
    - - Re: (Score:2)
        
        by totally bogus dude ( 1040246 ) writes:
        
        It may be location-specific, but I don't get any links to Microsoft's site as the first results.
        The results for the UK seem reasonable, although the first one wasn't what you'd be expecting (e.g. the first result for "open office" was http://www.vnunet.com/vnunet/downloads/2128963/openoffice [vnunet.com]). I changed my browser language preference to English/Australia so I'd get more appropriate results (and deleted the live.com cookies), and I got fairly different results. The first few links for "linux" were to linux.o
        
        Re: (Score:2)
        
        by symbolset ( 646467 ) writes:
        
        The ads are pretty clearly marked as being ads (sorry, "sponsored links") so the GP seems to be trying to grind an axe. However, that kind of advertising is usually reserved for 2-bit companies with no real products trying to dupe naive users, so it's a bit surprising to see a company like Microsoft resorting to such cheap tactics.
        I don't know why so many people think that I missed that they're ads. I mentioned it twice. Is all of slashdot caffeine deficient this morning?
        -- "ad placements and query resu
    - - Cluebat (Score:2)
        
        by symbolset ( 646467 ) writes:
        
        Now go to google and type "live search", "Microsoft" and "Microsoft Office" (without the quotes). If you can't explain why the ads and search results that Google puts on those pages are qualitatively better then you're not qualified to judge my comment. On a lark I went back to Live and did those searches too. If you search for "Microsoft" on Live today it shows three ads, each of which is likely to be more harm than help. This is why Microsoft is third in search and fading despite wasting billions on it
    - - Re: (Score:2)
        
        by symbolset ( 646467 ) writes:
        
        I know this is slashdot, but if you carefully reread the post you responded to you'll find the last line mentions that these maliciously deceptive links are ads. I'm not trying to mislead anybody. Are you?
  - Re: (Score:3, Insightful)
    
    by Anonymous Coward writes:
    
    i wish they would work on weeding out the crap.
    There are a *lot* of people at Google working on that problem. Please understand that it is really difficult to keep up with new attacks when your site is #1, because many people out there are aiming directly for it. No matter how many work on ranking and relevance inside the company, there will always be 10x-100x that number of people outside who are working on the shady side of SEO, spamming, etc. It's a never-ending battle, much like spam email. We're trying.
    anything you google now is infested with cheesy search sites that list other websites and try to plaster you with ads. they contribute zero to the web.
    We're working on that both from the search
- Re: (Score:2)
  
  by ILuvRamen ( 1026668 ) writes:
  
  They rub the servers with cheetah blood. Anyway, the real magic is how is their index that big when there's not that many websites? I thought I just saw an estimate like a half a year ago saying that there's about 4 billion or 10 billion or whatever domain names registered and they estimate xxx number of total web pages and it wasn't even close to 1 trillion. Did all those stupid web 2.0 pages screw with it that much?
  - Re: (Score:2)
    
    by SoupIsGoodFood_42 ( 521389 ) writes:
    
    I imagine that certain sites, such as sites the size of Slashdot (in terms of dynamically generated pages), make a difference. After all, the index talks in pages, not domains. I bet there's also a lot of junk and redundancy in there, but still, it's quite an achievement to be able to deal with that much data.
    - Re:Amazing (Score:4, Funny)
      
      by cammoblammo ( 774120 ) writes: <cammoblammo@Nospam.gmail.com> on Saturday July 26, 2008 @02:38AM (#24346147)
      
      I imagine that certain sites, such as sites the size of Slashdot (in terms of dynamically generated pages), make a difference. After all, the index talks in pages, not domains. I bet there's also a lot of junk and redundancy in there, but still, it's quite an achievement to be able to deal with that much data.
      Surely you're not saying that Slashdot's full of junk and redundancy and redundancy?
      
- Re: (Score:2)
  
  by KillerCow ( 213458 ) writes:
  
  As someone who is partially engineering/analytically minded (but not a great programmer) it amazes me how Google has managed to index so much data, yet at the same time, serve up results in a fraction of a second to so many people.
  See "map reduce"
  - Re: (Score:2)
    
    by tuomoks ( 246421 ) writes:
    
    Thanks - I was going to reply but you got there first. MapReduce and Bigtable, etc - old technologies found again. A must when you have to match arbitrary queries - SQL (actually relational databases, but in many minds the same?) just doesn't do it, something that was known, forgotten and slowly acknowledged again.
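The "map reduce" pointer mostly answers the indexing half of the question; serving queries quickly is a separate problem. Below is a single-process toy that imitates the map, shuffle and reduce phases to build an inverted index, mapping each word to the documents that contain it. The function names are hypothetical, and a real MapReduce job runs each phase in parallel across many machines rather than in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map: emit a (word, doc_id) pair for each distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def shuffle(pairs):
    """Shuffle: group intermediate values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Reduce: turn one word's values into a sorted posting list."""
    return word, sorted(doc_ids)


def build_index(docs):
    pairs = chain.from_iterable(map_phase(doc_id, text) for doc_id, text in docs.items())
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


if __name__ == "__main__":
    docs = {1: "Google index hits one trillion", 2: "Google search results"}
    index = build_index(docs)
    print(index["google"], index["trillion"])  # prints [1, 2] [1]
```

Because each map call only sees one document and each reduce call only sees one word, both phases can be split across machines; only the shuffle needs to move data between them.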
Wow, that's a lot of porn. (Score:5, Funny)

by Anonymous Coward writes: on Saturday July 26, 2008 @12:17AM (#24345559)

Seriously, the web is something like 42% porn. (Yes, that is the ultimate answer.) So that's, on average, 60-70 pages of each person in the world naked.
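The back-of-the-envelope figure holds up, assuming the joke's 42% share, the full trillion URLs, and a 2008 world population of roughly 6.7 billion (the population number is an assumption, not something stated in the thread):

```latex
\frac{0.42 \times 10^{12}\ \text{pages}}{6.7 \times 10^{9}\ \text{people}} \approx 63\ \text{pages per person}
```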

- Re:Wow, that's a lot of porn. (Score:5, Interesting)
  
  by sweet_petunias_full_ ( 1091547 ) writes: on Saturday July 26, 2008 @01:46AM (#24345959)
  
  "the web is something like 42% porn"
  That probably stopped being the case after namespace speculators started buying up expired domains in large numbers just to put up a mildly useless index on *each* and *every* site to collect ad revenue or marketing statistics off of unwary visitors. I would also include typosquatters in that category, and maybe someone else can name a few other examples of utter namespace hogging uselessness.
  Whatever it is, you can rest assured that it's mostly repetitive trash... no need to stand in awe of it.
  
1 trillion URLs (Score:5, Funny)

by jollyreaper ( 513215 ) writes: on Saturday July 26, 2008 @12:17AM (#24345561)

How many of those are automatically generated rank-spoofers, 80%?
My favorite spoof pages were the ones that randomly substituted search terms into porno stories.
"Yes!" she screamed as he thrust his SAMSUNG CD PLAYER deep into her. "I want you balls-deep in my CHEAP HARD DRIVES!" The smell of DISCOUNT SOFTWARE filled the room.

No concern for the foreign readers? (Score:2, Insightful)

by Anonymous Coward writes:

Trillion can mean 1E+12 or 1E+18 depending on which country you are in.
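The ambiguity comes from two naming conventions. In the short scale each successive -illion is a thousand times the previous one; in the traditional long scale it is a million times. A minimal sketch, where the index n is 1 for million, 2 for billion and 3 for trillion; the names are made up for illustration:

```python
# Index of each -illion name (1 = million, 2 = billion, 3 = trillion)
PREFIXES = {"million": 1, "billion": 2, "trillion": 3}


def short_scale(name):
    """Short scale (as used in the US): the n-th -illion is 10**(3n + 3)."""
    return 10 ** (3 * PREFIXES[name] + 3)


def long_scale(name):
    """Traditional long scale: the n-th -illion is 10**(6n)."""
    return 10 ** (6 * PREFIXES[name])


if __name__ == "__main__":
    for name in PREFIXES:
        print(name, short_scale(name), long_scale(name))
```

Both scales agree on "million"; they diverge from "billion" onward, which is why "trillion" can mean 1E+12 or 1E+18.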
- Re:No concern for the foreign readers? (Score:5, Funny)
  
  by kclittle ( 625128 ) writes: on Saturday July 26, 2008 @12:47AM (#24345735)
  
  Google is headquartered in Mountain View, CA -- I know, 'cause I googled it. Now, California is rather inclined to think of itself as its own country (some would say, universe), but it is indeed part of the United States of America (again, I checked with Google). And in the US, "trillion" == 1E12 (again, Google).
  
- - Re: (Score:3, Informative)
    
    by Smauler ( 915644 ) writes:
    
    No one in the UK uses the long scale system really. For example, traditional UK billions are _never_ used in governmental budgets, and no one points out that the "American" billion is being used. A billion is just 1E9 here, like just about everywhere else.
    I guess some older people may be confused (what's new ;)), but I'll wager a large proportion of the younger UK population don't even know what a traditional English billion is. I'm 30, and I've never used 1E12 as a billion, or even been taught it could
And I rest in peace.. (Score:3, Funny)

by consonant ( 896763 ) writes: <shrikant...n@@@gmail...com> on Saturday July 26, 2008 @12:23AM (#24345613)

..knowing that the vast amounts of porn just keep getting vaster. And more searchable. Amen. *sheds a tear or two*

Some numbers (Score:5, Interesting)

by Reality Master 101 ( 179095 ) writes: <RealityMaster101 AT gmail DOT com> on Saturday July 26, 2008 @12:30AM (#24345661)

Counts of words:
the: 18.3 billion pages
a: 23.9B
0: 12.7B
1: 25.4B
in: 17.1B
I: 10.2B
I know these numbers aren't exact, but you'd think one of them would be over 100B if Google is really indexing a trillion pages. What's on them? Anyone find any keywords that produce more?

- Re:Some numbers (Score:4, Funny)
  
  by Shaitan Apistos ( 1104613 ) writes: on Saturday July 26, 2008 @12:44AM (#24345721)
  
  My hobby:
  Getting the fewest possible google results above 0 with a quoted string.
  "interspecies gangbang": 6
  "hot topic meets disney world": 2
  "died in a blogging accident": 15,300
  "can boys make babies": 4
  "why does it hurt when I read": 1
  
  - Re: (Score:2)
    
    by Reality Master 101 ( 179095 ) writes:
    
    Hmm. This probably means something statistically, but I'm not sure what... I started adding digits to a number until I hit one result. I had to get to nine digits:
    123512553
    215323703
    684354537
    I also found a few with 0 to 3 results. Interestingly, I couldn't find any eight digit numbers that scored zero hits.
    - Re: (Score:2)
      
      by txoof ( 553270 ) writes:
      
      I couldn't find any eight digit numbers that scored zero hits.
      That's really interesting. Perhaps 8 digit numbers are common for serial numbers and dates. Today's date is 20080723. I can't even begin to think of all the logs and file names that I've generated that contain a similar string.
      Anyone else have any ideas why 8 digit numbers are so common?
      - Re: (Score:3, Interesting)
        
        by SnowZero ( 92219 ) writes:
        
        You can list all of them with less than a gigabyte: 10^8 * (8+1) ~= 858 MB
        The web is pretty big, so all of them are bound to happen *somewhere*.
        Plus, I just registered all8digits.net
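SnowZero's estimate checks out: 10^8 strings of 8 digits, each followed by a newline, come to 900,000,000 bytes, which is about 858 MiB (binary megabytes). A small sketch that computes the size and lazily generates the zero-padded strings; the function names are illustrative:

```python
def listing_size_bytes(digits):
    """Bytes needed to list every zero-padded number of this width, one per line."""
    return 10 ** digits * (digits + 1)  # `digits` characters plus a newline each


def all_numbers(digits):
    """Yield "00...0" through "99...9" without holding them all in memory."""
    for i in range(10 ** digits):
        yield f"{i:0{digits}d}"


if __name__ == "__main__":
    size = listing_size_bytes(8)
    print(size, round(size / 2**20, 1))  # prints 900000000 858.3
```

Writing `all_numbers(8)` to a file, one per line, would produce exactly the listing whose size is computed above.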
  - Re: (Score:2)
    
    by chillax137 ( 612431 ) writes:
    
    The only way to save your accomplishments is to not post them on the internet
    - Re: (Score:2)
      
      by arotenbe ( 1203922 ) writes:
      
      Indeed. Google already has two hits for the phrase "why does it hurt when I read": the actual (and rather disturbing, but amusing) page the quote was found on, and this page. Scary, isn't it?
  - Re: (Score:3, Insightful)
    
    by miraboo ( 1164359 ) writes:
    
    My hobby:
    Getting the fewest possible google results above 0 with a quoted string.
    "interspecies gangbang": 6
    "hot topic meets disney world": 2
    "died in a blogging accident": 15,300
    "can boys make babies": 4
    "why does it hurt when I read": 1
    My Hobby
    
    Attributing my sources: http://xkcd.com/369/ [xkcd.com]
    - Re:Some numbers (Score:5, Interesting)
      
      by Shaitan Apistos ( 1104613 ) writes: on Saturday July 26, 2008 @02:40AM (#24346151)
      
      My Hobby
      Attributing my sources: http://xkcd.com/369/ [xkcd.com]
      In [xkcd.com] , my [xkcd.com] humble [xkcd.com] opinion [xkcd.com] my [xkcd.com] usage [xkcd.com] of [xkcd.com] "My [xkcd.com] Hobby" [xkcd.com] was [xkcd.com] sufficient [xkcd.com] attribution [xkcd.com], all [xkcd.com] by [xkcd.com] itself. [xkcd.com]
      
    - Re: (Score:2)
      
      by MagdJTK ( 1275470 ) writes:
      
      pfft. So they both type strings into google? WOW HE SO MUST HAVE STOLEN IT!!!!
      By that logic, xkcd stole it from Dave Gorman [wikipedia.org].
  - Googlewhacking (Score:2)
    
    by symbolset ( 646467 ) writes:
    
    It's harder to do now. If you can find two words unquoted that result in one result it's a googlewhack [googlewhack.com] like this one was [google.com] before Google found "eltiguan parainaugurarme" on this page and made it the second result.
    BTW, my first google search was "war" and it returned something equivalent to "Your search term is too common to return a meaningful result. Narrow your search." Today it returns about 974 million results. It was long ago...
  - Re: (Score:3, Informative)
    
    by Ihmhi ( 1206036 ) writes:
    
    You mean Googlewhacking [wikipedia.org], except not nearly as hard?
  - Re: (Score:2)
    
    by spacefrog ( 313816 ) writes:
    
    Much like Fight Club, the number one rule of your particular hobby is not to talk about your results, lest you taint them.
    Kind of like counterfeiting currency or collecting teenybopper porn, it sucks to have a cool hobby you can't brag about.
    - Re: (Score:3)
      
      by kv9 ( 697238 ) writes:
      
      Kind of like counterfeiting currency or collecting teenybopper porn, it sucks to have a cool hobby you can't brag about.
      I brag about that all the time.
- Re: (Score:2)
  
  by antifoidulus ( 807088 ) writes:
  
  Or maybe a lot of those pages are in non-Latin script. According to the Chinese government there are more Chinese online than Americans, and they don't use the Latin alphabet....
  - Re: (Score:2)
    
    by Reality Master 101 ( 179095 ) writes:
    
    I thought maybe different languages might explain things, but I'd say English is by far the most popular language. I might even buy English is not the majority of pages anymore (though, I'm skeptical), but only 1 in 25 pages is English?
- Re: (Score:2)
  
  by WK2 ( 1072560 ) writes:
  
  I tried the following: the OR a OR 0 OR 1 OR in OR i
  It says "Results 1 - 10 of about 0". Must be a wrap-around bug.
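The "wrap-around" joke refers to integer overflow: a count of a trillion does not fit in a 32-bit integer, whose unsigned range tops out at 2^32 - 1 (about 4.29 billion). The thread doesn't show that this is what actually happened to the result count; the sketch only simulates what a 32-bit counter would end up holding. The function names are hypothetical.

```python
def as_uint32(n):
    """Value an unsigned 32-bit counter would hold after overflow."""
    return n % 2**32


def as_int32(n):
    """Value a two's-complement signed 32-bit integer would hold after overflow."""
    n %= 2**32
    return n - 2**32 if n >= 2**31 else n


if __name__ == "__main__":
    trillion = 10**12
    print(as_uint32(trillion), as_int32(trillion))
```

A trillion wraps to about 3.57 billion as an unsigned value and to a negative number as a signed one; it would only wrap to exactly 0 if the true count were a multiple of 2^32.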
What's going on with the founders' studies? (Score:5, Interesting)

by bogaboga ( 793279 ) writes: on Saturday July 26, 2008 @12:31AM (#24345663)

This might be off-topic but I wonder what's going on with Sergey Brin and Larry Page's [PhD] education? Just wondering...did they give up?

- Re: (Score:2)
  
  by Jeff DeMaagd ( 2015 ) writes:
  
  If I turned billionaire before I finished schooling, I don't think I would finish unless I got bored enough to do so. I don't think it's necessary for them to finish either, because they can hire plenty of PhDs with the knowledge that they would have gotten if they finished.
  Running a business doesn't require that you know everything needed to run the business, only that you know how to get people with the skills that you need.
- - - Re: (Score:2)
      
      by James Youngman ( 3732 ) writes:
      
      I try never to comment on Google stories but this comment shines forth as being so glaringly dismissive that I can't hold back. If you think that going from 1e9 to 1e12 known URLs is not "an impressive conceptual or technical feat" you should review some course materials on asymptotic complexity and think about how you would index the entire web using only algorithms that don't simply become infeasible at 1e12. In fact, forget about the indexing complexity per se, have a think about how you would select w
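To put the asymptotic-complexity point in numbers: scaling from 10^9 to 10^12 URLs multiplies the work by 1,000 for a linear algorithm, by about 1,333 for an n log n one, and by a million for a quadratic one. A quick sketch of those ratios; the helper name and the choice of cost functions are illustrative assumptions:

```python
import math


def growth_factor(cost, small=1e9, large=1e12):
    """How much more work cost(n) implies when n grows from `small` to `large`."""
    return cost(large) / cost(small)


COSTS = {
    "n": lambda n: n,
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n * n,
}

if __name__ == "__main__":
    for name, cost in COSTS.items():
        print(f"{name:8} x{growth_factor(cost):,.0f}")
```

The n log n ratio is 1000 * (12 / 9) because log(10^12) / log(10^9) = 12/9 in any base, so the logarithm adds only a third on top of the linear growth, while anything quadratic becomes a million times more expensive.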
    - Re: (Score:2)
      
      by cheater512 ( 783349 ) writes:
      
      I really cannot see how a PhD helps you to innovate.
- - Motivation for the founders' studies? (Score:2)
    
    by AlpineR ( 32307 ) writes:
    
    Sometimes it's nice to be called Dr. Brin or Dr. Page. Especially when dealing with somebody who doesn't know you or talking to a roomful of PhDs (such as their employees).
Comment removed (Score:5, Informative)

by account_deleted ( 4530225 ) writes: on Saturday July 26, 2008 @01:06AM (#24345809)

Comment removed based on user account deletion

- Re: (Score:2)
  
  by Johnny Mnemonic ( 176043 ) writes:
  
  They have indexed 40 billion pages. Read the entire Google post. It says it right there.
  
  I didn't see that information in either post, only that their index is somewhat less than the full 1T unique URLs. The full size of the Google index is probably confidential, proprietary info. Their competitors could use it as a benchmark for assessing how their service compares, so I'm not surprised that it's not there--or if it was there that it was removed.
And most of them are webspam (Score:4, Insightful)

by Animats ( 122034 ) writes: on Saturday July 26, 2008 @01:25AM (#24345867) Homepage

But how many of those trillion pages have unique, useful content? E-mail is over 95% spam, and the web is getting there.
There were about 153 million registered domains at the beginning of the year. The ones from the spam-friendly registrars [knujon.com] are mostly junk. Tim Berners-Lee said in 2006 that web junk was becoming a major problem, and it's become worse since then.
If you throw out all the anonymous but commercial domains (we call them "bottom-feeders"), as we do with SiteTruth [sitetruth.com], the Web looks a lot better. Search engines are getting stricter about this. You don't see that many "landing pages" in Google any more. Bad news [fool.com] for companies like Marchex [yahoo.com], the publicly traded web spammer that cranks out all those junk "What you need, when you need it" sites.
"The mass trials are going well. There will be fewer Russians, but better ones." - Greta Garbo in Ninotchka.

A trillion URLs... (Score:2)

by Chester K ( 145560 ) writes:

A trillion URLs, and still no sign of clownpenis.fart in the index anywhere!
At this rate it really will be the last one to go.
- Re: (Score:2)
  
  by repvik ( 96666 ) writes:
  
  Huh? I just *had* to google that one...
  Results 1 - 10 of about 8,580 for clownpenis.fart
What I want to know is in Google somewhere... (Score:2)

by symbolset ( 646467 ) writes:

Specifically which page was the trillionth?
- Re: (Score:2)
  
  by cheater512 ( 783349 ) writes:
  
  More than likely a spammer's page. :)
google's search becoming steadily useless. (Score:4, Insightful)

by blind biker ( 1066130 ) writes: on Saturday July 26, 2008 @03:57AM (#24346391) Journal

I think google.com's search engine achieved its peak usefulness about 5 years ago. Now, for the most part, when I google for a certain electronic component I get some crappy webstore front (and by crappy I mean I can't actually order the component but must "contact by phone" first), or if I search for an electronic device, be it pro or just home electronics, I get those "Read reviews and compare prices" sites, which I hate with a passion. WTF google, you have the world's most talented programmers, can't you weed out this crap from your search? At least so it doesn't come up as top hits?

- Re: (Score:2)
  
  by PatrickThomson ( 712694 ) writes:
  
  It's almost like google stopped bothering about returning relevant commercial sites in the main search, and farmed it off to some subsidiary. I mean, talk about saving money, that's a very froogle thing to do.
  Please then allow me to bludgeon you with the point, which is google's most excellent product search at www.froogle.com . If it's buyable on the internet, it's there.
  - Re: (Score:3, Interesting)
    
    by blind biker ( 1066130 ) writes:
    
    Thanks, but what I was trying to say (and I'll admit to bad wording), is that not only does google.com search return webstore fronts when I am actually looking for technical information about electronic components (this is the point I did not get across well - I am not looking for shops, but for info), but it returns the worst kind of webshops. The kind that isn't really a webshop at all, as in, you can't actually buy anything from them using the web.
    As for froogle: I just tried searching for "NAD 701" (wit
    - Re: (Score:2)
      
      by PatrickThomson ( 712694 ) writes:
      
      Ah yes, technical information about something that gives you 5 pages of crap, even with "technical manual" specified in the search, is a problem. Quite right. Of course, if the manufacturer hasn't released it in the first place, it might just not be on the internet.
But how useful is it all? (Score:2)

by Colin Smith ( 2679 ) writes:

I mean, really. 90% of it is junk.
Dynamic pages pollute count (Score:5, Informative)

by Coolhand2120 ( 1001761 ) writes: on Saturday July 26, 2008 @05:52AM (#24346759)

There are so many dynamic pages on the net now that one web site, like slashdot as an earlier poster commented, can contain literally millions of pages. People use programs like mod_rewrite [apache.org], isapirewrite [isapirewrite.com] and linkfreeze [helicontech.com] to manipulate spiders into crawling pages that are near identical. For more than one customer I've made meta, title and content randomization, serialization and/or URL rewriting schemes to make damn sure spiders index every possible dynamic page, and it works. I have a single dynamic page that must have been indexed hundreds, maybe thousands of times with slightly different content, and they are all in the index.

Google tries to detect a dynamic page by looking for ampersands and equal signs, as well as by looking at the content of the page, but it is really quite easy to fool.

e.g.: http://somesite.com/itemlist.php?listmode=1&category=beds&orderby=7 [somesite.com]
when 'rewritten' shows up as
http://somesite.com/items/1/beds/7.html
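The rewrite in that example is just a fixed mapping between query parameters and path segments. A minimal Python sketch of both directions, assuming the three parameters are always present and the real script is itemlist.php as in the example (the function names and regex are hypothetical, not taken from mod_rewrite or any other tool):

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed mapping: listmode, category and orderby become three path segments.
PARAMS = ("listmode", "category", "orderby")
STATIC_PATH = re.compile(r"^/items/([^/]+)/([^/]+)/([^/]+)\.html$")


def to_static(url: str) -> str:
    """Turn the query-string URL into the 'static-looking' one spiders see."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    segments = "/".join(query[name][0] for name in PARAMS)
    return f"{parts.scheme}://{parts.netloc}/items/{segments}.html"


def to_dynamic(url: str) -> str:
    """Reverse mapping, roughly what a server-side rewrite rule does internally."""
    parts = urlsplit(url)
    match = STATIC_PATH.match(parts.path)
    if match is None:
        raise ValueError(f"not a rewritten item URL: {url}")
    query = urlencode(dict(zip(PARAMS, match.groups())))
    return f"{parts.scheme}://{parts.netloc}/itemlist.php?{query}"
```

The spider only ever sees the .html form, so nothing in the URL itself reveals that every combination of the three values is served by the same script.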

So 1 billion web pages could be just a few hundred thousand dynamic pages, and I know of a few thousand pages like this. Not that the pages don't have relevant information, but some of the stuff can be redundant. For instance, when the spider crawls across "Records per page = 10" > "Records per page = 20" > "Records per page = 30" etc., or when lazy programmers don't use cookies and databases to store information but try to concatenate the user's selections onto the URL. Thank god for that GET limit [boutell.com]. People need to use POST!
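One way a crawler could collapse those "Records per page" variants is to canonicalize each URL before counting it. A rough sketch, assuming a hypothetical list of parameters that only change presentation (this is not how Google actually does it, which isn't public):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed to change presentation, not content.
PRESENTATION_PARAMS = {"perpage", "sessionid"}


def canonicalize(url: str) -> str:
    """Drop presentation-only parameters and sort the rest into a fixed order."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in PRESENTATION_PARAMS
    )
    # Lowercase the host and drop any #fragment, which never reaches the server.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


crawled = [
    "http://somesite.com/itemlist.php?category=beds&perpage=10",
    "http://somesite.com/itemlist.php?category=beds&perpage=20",
    "http://somesite.com/itemlist.php?perpage=30&category=beds",
]
unique = {canonicalize(u) for u in crawled}
```

All three crawled URLs collapse to one entry here. The hard part in practice is knowing which parameters are safe to drop, which is exactly what URL rewriting hides.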

If someone knows how to stop this message board from creating links out of fake URLs, please let me know.

- Re: (Score:2)
  
  by cheater512 ( 783349 ) writes:
  
  Google seems to filter duplicates rather well.
  I doubt it's a problem.
Who cares? (Score:2)

by nfk ( 570056 ) writes:

"Why is McDonald's still counting? How insecure is this company? Forty million eighty jillion killion trillion....is anyone really impressed anymore? Oh eighty-nine billion sold! All right I'll have one."
-- Jerry Seinfeld
BigInt (Score:2)

by revlayle ( 964221 ) writes:

google finally has a use for a bigint auto-increment primary key now!!!!
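The joke has a real basis: a trillion doesn't fit in a 32-bit integer column. A small Python sketch, assuming the usual two's-complement limits (MySQL's INT and BIGINT use these, for example), checks where 10^12 fits and what an unsigned 32-bit counter would wrap around to. The helper names are made up for illustration:

```python
INT32_MAX = 2**31 - 1   # largest signed 32-bit integer (e.g. MySQL INT)
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit integer (INT UNSIGNED)
INT64_MAX = 2**63 - 1   # largest signed 64-bit integer (e.g. MySQL BIGINT)

ONE_TRILLION = 10**12


def fits(value: int, limit: int) -> bool:
    """True if a non-negative value is within the column's maximum."""
    return value <= limit


def wrap_uint32(value: int) -> int:
    """What an unsigned 32-bit counter would hold after overflowing."""
    return value % 2**32


for name, limit in [("INT32", INT32_MAX), ("UINT32", UINT32_MAX), ("INT64", INT64_MAX)]:
    print(name, fits(ONE_TRILLION, limit))
```

Under these assumptions a trillion overflows both 32-bit types, and a wrapped unsigned 32-bit counter would read 3,567,587,328, while a signed 64-bit key has room up to about 9.2e18.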
- Re:First Post (Score:4, Interesting)
  
  by Vectronic ( 1221470 ) writes: on Saturday July 26, 2008 @01:54AM (#24345983)
  
  -1 Redundant sure...
  But that's sort of along the lines I was/am thinking... take txoof's post alone (or mine, or whoever may reply): there are 3 separate URLs for each Slashdot comment.
  The Header:
  http://search.slashdot.org/comments.pl?sid=626647&cid=24345519 [slashdot.org]
  The User:
  http://slashdot.org/~txoof [slashdot.org]
  The Score:
  http://search.slashdot.org/article.pl?sid=08/07/26/0036245# [slashdot.org]
  How many Slashdot comments are there? It's probably in the high millions (rhetorical, but I'm interested to know nonetheless). There's an average of about 250 comments per article and about 25 articles a day; that's about 2 million a year, so 6 million links. Then take into consideration stuff like Facebook, which bounces URLs (http://www.facebook.com/link=###/etc), or sites that generate a random identifier every few minutes, making those "unique". It gets unexciting quite quickly, although billions is still fairly high.
  
  - Re:First Post (Score:4, Funny)
    
    by repvik ( 96666 ) writes: on Saturday July 26, 2008 @03:25AM (#24346281)
    
    Considering your comment is #24345983, I'd say about 24.3 million comments. Also, I believe there's about 1.5 million different users.
    
    - Re:First Post (Score:5, Funny)
      
      by Anonymous Coward writes: on Saturday July 26, 2008 @05:32AM (#24346683)
      
      Also, I believe there's about 1.5 million different users.
      yeah but if you take out Twitter and all his sock-puppets you'll just be left with 500K unique users...
      
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- Re:Bandwidth... too much (Score:2, Interesting)
  
  by Doug52392 ( 1094585 ) writes:
  
  On my home Web server, I accidentally left a copy of the PHP manual in a browsable folder, which was linked to the homepage. So when Google indexed my homepage, guess what it also checked for? Every single page the homepage linked to! Including that manual... and damn the PHP manual has a LOT of pages.
  So when I got back on the server and pulled up the logs (it was running strangely slow) I found Googlebot accessing page after page after page of the PHP manual. Thousands of pages. Lagging the server and Inte
- Try "Live" search (Score:2)
  
  by symbolset ( 646467 ) writes:
  
  You'll be back faster than a Google search result.
- Re: (Score:2)
  
  by lintux ( 125434 ) writes:
  
  Possibly that page really *was* useful, but the domain name expired. Lots of spammers on the nets like to snatch expired domain names and "monetize" (sic) them.
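For a case like Doug52392's runaway Googlebot, the usual remedy is a robots.txt rule, which Googlebot honors. A minimal sketch using Python's standard urllib.robotparser to check such a rule, assuming a hypothetical /phpmanual/ path for the exposed manual and example.com as the host:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a home server; /phpmanual/ is an assumed
# location for the accidentally exposed manual.
ROBOTS_TXT = """\
User-agent: *
Disallow: /phpmanual/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
# Record a fetch time so can_fetch() does not treat the rules as unread.
parser.modified()


def allowed(url: str, agent: str = "Googlebot") -> bool:
    """True if the rules above let `agent` crawl `url`."""
    return parser.can_fetch(agent, url)


print(allowed("http://example.com/phpmanual/function.echo.html"))
print(allowed("http://example.com/index.html"))
```

With this rule the manual pages are refused and the home page is still crawlable. It only helps with well-behaved crawlers, though; moving the manual out of the web root is the more direct fix.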
