
300 Years to Index the World's Information

Kasracer writes "At the Association of National Advertisers annual conference, Google's CEO, Eric Schmidt suggested that it would take 300 years for them to index all of the world's information. From the article: 'We did a math exercise and the answer was 300 years,' Schmidt said in response to an audience question asking for a projection of how long the company's mission will take. 'The answer is it's going to be a very long time.'"
  • by powerpuffgirls ( 758362 ) on Sunday October 09, 2005 @04:28PM (#13752488)
    I always thought 42 years ought to be enough.
    • by Anonymous Coward
      64 years should be enough for anyone.
    • 42 years, from Douglas Adams' HHGTTG? Yes, I expect it will be enough, since growth in storage and computing power will shorten that estimated 300 years. But one possible constraint might exist: finding the energy to power all that, and to cool it. But who knows what we'll be using to add 2 and 2 fifteen years from now. I don't & won't, because I'll probably be returning to dust by then, although some of the what-if press sure seems positive.

      On a side note, since they are restricted to doing verbatim t
  • by nizo ( 81281 ) * on Sunday October 09, 2005 @04:33PM (#13752521) Homepage Journal
    The hardest part will be developing the hardware that is able to recursively index the Google data itself an infinite number of times.
  • What About... (Score:5, Insightful)

    by Adrilla ( 830520 ) * on Sunday October 09, 2005 @04:33PM (#13752524) Homepage
    Did they take into account the information that is being created as they are indexing? Do they plan on live-indexing everything that's being made? Information doesn't stop getting created just because they've stored everything that's already been done.
    • Re:What About... (Score:3, Insightful)

      by antdude ( 79039 )
      And information that is being deleted.
    • Re:What About... (Score:5, Interesting)

      by htrp ( 894193 ) on Sunday October 09, 2005 @04:41PM (#13752582)
      I would assume that the goal is to index the collective sum of information, even as it is growing. It's probably a lot quicker to index something than it is to generate it. With probable future advances in computing power and the development of new algorithms, it should be entirely possible for the speed of indexing (which already probably surpasses the speed of information production) to catch up with all the data that still hasn't been indexed.

      Think of it in terms of taking a ratio comparison of two infinite series.
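      A toy model of that argument (the two stock figures are the ones quoted from TFA; both growth rates are made-up assumptions, not Google's numbers): if indexing capacity compounds faster than information production, the backlog clears even though both keep growing.

      ```python
      # Sketch of the "indexing outpaces production" argument above.
      total = 5_000_000.0    # TB of existing information (TFA figure)
      indexed = 170.0        # TB indexed so far (TFA figure)
      index_rate = 100.0     # TB indexed per year (assumed)
      produce_rate = 50.0    # TB of new information per year (assumed)

      year = 2005
      while indexed < total:
          indexed += index_rate
          total += produce_rate
          index_rate *= 1.41    # indexing capacity ~doubles every two years
          produce_rate *= 1.10  # production grows more slowly (assumed)
          year += 1

      print(f"Backlog cleared around {year}")  # a few decades, at these toy rates
      ```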
      • With the current state of "IP" law, it will be illegal to index any new information anyways. So, once the corporation that owns it decides to delete it, it won't count.
    • Re:What About... (Score:2, Insightful)

      by barum87 ( 914698 )
      Most of the information we are creating now is already electronic; therefore it's a lot easier and less time-consuming to index.
    • Re:What About... (Score:3, Interesting)

      by Max Nugget ( 581772 )
      Did they take into account the information that is being created as they are indexing? Do they plan on live-indexing everything that's being made? Information doesn't stop getting created just because they've stored everything that's already been done.

      Funny you mention that. In some versions of Superman, Brainiac, a living computer whose mission is to gather all information about every planet in the universe, entered into the world of villainy because he logically reasoned that the only way he could eve
  • ..only 150 years to do it next year.
    • No, the proper model is not Moore's law but Bono's law [pineight.com]. If it takes 300 years now, then it'll take 320 years in 20 years, and most of the time will be spent waiting for exclusive rights to expire (if they ever do). For instance, indexing a literary work that's out of print and not widely available at libraries requires getting a new copy, and those aren't available until the copyright runs out.

  • 300 years? (Score:5, Funny)

    by RonnyJ ( 651856 ) on Sunday October 09, 2005 @04:35PM (#13752539)
    300 years? I'd have thought their other [theonion.com] plan would have been a lot quicker.
  • by obli ( 650741 ) on Sunday October 09, 2005 @04:36PM (#13752548)
    How long until Google decides that your house is information? Just imagine an army of small robot spiders invading your home every night, registering the position, name and contents of every single object you own, making it searchable from house.google.com. Unless you nail a robots.txt to your front door, that is...
  • Let's just hope Google doesn't destroy the universe when it's done collecting all of its information, like the giant brains.
  • Everybody! (Score:5, Funny)

    by Slashdiddly ( 917720 ) on Sunday October 09, 2005 @04:42PM (#13752584)
    Please stop creating new information and let Google catch up! You can resume later.
    • On that note, Google has announced that once it has indexed all the world's information, it will destroy the world to prevent any new information from arising to compromise its massive index.
  • by Klowner ( 145731 ) on Sunday October 09, 2005 @04:42PM (#13752588) Homepage
    It's going to take them a hell of a lot longer than that, considering my car keys are always moving.
  • by colonslashslash ( 762464 ) on Sunday October 09, 2005 @04:42PM (#13752589) Homepage
    I immediately thought of the Futurama episode - The Why of Fry - where the giant brains build the brainsphere and assimilate all the knowledge in existence, before attempting to destroy the entire universe so no new information can be added.

    Googlesphere anyone?

  • On a related note... (Score:5, Interesting)

    by RyanFenton ( 230700 ) on Sunday October 09, 2005 @04:42PM (#13752590)
    I wonder how many man-years it would take to listen to all the music and video that could be indexed. It would be interesting at least to find out what the order of magnitude would be - millions, or perhaps billions or trillions of man-years of unique recorded audio and video? It would have to be a game of gross estimation - but it would at least put into perspective how much material is out there, even if most of it is boring "security" footage, compared to the scope of our lives.

    It'd be interesting if, perhaps in a couple of generations, we could have a cheap media volume that contained "recorded media, prehistory - to - 2050ad"... if the media that exists today even survives a couple of generations, and copyrights aren't extended indefinitely. The idea of an indexing system that could even put all that information into a meaningful context would be fascinating to consider, though, if it could be possible.

    Ryan Fenton
    a cheap media volume that contained "recorded media, prehistory - to - 2050ad"

      This is indeed possible. If I think back 20 years, and how much information was digitized vs. today, it is just plain staggering.

      Think about Britannica, Project Gutenberg, CCEL, Perseus, Alwarraq, and many thousands of other sites that offer reference works in digital form. Think about the commercial CD-ROMs that offer the same.

      All that in just 20 years.

      So, in 100 more years, we can have all the hieroglyphs off of Egyptian temple

    • I wonder how many man-years it would take to listen to all the music and video that could be indexed.

      Probably a lot longer than you think.

      "Goddammit, I can't watch or listen to any more of this crap! I'm going out for a cup of coffee and a smoke."
  • Competition? (Score:4, Interesting)

    by psst ( 777711 ) on Sunday October 09, 2005 @04:44PM (#13752600) Homepage
    From the article:
    Of the approximately 5 million terabytes of information out in the world, only about 170 terabytes have been indexed, he said earlier during his speech.
    Storing 5 million terabytes has got to cost a lot of resources. It would be very inefficient if every competing search engine stored that much data. Makes me wonder if it would make more sense to nationalize Google's index and share it amongst competitors (just like it makes more sense for governments to build airports and share them amongst airlines rather than every airline building its own airports).
    • Re:Competition? (Score:5, Insightful)

      by Shihar ( 153932 ) on Sunday October 09, 2005 @05:29PM (#13752847)
      Nationalize Google? Are you joking, or just insane? You want to take one of the most innovative and successful companies that the US has right now and nationalize it!?

      I have a better idea: how about you just send out a government hit squad to put a bullet between the eyes of every entrepreneur in the US. It will accomplish the same sort of freeze in the growth of innovative small businesses but look far less insane.
      • Re:Competition? (Score:4, Interesting)

        by Halfbaked Plan ( 769830 ) on Sunday October 09, 2005 @07:00PM (#13753279)
        Oh, come on. You're talking about a company that is mostly an advertising enterprise now. Who is Google hiring? Admen and their ilk. It's sometimes depressing how enamored the 'community' has become with a company whose main purpose is leveraging eyeballs to look at their ads.

        (how DARE I say anything bad about Google. Mod this down IMMEDIATELY.)
    • You forget that storage capacity increases exponentially with time. In 30 years we will have 15,000-20,000 TB drives (assuming that capacity doubles every two years and current drives are 500 GB). At that point we could easily hold 5 million TB of information.
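      A quick check of that doubling arithmetic (drive size and doubling cadence as assumed above):

      ```python
      # 500 GB drives today, doubling every two years -> 15 doublings in 30 years.
      current_tb = 0.5                          # 500 GB, expressed in TB
      future_tb = current_tb * 2 ** (30 // 2)
      print(f"{future_tb:,.0f} TB per drive")                              # 16,384 TB
      print(f"Drives to hold 5 million TB: {5_000_000 / future_tb:.0f}")   # ~305
      ```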
      • However, at that point there will be 5 billion TB of information to index.

        Google will never finish their task, because the amount of information in the world (or at least raw data.. not all of it is really information) is increasing much faster than it's possible to index it.
        • I agree that they will never be able to complete this task, because it is ridiculously impossible. How are they going to get their hands on an ancient Chinese text whose only copy is in some guy's attic in Beijing?

          But assuming the problem is indexing 5 million TB of information that is already available on the internet (songs, movies, whatever - just not things that haven't been digitized, because that introduces the need to have humans working), it will be possible in the next 50 years. Both computational power and st
    • I know, then let's internationalize the internet by handing over control to the UN.

      I'm *positive* these moves would be at least equally beneficial to society as a whole. They should be done together!

      (w00t for Stalinist centralization and the power of the state! Go MAO!)
    • False alternative (Score:3, Insightful)

      by ChrisMaple ( 607946 )
      Private ownership of an airport does not mean that it would be owned by an airline. Even if an airport were owned by an airline, that does not mean it would serve only that airline. (It would not be in its best interest to do so.)

      Practice has shown that government ownership and operation of airports is inferior to private ownership.

  • I'm curious... (Score:2, Interesting)

    by DeepBlueDay ( 704464 )
    How is 'information' defined in this context? Is a thirteen-year-old girl's blog considered information?
    • by Anonymous Coward
      The blogs of thirteen-year-old girls are examples of the recently discovered negative information [physorg.com]. If more young girls can be encouraged to write this will actually reduce Google's workload.
    • Only if it includes her home address.
    • Re:I'm curious... (Score:5, Insightful)

      by vidarh ( 309115 ) <vidar@hokstad.com> on Sunday October 09, 2005 @05:37PM (#13752887) Homepage Journal
      I take it from that comment that you don't see much value in a thirteen-year-old girl's blog? What about a thirteen-year-old girl's diary?

      Like Anne Frank's [wikipedia.org]?

      Fact is, it's incredibly hard to determine today what will have value tomorrow. Most of those thirteen-year-old girls' (or 20-something geek guys') blogs will have no historical value. But some of those people will grow up to have a profound impact on the world (or they may not grow up, but still have a profound impact, as was the case with Anne Frank). It may be ten years from now. Or 50.

      Who knows what the writing they do now might tell us about what brought them wherever they end up? When people write diaries on paper, chances are reasonable they'll survive and show up in an attic somewhere. But as more and more content gets online, we also risk losing entire generations' worth of many types of information to bit rot and simple lack of foresight.

      • Re:I'm curious... (Score:5, Interesting)

        by Vellmont ( 569020 ) on Sunday October 09, 2005 @07:50PM (#13753483) Homepage
        I think the parent's question is perfectly valid. What is considered "information"? I'd consider a blog information, but is a painting some random artist creates included in this list of "information"? Is my laundry list information? How about my individual handwriting in my laundry list?

        The question of whether something is valuable isn't exactly an either-or proposition, but a matter of assigning a probability that a certain piece of information is valuable. Couldn't we agree that, say, the president's day-to-day activities are more likely to be important in 100 years than a single 13-year-old's blog? Does that mean that 13-year-olds' blogs are worthless? Well, no, but they aren't the thing I'd first choose to preserve.

        The question I have is: is the greater difficulty of controlling online information balanced by the greater ease of keeping it around? Google doesn't delete messages from email for this very reason. We tend to throw stuff away because it takes up too much space, or because it just becomes clutter. But with increased storage space every year and better ability to keep track of it (and separate it from things we consider important), why ever throw away information?

        Online information portability is obviously a problem. How do you move someone's blog somewhere else, and have it mean anything in, say, 50 years? I think these problems will be solved as people expect information to be more portable and standardized. The solutions, I think, will come from short-term portability needs rather than from a few people wanting to preserve something for the next 100 years, though. Many people assume that standards are short-lived things that are here today, gone tomorrow. I'd have to disagree on a historical basis. Reel-to-reel tapes are decades old, and you can still find a player at, say, a thrift store. CD audio has been around for 25 years and is still the default medium for music today. ASCII was developed I don't know how long ago and yet is still quite popular, and if you have a computer that can't read it, you've got a fairly useless computer. Standards have a way of snowballing and gathering momentum to live on a long time.
        • With the capricious nature of art, I'd say that random painting has a decent chance of one day being worth a bucketload of money.

          So wouldn't it be nice to have its entire history on file!
  • Hmm...so he's assuming the U.S. government will let 1923 copyrights expire some time before 2305 AD? Or has Google, after emasculating Google Print in response to lawsuits, suddenly found a new legal strategy?
  • by Comatose51 ( 687974 ) on Sunday October 09, 2005 @04:47PM (#13752621) Homepage
    Obviously they're not feeding those pigeons [google.com] enough. Time to buy some quality feed, Google. Maybe even slip in some uppers every now and then. If all else fails, maybe it's time to consider the parrot upgrade. They're a lot more expensive but their index/poop ratio is much better.
  • by bwy ( 726112 ) on Sunday October 09, 2005 @04:48PM (#13752624)
    I'd like to see their definition of information. Certainly, a lot of things that are already of common interest are on the net. Occasionally I find things that aren't available online, but the great majority of the time Google is able to find what I want.

    To further the example: at work we have several filing cabinets that haven't been opened in years. There are lots of papers and stuff in there, I can vouch for that. Some might consider it "information." But in reality all that stuff could be burned and I doubt it would make the slightest difference in the way the future rolls out. None of it is stuff that would ever be needed in an IRS audit or anything like that, either. Does Google consider this kind of stuff part of their efforts? Because I think they can safely ignore it.
    • Look, the problem is not how much data there is in the world, the problem is finding a general automatable algorithm for organizing it in such a way that J. Random User can rapidly find what he's looking for.

      Stroll on down to the nearest university library. It's got a lot less information in it than Google is considering, and about a hundred thousand man-years over a few centuries have gone into finding clever ways to organize it all: card catalogs, shelving systems (e.g. Dewey and his decimals), nowadays
  • by Anonymous Coward on Sunday October 09, 2005 @04:49PM (#13752633)
    It's going to take 300 years to index the grammer and spelling mistakes on Slashdot alone.
  • I don't think they'll ever get there. I would argue that the exponential increase of information and Google's indexing are both paced by Moore's law. As Google's indexing ability increases, the amount of information created will also increase, at least as fast, and probably faster. Unless you believe (and, who knows, the Googlites might, and that may be where the 300 years comes from) that Google will expand to fill the human universe, Google will only ever be able to keep up, never really get ahead.
  • Makes no sense (Score:5, Insightful)

    by bobintetley ( 643462 ) on Sunday October 09, 2005 @04:52PM (#13752658)
    We did a math exercise? What exercise?

    To estimate the time involved, you surely need to know the size of the information involved (don't quote me that bunkum about 170 terabytes in TFA - yes I did read it), and to know the size you need to know what all the information is, which you can't (and surely new information is created all the time?).

    This translates as "I pulled my finger out my ass, waved it in the air and came up with 300 years."
    • My guess: (Score:4, Informative)

      by imsabbel ( 611519 ) on Sunday October 09, 2005 @06:23PM (#13753121)
      Stuff like this (or, years ago, the LHC) most likely follows this approach:

      They estimated an amount of information that is "all information", like 480,000 exabytes or so.
      Then they look at their current capacity (storage and database CPU power), just interpolate Moore's law into the future, and see when the demand will be met.

      Of course, for stuff like the LHC, where you only interpolate 10-20 years into the future, such a thing is possible, but 300 years? He should read up on the singularity...
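      For what it's worth, a back-of-the-envelope of exactly that shape, using the two figures quoted from TFA and an assumed two-year doubling, lands nowhere near 300 years:

      ```python
      import math

      # Crossover estimate: indexing capacity doubles on a fixed cadence
      # until it reaches "all information". The TB figures are from TFA;
      # the doubling cadence is an assumption.
      indexed_tb = 170
      target_tb = 5_000_000
      doubling_years = 2

      doublings = math.log2(target_tb / indexed_tb)       # ~14.8 doublings
      print(f"~{doublings * doubling_years:.0f} years")   # ~30 years
      # Reaching 300 years requires a vastly larger estimate of "all
      # information" or a much slower growth assumption.
      ```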
      • This is extrapolation, not interpolation.

        Interpolation means looking between your data points, and extrapolation means looking outside the data set. Interpolation is generally much more reliable and trivial than extrapolation. In particular, when you're dealing with a time series of data, it's easy to spot a trend in past events (e.g. Moore's "law"), but harder to predict whether that trend continues.

      • Actually, I would assume they aren't using Moore's Law since it only applies to transistor technology. They probably take the current rate of expansion (represented as a mathematical equation) and project into the future.
  • 300 years to index all the info extant *today*?

    So how much more information will exist by then? Is it growing faster than Google can index it? Think about how fast that much information came into being in the first place.

    What a silly question.
    • Well, it's just Zeno's paradox. Let's say it takes them 300 years to index all of today's information, then another 150 to index all the new information, then another 75... By 2605, all information to that point will have been indexed by Google. Then they can start indexing the FUTURE.
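      (The arithmetic behind that date: 300 + 150 + 75 + ... = 300 · (1 + 1/2 + 1/4 + ...) = 300 · 2 = 600 years, and 2005 + 600 = 2605.)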
    • I think that's the point behind 300 years. It will take them this long to 1) index everything that's there, and 2) be able to keep up with newly created information.

      (i.e., in one of their papers they said something along the lines that the rate at which humans can produce information isn't growing as fast as their ability to index it, and eventually they'll be able to keep up with the rate of all information everyone on the planet produces. I guess that's where that 300 years comes in. I'd imagine it would take
  • As we go into the future, will the time to index

    1) Take longer, because more info is created faster than the ability to index it?
    2) Take less time because processors, storage, and databases get faster?
    3) Take the same amount of time, because data and the ability to index it grow at the same rate?
    • I suspect 1.

      Every time Google indexes a piece of information, you have a new piece of information (the fact that Google has indexed it), so the amount of information to index has not decreased; it has remained the same & merely changed form.

      There is also other information (eg. the amount of information that google has indexed, the amount of free space they have left) that is constantly changing and requires reindexing. Unfortunately storing that information causes it to change... quantum information an
  • What about splitting the task and combining the resulting information between competing companies? Different search engines have a lot in common (in indexed pages) but also a lot that is covered by one but not by the others.

    Of course, not all info is on the web, not all info on the web is accessible to search engines, and not even all info accessible to search engines can be searched (think of graphics containing text, e.g. scanned books, or Flash presentations with the actual content), but still that numb

  • by G4from128k ( 686170 ) on Sunday October 09, 2005 @05:21PM (#13752795)
    This analysis must exclude entire categories of continuous data collection devices such as webcams, data loggers, OS log files, sensing equipment, etc. All jokes aside about porn on webcams, I can imagine that future historians would love such a rich data source on how people lived their lives, what they had in their surroundings, etc.

    The point is that many current systems spew a huge volume of low value (but nonzero value) data (multiple MB or GB/day/device). The lack of storage means most of this is not captured and is thus never indexed.

    Even massive companies can't keep all their data. Wal-Mart stores on the order of 460 TB in their data warehouse, but only has room for the last 13 months of data or so. At 138 million customers per week, they only have room for a paltry 59kB per customer per week.
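    Those per-customer numbers check out:

    ```python
    # Reproducing the parent's arithmetic from the figures quoted above.
    warehouse_bytes = 460e12           # 460 TB data warehouse
    customers_per_week = 138e6
    weeks_retained = 13 * 52 / 12      # ~13 months -> ~56 weeks
    per_customer = warehouse_bytes / (customers_per_week * weeks_retained)
    print(f"{per_customer / 1e3:.0f} kB per customer per week")   # ~59 kB
    ```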

    • They could trivially store more if they wanted to, it must just be that there's no ROI on storing online data older than a year or so.

      You can buy a terabyte of hard disk capacity for like $500 these days. So 500 TB, while sounding impressive, is actually disks for $250,000. Storage requires more than just the raw disks. A rule of thumb that often works out OK is to add an order of magnitude to convert "raw disk" into "enterprise storage"; that still means a cost in the ballpark of $2-3 million.

      Assuming your
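      Making the grandparent's cost arithmetic explicit (the $/TB price and the 10x markup are the rule-of-thumb assumptions from the post):

      ```python
      tb_needed = 500
      raw_cost = tb_needed * 500        # assumed $500 per TB of bare drives
      enterprise_cost = raw_cost * 10   # order-of-magnitude "enterprise" markup
      print(f"Raw disks: ${raw_cost:,}")                   # $250,000
      print(f"Enterprise storage: ${enterprise_cost:,}")   # $2,500,000
      ```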

  • by CupBeEmpty ( 720791 ) on Sunday October 09, 2005 @05:22PM (#13752805)
    I think it is important to remember that this was a math exercise, not a serious study with predictive power. I remember several years ago people thought the human genome project was insane. They thought it would take hundreds of years to catalog our entire genome and cost some ludicrous number of trillions of dollars.

    Then:

    In 1999, the goal of producing a "working draft" seemed very far away, with less than 15 percent of the genome sequenced. If the accelerated goals had not already generated a sense of urgency in the consortium, a decision by the sequencing center leaders at a February meeting in Houston would. At the meeting, the leaders accepted Dr. Collins' challenge to ramp up their efforts to produce a "working draft" by spring of 2000. By January 2000, the centers were collectively producing 1,000 base pairs a second, 24 hours a day, seven days a week, and 2 billion of the human genome's 3 billion base pairs were sequenced by March. At a White House ceremony hosted by President Bill Clinton in June 2000, Dr. Collins and J. Craig Venter of Celera Genomics, which had carried out its own sequencing strategy, announced that the majority of the human genome had been sequenced. [from here [nih.gov]]

    I tried to find the graph of speed over time because I have seen it several times. It shows the exponential increase in the speed of the project. Apparently there are many scientists who believe that, with techniques as they are now, we could repeat the project in 2 years if we started over. The indexing of information could have a very similar timeline: very slow at first and then, as technology and specific methodology develop, off you go. So the truth is... this is a guess. I wouldn't put too much faith in it.
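    A sanity check on the quoted sequencing rate:

    ```python
    # 1,000 base pairs per second around the clock (rate quoted above),
    # against the 2 billion bp reported sequenced by March 2000.
    bp_per_second = 1_000
    bp_done = 2_000_000_000
    days = bp_done / bp_per_second / 86_400
    print(f"~{days:.0f} days of sequencing at that rate")   # ~23 days
    ```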

    • Funny thing about the human genome project is that they'll never be done. I'd imagine that every few years they'll have announcements saying "we've completed the HGP, etc.", but in truth, they may only be approaching 99.9% done.

      It's really one of those things where 95% of the work is relatively easy (well, "several years' worth of work"), and the other 5% is what takes the rest of the century.

      (there's also the question of -understanding- it as opposed to just writing it out to a file; and I'm sure that wil
  • was he joking ? (Score:5, Insightful)

    by flynt ( 248848 ) on Sunday October 09, 2005 @05:27PM (#13752840)
    "We did a math exercise and the answer was 300 years," Schmidt said in response to an audience question asking for a projection of how long the company's mission will take. "The answer is it's going to be a very long time."

    Since this was in response to an audience member's question, does anyone else think he was joking? Because it is such an outlandish question from an information theory and modeling point of view, perhaps he was mocking it? "Ah yes, we just came up with an equation and it should take 294.59 years." I think this also makes sense in light of his next comment, which was made on a more serious note. I interpret it as, "We really didn't use an equation; it will obviously take a long time though." This is how I understood his comments, and I may be wrong, but it wouldn't surprise me if some reporter picked up on this "joke" and put it up as "news".
  • Please correct me if I am wrong, but wouldn't solving a problem like this create more information than you previously had? And wouldn't you have to index that information and so on and so forth?

    Also, can someone explain to me how you even approach something like this from a mathematical modeling point of view? How did the 170 terabyte number even come up? Aren't there different definitions of what constitutes 'information?' Also, who the hell spent their 20% on this problem when there was integral code for

  • With the development of nanotech-based computers, that time will be cut to probably less than fifty.

    The real problem of course is getting access to the stuff that isn't digital already. Still, nanotech will probably enable more effective scanners in the next fifty years as well. Rather than using once or twice removed stuff like lasers and IR beams, mechanical scanners will actually crawl the object to be scanned - meaning that anything will be scannable, including three-dimensional objects eventually.

    Waste
  • Define "all" ... (Score:2, Insightful)

    by jabberwock ( 10206 )
    Noted that:

    "Of the approximately 5 million terabytes of information out in the world, only about 170 terabytes have been indexed, he (Schmidt) said earlier during his speech."

    So ... how many terabytes of info will be produced in the next 300 years, and does anyone really think that Google (or anyone, or everyone) could keep up?

    Especially, once all 20 billion people who live in the Solar System are video-documenting every moment of their existence ...

    OK, so I project and exaggerate ...

  • The article gives us no facts that we can use to verify the claim. Without a definition of information and a definition of indexing, one cannot take this as accurate. There are many definitions of information, and except for that used in "Information Theory", which is a message received and decoded to its original form, I don't know of any definition that has scientific or mathematical rigour. In fact, in my opinion, Information Theory is a misnomer and is more properly called Communication Theory, since it is abo
  • Uh oh (Score:3, Funny)

    by harlows_monkeys ( 106428 ) on Sunday October 09, 2005 @07:18PM (#13753344) Homepage
    Uh oh...someone needs to visit Applied Cryogenics and knock 700 years off Fry's timer then.
  • by wcrowe ( 94389 ) on Sunday October 09, 2005 @08:49PM (#13753725)
    Ask a stupid question...

  • just to digitize the card catalog at the Library of Congress?
  • He didn't clarify that 299 years of that was indexing all the Internet porn sites.
  • As an earlier poster said, this might be a joke, and probably is, considering whose mouth it came out of. There are serious issues with this. First, what is considered information? You could consider the approximate location of every molecule in the universe a piece of information. Perhaps just text could be considered - that would narrow the field of information down to the finite. But Google does mapping, so they have already broken out of just text. Then you have the fact that the amount of inform
  • ... once Google has completed its 300 year mission, it will reduce the Empire's fall to barbarism from 30,000 years to a mere 1,000 years.
  • by nut ( 19435 )
    What I want to know is... How long will it take to search the whole world's information?
