300 Years to Index the World's Information 248
Kasracer writes "At the Association of National Advertisers annual conference, Google's CEO, Eric Schmidt, suggested that it would take 300 years for them to index all of the world's information. From the article: 'We did a math exercise and the answer was 300 years,' Schmidt said in response to an audience question asking for a projection of how long the company's mission will take. 'The answer is it's going to be a very long time.'"
Longer than expected (Score:5, Funny)
Re:Longer than expected (Score:3, Funny)
Re:Longer than expected (Score:3, Interesting)
On a side note, since they are restricted to doing verbatim t
Re:Longer than expected (Score:2)
What? Not until next week?
New hardware needed (Score:5, Funny)
There are a lot of areas where essentially this... (Score:2, Interesting)
Re:New hardware needed (Score:5, Funny)
What About... (Score:5, Insightful)
Re:What About... (Score:3, Insightful)
Re:What About... (Score:5, Interesting)
Think of it in terms of taking a ratio comparison of two infinite series.
Besides... (Score:2)
Re:What About... (Score:2, Insightful)
Re:What About... (Score:3, Interesting)
Funny you mention that. In some versions of Superman, Brainiac, a living computer whose mission is to gather all information about every planet in the universe, entered the world of villainy because he logically reasoned that the only way he could eve
But... (Score:2)
Not the Moore model but the Bono model (Score:3, Interesting)
No, the proper model is not Moore's law but Bono's law [pineight.com]. If it takes 300 years now, then it'll take 320 years in 20 years, and most of the time will be spent waiting for exclusive rights to expire (if they ever do). For instance, indexing a literary work that's out of print and not widely available at libraries requires getting a new copy, and those aren't available until the copyright runs out.
Re:But... (Score:3, Insightful)
- 5 million TB of data.
- 170 TB have already been indexed.
- It would take 300 years to index that data and make it searchable.
I don't think it's an exercise to index all knowledge. As you point out, that would be illogical. I think it's more of an understanding of what it would take to effectively and completely serve the world's information needs given current indexing capabilities.
I guess establishing a benchmark currently, both of how effic
300 years? (Score:5, Funny)
I'd like my house indexed (Score:5, Funny)
Maybe you're thinking about this? (Score:2)
Re:I'd like my house indexed (Score:5, Funny)
locate:phone | pocket
locate:underwear -girlfriend | rm
Re:I'd like my house indexed (Score:3, Funny)
Oooo.... No it's the giant brains all over again (Score:2)
Re:Oooo.... No it's the giant brains all over agai (Score:3, Funny)
Everybody! (Score:5, Funny)
Re:Everybody! (Score:2)
Yeah right.. (Score:5, Funny)
When I read the summary (Score:5, Funny)
Googlesphere anyone?
On a related note... (Score:5, Interesting)
It'd be interesting if, perhaps in a couple of generations, we could have a cheap media volume that contained "recorded media, prehistory to 2050 AD"... if the media that exists today even survives a couple of generations, and copyrights aren't extended indefinitely. The idea of an indexing system that could put all that information into a meaningful context would be fascinating to consider, though, if it were possible.
Ryan Fenton
Re:On a related note... (Score:2)
This is indeed possible. If I think back 20 years, and how much information was digitized vs. today, it is just plain staggering.
Think about Britannica, Project Gutenberg, CCEL, Perseus, Alwarraq, and many thousands of other sites that offer reference works in digital form. Think about the commercial CD-ROMs that do the same.
All that in just 20 years.
So, 100 more years, we can have all the hieroglyphs off of Egyptian temple
Re:On a related note... (Score:2)
Probably a lot longer than you think.
"Goddammit, I can't watch or listen to any more of this crap! I'm going out for a cup of coffee and a smoke."
Competition? (Score:4, Interesting)
Re:Competition? (Score:5, Insightful)
I have a better idea: how about you just send out a government hit squad to put a bullet between the eyes of every single entrepreneur in the US? It will accomplish the same sort of freeze in the growth of innovative small businesses but look far less insane.
Re:Competition? (Score:4, Interesting)
(how DARE I say anything bad about Google. Mod this down IMMEDIATELY.)
Re:Competition? (Score:2)
Re:Competition? (Score:2)
Google will never finish their task, because the amount of information in the world (or at least raw data.. not all of it is really information) is increasing much faster than it's possible to index it.
Re:Competition? (Score:2)
But assuming the problem is indexing 5 million TB of information that is already available on the internet (songs, movies, whatever - just not things that haven't been digitized, because this introduces the need to have humans working), it will be possible in the next 50 years. Both computation power and st
Re:Competition? (Score:2)
I'm *positive* these moves would be at least equally beneficial to society as a whole. They should be done together!
(w00t for Stalinist centralization and the power of the state! Go MAO!)
False alternative (Score:3, Insightful)
Practice has shown that government ownership and operation of airports is inferior to private ownership.
Re:Competition? (Score:2)
I'm curious... (Score:2, Interesting)
Re:I'm curious... (Score:2, Funny)
Re:I'm curious... (Score:3, Insightful)
Re:I'm curious... (Score:5, Insightful)
Like Anne Frank's [wikipedia.org]?
Fact is, it's incredibly hard to determine today what will have value tomorrow. Most of those thirteen-year-old girls' (or 20-something geek guys') blogs will have no historical value. But some of those people will grow up to have a profound impact on the world (or they may not grow up, but still have a profound impact, as was the case with Anne Frank). It may be ten years from now. Or 50.
Who knows what the writing they do now might tell us about what brought them wherever they end up? When people write diaries on paper, chances are reasonable they'll survive and show up in an attic somewhere. But as more and more content gets online, we also risk losing entire generations' worth of many types of information to bit rot and simple lack of foresight.
Re:I'm curious... (Score:5, Interesting)
The question of whether something is valuable isn't exactly an either-or proposition, but a matter of assigning a probability that a certain piece of information is valuable. Couldn't we agree that, say, the president's day-to-day activities are more likely to be important in 100 years than a single 13-year-old's blog? Does that mean that 13-year-olds' blogs are worthless? Well, no, but they aren't the thing I'd first choose to preserve.
The question I have is: is the greater difficulty of controlling online information balanced by the greater ease of keeping it around? Google doesn't delete messages from email for this very reason. We tend to throw stuff away because it takes up too much space, or because it just becomes clutter. But with increased storage space every year and better ability to keep track of it (and separate it from things we consider important), why ever throw away information?
Online information portability is obviously a problem. How do you move someone's blog somewhere else and have it mean anything in, say, 50 years? I think these problems will be solved as people expect information to be more portable and standardized. The solutions, I think, will come from short-term portability needs rather than from a few people wanting to preserve something for the next 100 years. Many people assume that standards are short-lived things that are here today, gone tomorrow. I'd have to disagree on a historical basis. How old are reel-to-reel tapes? Yet you can still find a player at, say, a thrift store. CD audio has been around for 25 years and is still the default medium for music today. ASCII was developed I don't know how long ago, yet it is still quite popular, and if you have a computer that can't read it, you've got a fairly useless computer. Standards have a way of snowballing and gathering momentum to live on a long time.
Re:I'm curious... (Score:2)
So wouldn't it be nice to have its entire history on file!
copyright? (Score:2)
300 Years? Feed Those Pigeons! (Score:5, Funny)
Re:300 Years? Feed Those Pigeons! (Score:2)
what is considered information? (Score:4, Insightful)
To further the example: at work we have several filing cabinets that haven't been opened in years. There are lots of papers and stuff in there, I can vouch for that. Some might consider it "information." But in reality all that stuff could be burned and I doubt it would make the slightest difference in the way the future rolls out. None of it is stuff that would ever be needed by an IRS audit or anything like that either. Does google consider this kind of stuff as part of their efforts? Because I think they can safely ignore it.
it's the definition of "index" that's a problem (Score:3, Insightful)
Stroll on down to the nearest university library. It's got a lot less information in it than Google is considering, and about a hundred thousand man-years over a few centuries have gone into finding clever ways to organize it all: card catalogs, shelving systems (e.g. Dewey and his decimals), nowadays
I Call Bullshit (Score:3, Funny)
Re:I Call Bullshit (Score:2)
Re:I Call Bullshit (Score:3, Informative)
Won't happen... (Score:2)
Makes no sense (Score:5, Insightful)
To estimate the time involved, you surely need to know the size of the information involved (don't quote me that bunkum about 170 terabytes in TFA - yes I did read it), and to know the size you need to know what all the information is, which you can't (and surely new information is created all the time?).
This translates as "I pulled my finger out my ass, waved it in the air and came up with 300 years."
My guess: (Score:4, Informative)
They estimated an amount of information that is "all information", like 480,000 exabytes or so.
Then they look at their current capacity (storage and database CPU power) and just interpolate Moore's law into the future and look when the demand will be met.
Of course, for stuff like the LHC that only interpolates 10-20 years into the future such a thing is possible, but 300 years? He should read up about the singularity...
Re:My guess: (Score:2)
Interpolation means looking between your data points, and extrapolation means looking outside the data set. Interpolation is generally much more reliable and trivial than extrapolation. In particular, when you're dealing with a time series of data, it's easy to spot a trend in past events (e.g. Moore's "law"), but harder to predict whether that trend continues.
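The distinction can be illustrated with a small sketch. Everything here is a made-up assumption for illustration: the "true" trend is a hypothetical saturating (logistic) curve, and the observer only sees its early, exponential-looking part. An exponential fitted to those points does fine between them and badly far beyond them.

```python
import math

def true_trend(t, cap=1000.0, rate=0.5, mid=20.0):
    """Hypothetical 'real' trend: looks exponential early on, then saturates at cap."""
    return cap / (1.0 + math.exp(-rate * (t - mid)))

def fit_exponential(ts, ys):
    """Least-squares fit of y = a * exp(b * t), done as a straight line through log(y)."""
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_mean = sum(ts) / n
    log_mean = sum(logs) / n
    b = (sum((t - t_mean) * (l - log_mean) for t, l in zip(ts, logs))
         / sum((t - t_mean) ** 2 for t in ts))
    a = math.exp(log_mean - b * t_mean)
    return a, b

# Observations cover only years 0..10 (assumed sampling).
observed_years = [0, 2, 4, 6, 8, 10]
observed_values = [true_trend(t) for t in observed_years]
a, b = fit_exponential(observed_years, observed_values)

def predicted(t):
    return a * math.exp(b * t)

def relative_error(t):
    return abs(predicted(t) - true_trend(t)) / true_trend(t)

interp_error = relative_error(5)    # year 5 lies between the data points
extrap_error = relative_error(40)   # year 40 lies far outside them

print(f"interpolation error at year 5:  {interp_error:.4f}")
print(f"extrapolation error at year 40: {extrap_error:.1f}")
```

The fit cannot know that the curve flattens later, which is the risk in projecting any trend centuries ahead.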
Re:My guess: (Score:2)
And the Winner is... (Score:2)
So how much more information will exist by then? Is it growing faster than Google can index it? Think about how fast that much information came into being in the first place.
What a silly question.
Re:And the Winner is... (Score:2, Funny)
Re:And the Winner is... (Score:2)
(ie: in one of their papers they said something along the lines that the rate at which humans can produce information isn't growing as fast as their ability to index it, and eventually they'll be able to keep up with the rate of all information everyone on the planet produces. I guess that's where that 300 years comes in. I'd imagine it would take
Now the real question is.... (Score:2)
1) Take longer because more info is created faster than the ability to index it?
2) Take less time because processors, storage, and databases get faster?
3) Take the same amount of time because data and the ability to index it grow at the same rate?
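The three outcomes can be explored with a toy simulation. This is only a sketch under stated assumptions: the 5 million TB and 170 TB figures come from the article, while the starting indexing capacity of 170 TB per year and all growth rates are hypothetical numbers picked for illustration. The function name `years_to_catch_up` is also made up.

```python
def years_to_catch_up(data_tb, indexed_tb, capacity_tb_per_year,
                      data_growth, capacity_growth, max_years=1000):
    """Return the year in which everything is indexed, or None if it never
    happens within max_years. Growth rates are fractions per year (0.3 = 30%)."""
    for year in range(1, max_years + 1):
        # Index as much as this year's capacity allows.
        indexed_tb = min(data_tb, indexed_tb + capacity_tb_per_year)
        if indexed_tb >= data_tb:
            return year
        # New data is created and indexing capacity improves.
        data_tb *= 1 + data_growth
        capacity_tb_per_year *= 1 + capacity_growth
    return None

TOTAL_TB = 5_000_000   # figure quoted in the article
INDEXED_TB = 170       # figure quoted in the article
START_CAPACITY = 170   # hypothetical: TB indexed per year today

# 1) Data grows faster than indexing capacity: never finishes.
data_wins = years_to_catch_up(TOTAL_TB, INDEXED_TB, START_CAPACITY, 0.5, 0.3)
# 2) Indexing capacity grows faster than data: finishes eventually.
indexing_wins = years_to_catch_up(TOTAL_TB, INDEXED_TB, START_CAPACITY, 0.3, 0.5)
# 3) Both grow at the same rate: the gap never closes either.
tie = years_to_catch_up(TOTAL_TB, INDEXED_TB, START_CAPACITY, 0.3, 0.3)

print("data grows faster:    ", data_wins)
print("indexing grows faster:", indexing_wins)
print("same growth rate:     ", tie)
```

In this model the answer depends almost entirely on which growth rate is larger, not on the size of today's backlog.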
Re:Now the real question is.... (Score:2)
Every time google indexes a piece of information, you have a new piece of information (the fact that google has indexed it), so the amount of information to index has not decreased, it has remained the same & merely changed form.
There is also other information (eg. the amount of information that google has indexed, the amount of free space they have left) that is constantly changing and requires reindexing. Unfortunately storing that information causes it to change... quantum information an
Keyword is "them" (Score:2)
Of course, not all info is on the web, nor is all info on the web accessible to search engines, and not even all info accessible to search engines can be searched (think of graphics with text, i.e. scanned books, or Flash presentations with the actual content), but still that numb
webcams and other continuous data collectors (Score:4, Interesting)
The point is that many current systems spew a huge volume of low value (but nonzero value) data (multiple MB or GB/day/device). The lack of storage means most of this is not captured and is thus never indexed.
Even massive companies can't keep all their data. Wal-Mart stores on the order of 460 TB in their data warehouse, but only has room for the last 13 months of data or so. At 138 million customers per week, they only have room for a paltry 59kB per customer per week.
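The per-customer figure can be checked with a few lines of arithmetic. A minimal sketch, assuming decimal terabytes and an average of 52/12 weeks per month (neither is stated in the comment):

```python
# Figures from the comment above.
warehouse_bytes = 460e12        # 460 TB, assuming decimal terabytes
retention_months = 13
customers_per_week = 138e6

# Assumption: an average month is 52/12 weeks long.
WEEKS_PER_MONTH = 52 / 12

retention_weeks = retention_months * WEEKS_PER_MONTH
customer_visits = retention_weeks * customers_per_week
bytes_per_customer_week = warehouse_bytes / customer_visits

# Prints roughly 59 kB, matching the comment.
print(f"{bytes_per_customer_week / 1000:.1f} kB per customer per week")
```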
Re:webcams and other continuous data collectors (Score:2)
You can buy a terabyte of hard disk capacity for about $500 these days. So 500 TB, while sounding impressive, is actually $250,000 worth of disks. Storage requires more than just the raw disks. A rule of thumb that often works out OK is to add an order of magnitude to convert "raw disk" into "enterprise storage"; that still means a cost in the ballpark of 2-3 million.
Assuming your
a small margin of error (Score:3, Informative)
Then:
I tried to find the graph of speed over time because I have seen it several times. It shows the exponential increase in the speed of the project. Apparently many scientists believe that with techniques as they are now, we could repeat the project in 2 years if we started over. The indexing of information could have a very similar timeline: very slowly at first, and then, as technology and specific methodology develop, off you go. So the truth is... this is a guess. I wouldn't put too much faith in it.
Re:a small margin of error (Score:2)
It's really one of those things where 95% of the work is relatively easy (well, `several years' worth of work'), and the other 5% is what takes the rest of the century.
(there's also the question of -understanding- it as opposed to just writing it out to a file; and I'm sure that wil
Re:a small margin of error (Score:2)
Re:a small margin of error (Score:2)
Assume your equation is roughly correct however. We should be able to compute how much information would need to be indexed for it to take 300 years. 1TB * (e ** (300 yrs / 1.168 yrs)) = 3.533 * 10 ** 111 TB.
I gotta go buy stock in DVD-R companies
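The calculation in this comment can be reproduced and turned around. A minimal sketch, assuming the model is pure exponential growth starting from 1 TB with the 1.168-year e-folding time used in the comment; the function names are made up, and the 5 million TB target is the figure quoted from the article elsewhere in the thread.

```python
import math

E_FOLD_YEARS = 1.168   # e-folding time taken from the comment's equation
START_TB = 1.0         # starting amount used in the comment

def indexed_after(years, start_tb=START_TB, e_fold_years=E_FOLD_YEARS):
    """Amount indexed after `years` if the amount grows by a factor of e every e_fold_years."""
    return start_tb * math.exp(years / e_fold_years)

def years_to_reach(target_tb, start_tb=START_TB, e_fold_years=E_FOLD_YEARS):
    """Invert the growth law: how long until target_tb is reached."""
    return e_fold_years * math.log(target_tb / start_tb)

after_300_years = indexed_after(300)
years_for_5m_tb = years_to_reach(5_000_000)

print(f"indexed after 300 years: {after_300_years:.3e} TB")
print(f"years to reach 5 million TB: {years_for_5m_tb:.1f}")
```

Under that growth law, 5 million TB would be reached in roughly 18 years, so a 300-year estimate implies a very different model.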
was he joking ? (Score:5, Insightful)
Since this was in response to an audience member's question, does anyone else think he was joking? Because it is such an outlandish question from an information theory and modeling point of view, perhaps he was mocking it? "Ah yes, we just came up with an equation and it should take 294.59 years." I think this also makes sense in light of his next comment, which was made on a more serious note. I interpret it as, "We really didn't use an equation, it will obviously take a long time though." This is how I understood his comments, and I may be wrong, but it wouldn't surprise me if some reporter picked up on this "joke" and put it up as "news".
Re:was he joking ? (Score:2)
----------
Dynamic DNS [thatip.com] from ThatIP
Re:was he joking ? (Score:2)
Re:was he joking ? (Score:2)
from a logical point of view (Score:2, Funny)
Also, can someone explain to me how you even approach something like this from a mathematical model point of view? How did the 170 terabyte number even come up? Aren't there different definitions for what constitutes 'information'? Also, who the hell spent their 20% on this problem when there was integral code for
Naturally That Isn't True (Score:2)
The real problem of course is getting access to the stuff that isn't digital already. Still, nanotech will probably enable more effective scanners in the next fifty years as well. Rather than using once or twice removed stuff like lasers and IR beams, mechanical scanners will actually crawl the object to be scanned - meaning that anything will be scannable, including three-dimensional objects eventually.
Waste
Define "all" ... (Score:2, Insightful)
"Of the approximately 5 million terabytes of information out in the world, only about 170 terabytes have been indexed, he (Schmidt) said earlier during his speech."
So ... how many terabytes of info will be produced in the next 300 years, and does anyone really think that Google (and anyone, or everyone) could keep up?
Especially, once all 20 billion people who live in the Solar System are video-documenting every moment of their existence ...
OK, so I project and exaggerate ...
Scientific and Mathematical bunk (Score:2)
Uh oh (Score:3, Funny)
I thought the answer was 42. (Score:4, Funny)
Wouldn't it be easier (Score:2)
Indexing the Porn (Score:2, Funny)
Many problems with this comment (Score:2)
Fortunately... (Score:2)
search (Score:2)
Re:The major question is (Score:2, Insightful)
How the hell did they come to that figure of 300 years?
Re:The major question is (Score:3, Informative)
You need to remember that they could be way off, if some major breakthrough in storage t
Re:The major question is (Score:2)
But remember, that magical google app is still in Beta.
Re:The major question is (Score:2, Interesting)
Re:The major question is (Score:3, Funny)
New here?
Re:The major question is (Score:3, Funny)
Re:10,000,000 years (Score:3, Insightful)
Re:10,000,000 years (Score:2)
Re:10,000,000 years (Score:3, Funny)
Re:i hereby propose (Score:5, Funny)
Re:300 years from now? (Score:2)
Re:google in 300 years (Score:2)
Re:How much of that would be - (Score:2)
M'kay, time to go to sleep..
Re:It will take... (Score:2)
Re:300 years? No way! (Score:2)
1 million hours of work done sequentially only takes 114 years. Divide that by only 1000 processors (and there are commercial systems with more than that) and you're down to months.
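The arithmetic behind that claim, as a minimal sketch. It assumes the work splits perfectly across processors with no coordination overhead, which real indexing jobs will not fully achieve; the function name is made up.

```python
HOURS_PER_YEAR = 365.25 * 24   # 8766 hours

def wall_clock_hours(total_work_hours, processors):
    """Elapsed time if the work divides evenly across processors (idealized assumption)."""
    return total_work_hours / processors

total_work_hours = 1_000_000

sequential_years = wall_clock_hours(total_work_hours, 1) / HOURS_PER_YEAR
parallel_days = wall_clock_hours(total_work_hours, 1000) / 24

print(f"1 processor:     {sequential_years:.1f} years")
print(f"1000 processors: {parallel_days:.1f} days")
```

With 1000 processors the idealized figure is 1000 hours, or about six weeks.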