Google To Offer Free Database Storage for Scientists
An anonymous reader writes "Google has revealed a new project aimed at the scientific community. Called Palimpsest, the site research.google.com will play host to 'terabytes of open-source scientific datasets'. It was originally previewed for scientists last August. 'Building on the company's acquisition of the data visualization technology, Trendalyzer, from the oft-lauded, TED-presenting Gapminder team, Google will also be offering algorithms for the examination and probing of the information. The new site will have YouTube-style annotating and commenting features.'"
Re:Fantastic for Students and New Researchers (Score:5, Insightful)
Isn't this information more likely to be capitalized upon by those who already dominate the commercialization of research?
Yes, noobs would have enormous amounts of raw material at their disposal, but wouldn't they find applications derived from this data already covered by patents that were distilled from the data sets through analysis performed by labs full of trained corporate monkeys before they can get their own foot in the door of innovation?
I would love to awaken one day and find that I am just being a jaded fool, but I believe developments like this will help the commercialized overlords more than anyone else as they are the ones with sufficient resources to throw at privatizing the results of scientific research.
It'll All End In Tears (Score:5, Insightful)
This is a Bad Idea. Too much of the world now depends on Google. And people are running to Google, willing to give their data and identity.
/me shakes walking stick and creeps back into cave.
Horrible Idea - What are the TOS? (Score:5, Insightful)
Re:Fantastic for Students and New Researchers (Score:4, Insightful)
You cannot patent mere data, or interpretations of data. Patents are for machines, processes, and the like. Of course, the publication of data doesn't preclude people from patenting a chemical process that results in a specific gene, but this is already happening elsewhere.
In fact, I suspect the entire point of this is for Google to take over maintenance of the Genomic Databases and create new such databases. Many times the academic databases are... poorly maintained, and certainly not compatible, despite the very similar contents. There are already efforts to make them more compatible, but Google appears to be able to offer some very neat stuff on top of it all. The silliness about shipping RAID arrays mostly seems to be for unis not already hooked up to I2.
Re:Fantastic for Students and New Researchers (Score:5, Insightful)
1) Trivially, 3TB is nowhere near enough to store my data.
Bit of a non-issue for the overall concept, but if Google wants my data, they really are going to have to up the storage by a few orders of magnitude.
2) As others stated, we work really, really hard to acquire our data; research is about 10% inspiration, 90% perspiration. We are not giving up our data till we have milked it for all it's worth.
This again is solvable: we release our data after we have all the publishable results we can think of and then let others have a crack. Somebody might find something useful, and if not, well, it's great for younger scientists as you say. At the very least, people can reconfirm results more easily at a later date. That's the main reason I like it.
3) The deal killer, for my field and I suspect others: it is really, really difficult to understand our data and it's really easy to misinterpret it.
New particles have been "discovered" so many times by grad students (and some professors who should know better) in particle physics data that I'm terrified of what somebody with no training outside the system might conclude from the data. At CDF (a Fermilab experiment) it took us (800 physicists) about 2-3 years to understand the data from the experiment well enough to get proper physics results out of it. Even now, it takes a newcomer about a year to get up to speed, and that's with help from all the experts. But it's very easy to think you understand things after a few weeks when in fact you're missing some incredibly subtle point, so I'm sure we would be flooded with bogus results based on misinterpretations of the data if we released it.
Anyway, this all comes from a particle physics viewpoint, but I suspect quite a few other fields will be similar.
Re:Fantastic for Students and New Researchers (Score:4, Insightful)
But in that case, would you want to go anywhere close to someone else's data, given the risk of "contaminating" your research and perhaps ending up in a protracted brawl over discovery rights?
I mostly agree with everybody else: it's a neat idea but for a lot of people it's not going to fly.
The one area where I think it could be good is for datasets that are already open and that are meant to be shared. In vision research, for instance, or in various fields in machine learning, there are quite a lot of sort-of-standard test data sets created by various groups that can make it easier to compare models directly. Having all of those collected in one place would certainly make it easier to find and actually use them rather than reinventing the wheel once again.
Re:Fantastic for Students and New Researchers (Score:2, Insightful)
Yes, that's exactly the point. I am a physics student and the first thing we were told before we began our first lab course was: "Don't throw away any data! Even if you think it's unimportant, equipment failure, ...". New discoveries have been postponed for years because someone simply threw away data which seemed to be unimportant at the time. There's simply no way of telling if some data set is essential or not. If you're thinking this way, you should be more than satisfied with results alone.