
Google To Offer Free Database Storage for Scientists

An anonymous reader writes "Google has revealed a new project aimed at the scientific community. Called Palimpsest, the site research.google.com will play host to 'terabytes of open-source scientific datasets'. It was originally previewed for scientists last August. 'Building on the company's acquisition of the data visualization technology, Trendalyzer, from the oft-lauded, TED-presenting Gapminder team, Google will also be offering algorithms for the examination and probing of the information. The new site will have YouTube-style annotating and commenting features.'"
This discussion has been archived. No new comments can be posted.

  • by ushering05401 ( 1086795 ) on Saturday January 19, 2008 @07:04PM (#22112904) Journal
    I feel your optimism, and support this idea, but the cynical side of me must speak out.

    Isn't this information more likely to be capitalized upon by those who already dominate the commercialization of research?

    Yes, noobs would have enormous amounts of raw material at their disposal, but wouldn't they find applications derived from this data already covered by patents that were distilled from the data sets through analysis performed by labs full of trained corporate monkeys before they can get their own foot in the door of innovation?

    I would love to awaken one day and find that I am just being a jaded fool, but I believe developments like this will help the commercialized overlords more than anyone else as they are the ones with sufficient resources to throw at privatizing the results of scientific research.
  • by turgid ( 580780 ) on Saturday January 19, 2008 @07:10PM (#22112940) Journal

    This is a Bad Idea. Too much of the world now depends on Google. And people are running to Google, willing to give their data and identity.

    /me shakes walking stick and creeps back into cave.

  • by teknopurge ( 199509 ) on Saturday January 19, 2008 @07:11PM (#22112946) Homepage
    Does Google get ownership of anything that is uploaded? I wonder whether scientists will be foolish enough to unknowingly forfeit their copyrights, IP, etc.
  • by xenocide2 ( 231786 ) on Saturday January 19, 2008 @07:21PM (#22113032) Homepage

    Isn't this information more likely to be capitalized upon by those who already dominate the commercialization of research?
    Can't it be both? It's not like by subscribing you're depriving others. And the data uploaded will be made freely available.

    You cannot patent mere data, or interpretations of data. Patents are for machines, processes, and the like. Of course, the publication of data doesn't preclude people from patenting a chemical process that results in a specific gene, but this is already happening elsewhere.

    In fact, I suspect the entire point of this is for Google to take over maintenance of the Genomic Databases and create new such databases. Many times the academic databases are... poorly maintained, and certainly not compatible, despite the very similar contents. There are already efforts to make them more compatible, but Google appears to be able to offer some very neat stuff on top of it all. The silliness about shipping RAID arrays mostly seems to be for unis not already hooked up to I2.

  • by cortex ( 168860 ) <neuraleng@gmail.com> on Saturday January 19, 2008 @07:22PM (#22113042)
    As a neural engineering researcher who routinely generates terabyte-size datasets, I have to say that I both like this idea and think it is unlikely to succeed. I would love to have a place to store large datasets and access them from wherever I am. However, since these datasets will be open sourced, I will be extremely unlikely to put any dataset on Google until I am certain I have extracted all of the publishable findings from it. I think that most researchers, after putting years of effort and a lot of money into acquiring a dataset, will also think twice about open sourcing their data. If the TOS were to include some means for controlling publications which resulted from analysis of the data, then it might be more likely to succeed.
  • Re:mining for ads (Score:2, Insightful)

    by Anonymous Coward on Saturday January 19, 2008 @07:43PM (#22113170)
    This is more than likely "tweaked" by a savvy Google employee. Think of it as the way "750 ml in shots" gives you the right answer. It's clever, but "it" didn't manage to do it; it was just some Google engineer's Friday project which made it to release, because Google isn't entirely soulless yet.
  • by Gromius ( 677157 ) on Saturday January 19, 2008 @07:59PM (#22113286)
    As a researcher myself (particle physics), I echo others' comments in this thread that a) it's a nice idea but b) it isn't going to happen. There are three main problems: the first two are solvable, the third isn't.

    1) trivially, 3TB is nowhere near enough to store my data

    Bit of a non-issue for the overall concept, but if Google wants my data, they really are going to have to up the storage by a few orders of magnitude.

    2) as others stated, we work really really hard to acquire our data; research is about 10% inspiration, 90% perspiration. We are not giving up our data till we have milked it for all it's worth.

    This again is solvable: we release our data after we have all the publishable results we can think of and then let others have a crack. Somebody might find something useful, and if not, well, it's great for younger scientists as you say. At the very least, people can reconfirm results at a later date more easily. Main reason I like it.

    3) The deal killer: for my field, and I suspect others, it is really really difficult to understand our data and it's really easy to misinterpret it.

    New particles have been "discovered" so many times by grad students (and some professors who should know better) in particle physics data that I'm terrified of what somebody with no training outside the system might conclude from the data. At CDF (a Fermilab experiment) it took us (800 physicists) about 2-3 years to understand the data from the experiment well enough to get proper physics results out of it. Even now, it takes a newcomer about a year to get up to speed, and that's with help from all the experts. But it's very easy to think you understand things after a few weeks when in fact you're missing some incredibly subtle point, and so I'm sure we would be flooded by bogus results due to misinterpretations of the data if we released it.

    Anyway, this all comes from a particle physics viewpoint, but I suspect quite a few other fields will be similar.
  • by JanneM ( 7445 ) on Saturday January 19, 2008 @08:19PM (#22113414) Homepage
    If the TOS were to include some means for controlling publications which resulted from analysis of the data, then it might be more likely to succeed.

    But in that case, would you want to go anywhere close to someone else's data, given the risk of "contaminating" your research and perhaps ending up in a protracted brawl over discovery rights?

    I mostly agree with everybody else: it's a neat idea but for a lot of people it's not going to fly.

    The one area I think it could be good is for datasets that are already open and that are meant to be shared. In vision research, for instance, or in various fields in machine learning there's quite a lot of sort-of-standard test data sets created by various groups that can make it easier to compare models directly. Having all of those collected in one place would certainly make it easier to find and actually use them rather than reinventing the wheel once again.

  • by CastrTroy ( 595695 ) on Saturday January 19, 2008 @09:40PM (#22113836)
    That's really weird that this appeared on Slashdot tonight, just as I was downloading the historical weather data [ec.gc.ca] for Canada. Still waiting for it to download. I was thinking that it would be a nice data set that would be interesting to work with. It's not a huge dataset by any means, only 200 MB zipped, but it's still bigger and more real than any of the stuff I got to use in university. And a lot larger than any real data set I could generate on my own. Does anybody else have any links to interesting open data sets?
  • by dogmod ( 702959 ) on Sunday January 20, 2008 @05:26AM (#22115838)
    Seems to me that each of these deal killers is a red herring.

    1. My data set is too large = I have no idea what's essential.

    2. I worked too hard to get this data, I'm not going to give it away = I'm a mediocre scientist competing against a lot of other mediocre scientists - this data might be my one chance to win the lottery. Oh, and just for the record, I don't really give a shit about the progress of my field - fuck 'em, fuck 'em all, me, me, me.

    3. Newbies will misunderstand my data and pervert it = What the fuck am I posting on Slashdot for?

    As one of the mathematicians involved in computation of the Kazhdan-Lusztig-Vogan polynomials for E8, I say - they now exist, they're marvelous, and you're welcome to them. (I suspect you'll pass though - I would if I were in closer proximity to some other marvel.)
  • by tenco ( 773732 ) on Sunday January 20, 2008 @03:47PM (#22118942)
    1. My data set is too large = I have no idea what's essential.

    Yes, that's exactly the point. I am a physics student and the first thing we were told before we began our first lab course was: "Don't throw away any data! Even if you think it's unimportant, equipment failure, ...". New discoveries have been postponed for years because someone simply threw away data which seemed to be unimportant at the time. There's simply no way of telling if some data set is essential or not. If you're thinking this way, you should be more than satisfied with results alone.
