
Google To Offer Free Database Storage for Scientists

An anonymous reader writes "Google has revealed a new project aimed at the scientific community. Called Palimpsest, the site research.google.com will play host to 'terabytes of open-source scientific datasets'. It was originally previewed for scientists last August. 'Building on the company's acquisition of the data-visualization technology Trendalyzer from the oft-lauded, TED-presenting Gapminder team, Google will also be offering algorithms for the examination and probing of the information. The new site will have YouTube-style annotating and commenting features.'"
  • by cheesethegreat ( 132893 ) on Saturday January 19, 2008 @06:56PM (#22112840)
    If this actually happens, and researchers are willing to make their datasets open source, it would be a huge boon for budding researchers. It would allow students to do more than just work with a sample dataset out of a textbook. Graduate students learning how to do advanced modeling would be able to work with real datasets, vastly improving their skill set and employability. Just consider these two lines on a CV, and ask yourself which one jumps out at you.

    "Designed a model for the dataset on the CD-ROM included with the Modeling Organic Systems textbook"

    "Designed a model for the WISK-III heart output dataset published in 2006."

    New entrants to a field would have instant access to enormous amounts of data very quickly and easily. Although the big kudos comes when you can do totally original work (new data, new analysis), a researcher who could come up with a new critique of older papers and studies would definitely get themselves noticed.

    Overall, this is a really positive step for everyone on the lower rungs of the scientific ladder, and especially positive for those with limited resources.
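
To make the point concrete: once a dataset is openly hosted, working with it is often only a few lines of code. A minimal sketch, assuming a hypothetical CSV URL and column names, using pandas and NumPy:

```python
# Minimal sketch: load an openly hosted dataset and fit a simple model.
# The URL and column names below are hypothetical placeholders.
import numpy as np
import pandas as pd

URL = "https://research.google.com/datasets/example/heart_output.csv"  # hypothetical

df = pd.read_csv(URL)
print(df.describe())  # quick sanity check of the measurements

# An ordinary least-squares line fit of one column against another.
x = df["dose"].to_numpy()    # hypothetical column
y = df["output"].to_numpy()  # hypothetical column
slope, intercept = np.polyfit(x, y, deg=1)
print(f"fit: output ~ {slope:.3f} * dose + {intercept:.3f}")
```
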
  • by hostguy2004 ( 818334 ) on Saturday January 19, 2008 @07:56PM (#22113272)
    Google is offering this service to store PUBLIC DOMAIN data. If people don't want to release their data as public domain, then this ain't the service for them. See http://en.wikipedia.org/wiki/Public_Domain [wikipedia.org]
  • Re:mining for ads (Score:5, Informative)

    by mikael ( 484 ) on Saturday January 19, 2008 @08:58PM (#22113640)
    These are data sets that have already been placed in the public domain by the scientists. These could be astronomy images, multi-spectral photography, remote-sensing satellite imagery, seismology recordings, MRI/NMR/CAT scans, and many other types of volume, image, and signal data.
  • by tomhudson ( 43916 ) <barbara,hudson&barbara-hudson,com> on Saturday January 19, 2008 @09:43PM (#22113854) Journal
    3 terabytes isn't that much any more. You can get 750 GB hard drives for $160 - 5 drives ($800) gives you your 3 TB.

    Or 4 x 1 TB hard drives ($180 ea.) comes to $720, so throw in $10 for a USB key to boot the OS from.

    Cheap Linux box? Well, you don't need to supply a monitor, keyboard, mouse, speakers, or even much RAM - you do the math.
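
A quick sanity check of the arithmetic above, using the commenter's 2008 prices:

```python
# Back-of-the-envelope storage cost comparison (commenter's 2008 prices).
options = {
    "5 x 750 GB": {"drives": 5, "tb_each": 0.75, "price_each": 160},
    "4 x 1 TB":   {"drives": 4, "tb_each": 1.00, "price_each": 180},
}

for name, o in options.items():
    total_tb = o["drives"] * o["tb_each"]
    total_usd = o["drives"] * o["price_each"]
    print(f"{name}: {total_tb:.2f} TB for ${total_usd} "
          f"(${total_usd / total_tb:.0f}/TB)")
```

The 4 x 1 TB option is indeed the better deal: more raw capacity at $180/TB versus roughly $213/TB, for $80 less overall.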

  • Re:Are they insane? (Score:3, Informative)

    by ryanov ( 193048 ) on Sunday January 20, 2008 @01:47AM (#22115090)
    I like my job, I'm a sysadmin, I'm on call right now, and I'm a committee chair for my union. Guess you don't know everything.
  • by Dominic_Mazzoni ( 125164 ) on Sunday January 20, 2008 @03:43AM (#22115530) Homepage
    Does anybody else have any links to interesting open data sets?

    My favorite: near-real-time medium-resolution satellite images from NASA: http://rapidfire.sci.gsfc.nasa.gov/ [nasa.gov]
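
Scripting against a source like that is straightforward; here is a minimal sketch of fetching one image over HTTP (the file path below is an assumption, since the site's real directory layout isn't shown here):

```python
# Minimal sketch: download a NASA Rapid Response image over HTTP.
# The image path is hypothetical; browse the site for real filenames.
import urllib.request

BASE = "http://rapidfire.sci.gsfc.nasa.gov"
PATH = "/imagery/example_subset.jpg"  # hypothetical placeholder

with urllib.request.urlopen(BASE + PATH) as resp:
    data = resp.read()

with open("modis_subset.jpg", "wb") as f:
    f.write(data)
print(f"saved {len(data)} bytes")
```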

  • by Gromius ( 677157 ) on Sunday January 20, 2008 @06:46AM (#22116026)
    Yes, I can see how it can appear elitist, and yes, it is elitist in a sense, because it's really hard. A PhD student typically has to do about three years of hard work to produce an analysis of sufficient quality, and that's with help from experts. Before that they have four years of advanced physics. I'm not saying the common man can't do it, just that it would take years of hard work to understand the data well enough to analyze physics results that 800 physicists have already pored over to extract most things of value. However, as I said, it's really easy to think you've understood the data in a week or so and produce bogus results, which I suspect is what most people would do.

    As for the few geniuses who can handle the data better than any of us: yes, it's a noble idea, and it sounds nice in theory. However, these geniuses are still going to have to slog through the data, and it's still going to be hard, even for them, to do it by themselves. It's not something some whiz kid will pick up and have a Nobel Prize for by the afternoon. However, if they are really interested, they can stop by their local particle physics lab and talk to the people there. It's not as if we don't ever give out our data; lots of students (undergraduates and sixth formers, i.e. high schoolers for the yanks) have been given a copy over the years and helped to understand it. If you want it badly enough you'll probably get some sort of access to old data. Sure, some may fall through the cracks, but that's unavoidable.

    Also, incidentally, the bogus results I'm most afraid of are not from the general public but from our theoretical colleagues, who are actually the people we are most concerned about hiding the data from :) A lot (but not all) think that data analysis is easy and have a vested interest in proving a certain model, so they might subconsciously misinterpret the data, or not check it rigorously when it looks like it's proving what they want it to prove. Then all of a sudden you have headlines like "Prof. X from Ivy League University Y has found new physics Z in Tevatron/LHC data", which if true would be the most significant discovery in physics in the last 30 years, and so it is splashed all over the media. The public and the media just know this guy is an Ivy League professor; they don't know that he is little more qualified to analyze the results than they are, so they believe him. Arguments would then ensue over the significance of the finding, and eventually a retraction would be printed. But all this would play out in public and in the media, and I think that is damaging to science: the general public starts thinking, "these stupid scientists, always changing their minds; should we believe anything they say?" Plus you would get an increase in the usual crazy science results, but this time with data whose analysis most people can't tell is rubbish. Slashdot would be happy, as they tend to like crappy science :), but it's not something scientists would be happy with.
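
The "bogus results" worry has a concrete statistical face: scan enough independent bins of pure background and an impressive-looking bump appears by chance. A minimal sketch of this look-elsewhere effect (illustrative, not from the comment):

```python
# Look-elsewhere effect: histogram pure background into many bins, then
# report the single most "significant" bin as if it were pre-registered.
import numpy as np

rng = np.random.default_rng(0)
n_bins = 500
expected = 100.0  # expected background count per bin

counts = rng.poisson(expected, size=n_bins)
z = (counts - expected) / np.sqrt(expected)  # naive Gaussian approximation

print(f"largest local excess across {n_bins} bins: {z.max():.2f} sigma")
# With 500 tries, ~3-sigma "discoveries" appear routinely in pure noise,
# which is why particle physics demands 5 sigma plus trials-factor
# corrections before anyone claims new physics.
```
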
  • by cortex ( 168860 ) <neuraleng@gmail.com> on Sunday January 20, 2008 @12:33PM (#22117276)

    > Nothing! On the other hand, it would be a pretty foolish person who tried to do that -- if you made the data, you're likely the only one who truly understands it. Other threads in this discussion talk about that problem in the context of elementary particles. For solar observations it is similar -- there are plenty of "gotchas" in every data set, and you'd better be working with the instrument team if you don't want to make a fool of yourself.

    This is exactly why this system is likely to fail. No scientist is going to spend millions of dollars and years of effort just to put their data on a server where someone else can analyze it, publish the results, and thereby get most of the credit and reward. The end result of that process is that the person actually collecting the data doesn't get tenure and ends up shutting down their lab.

    In terms of understanding the data and its "gotchas", we always have metadata to explain the details of the experiment and the data. Through collaborations with specific individuals, in which publication authorship is discussed up front, I have allowed others to analyze my data.

    We design and build our instrumentation ourselves, or have it built by an outside contractor. In either case we always validate every piece of experimental equipment, so I think it is safe to say that we are cognizant of the subtleties of our data.

    > Another angle: if you really do deserve tenure, then your problem is probably the opposite: you've got too many interesting ideas to explore and data sets to analyze, and you're likely never to get around to doing some of the necessary-but-more-tedious analyses of your back data. If you hold on to the data, it will never get analyzed by anyone.

    It's not a case of deserving tenure or not. You need peer-reviewed documentation of scientific productivity and standing. This is why I have graduate students and postdocs. Typically, a senior graduate student or a postdoc ends up being first author on a paper, while I am last author. And this is what tenure review committees look at: how many first- and last-author papers you have. Having a lot of papers with my students and postdocs as first author demonstrates that I am being a good mentor and advancing the careers of the people in my lab. Having a lot of last-author publications demonstrates that my lab is productive in general. Committees also factor in the quality and prestige of the journals where the work is published.

    As I stated earlier, after my lab has gotten a few publications out of a data set, I would be OK with publishing it in an open database. However, I would still insist on having some control over how future publications are credited.
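
The metadata-plus-credit arrangement the poster describes is easy to picture as a machine-readable record; a minimal sketch, where every field name is an illustrative assumption rather than any real repository's schema:

```python
# Illustrative dataset metadata record; all field names are hypothetical.
import json

record = {
    "title": "Example neural recording dataset",        # hypothetical
    "collected_by": "Example Lab, Example University",  # hypothetical
    "instrument": {
        "name": "custom-built amplifier",
        "validation": "bench-calibrated against reference signals",
    },
    "license": "public domain",
    "citation_policy": (
        "Publications using these data must cite the lab's original "
        "papers and discuss authorship with the collecting lab up front."
    ),
    "gotchas": [
        "channels 12-14 are noisy before 2006-03-01",  # hypothetical
        "timestamps are in local lab time, not UTC",   # hypothetical
    ],
}

print(json.dumps(record, indent=2))
```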
