Library of Congress Offers Update On Huge Twitter Archive Project

Posted by samzenpus
from the 140-little-problems dept.
Nerval's Lobster writes "Back in April 2010, the Library of Congress agreed to archive four years' worth of public Tweets. Even by the standards of the nation's most famous research library, the goal was an ambitious one. The librarians needed to build a sustainable system for receiving and preserving an enormous number of Tweets, then organize that dataset by date. At the time, Twitter also agreed to provide future public Tweets to the Library under the same terms, meaning any system would need the ability to scale up to epic size. The resulting archive is around 300 TB in size. But there's still a huge challenge: the Library needs to make that huge dataset accessible to researchers in a way they can actually use. Right now, even a single query of the 2006-2010 archive takes as many as 24 hours to execute, which limits researchers' ability to do work in a timely way."

  • Why? (Score:5, Insightful)

    by Anonymous Coward on Monday January 07, 2013 @06:40PM (#42511363)

    Why does the federal government need to archive the useless information Twitter calls tweets? Yet another huge waste of my money (being a taxpayer and all).

  • Re:Why? (Score:5, Insightful)

    by griffjon (14945) <GriffJon@gmaLISPil.com minus language> on Monday January 07, 2013 @06:54PM (#42511593) Homepage Journal

    To paraphrase a quote by the Internet Archive chairman from some years back, "The average lifespan of a Web page today is 100 days. This is no way to run a culture."

  • Re:Why? (Score:5, Insightful)

    by Anonymous Coward on Monday January 07, 2013 @07:10PM (#42511759)

    To paraphrase a quote by the Internet Archive chairman from some years back, "The average lifespan of a Web page today is 100 days. This is no way to run a culture."

    The average life of an inane conversation used to be maybe 15 minutes. I'm not sure the world is a better place for having extended that.

  • seriously? (Score:2, Insightful)

    by Anonymous Coward on Monday January 07, 2013 @07:28PM (#42511967)

    300TB worth of tweets, which are basically very small text files? A single tweet that uses all available characters should only be 140 bytes. I just refuse to believe that there are 2+ trillion tweets out there to make up 280+TB, considering 1 billion tweets would be 140GB. (Unless I'm failing massively at math here, which is quite possible.)
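
A quick sanity check of the arithmetic in the comment above, as a rough sketch. It assumes exactly 140 bytes per tweet (one byte per character, no metadata) and decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes). The constant and function names are made up for illustration.

```python
# Assumption: a full-length tweet is 140 single-byte characters, nothing else stored.
BYTES_PER_TWEET = 140
GB = 10**9   # decimal gigabyte
TB = 10**12  # decimal terabyte


def size_of_tweets(count, bytes_per_tweet=BYTES_PER_TWEET):
    """Total bytes needed to store `count` tweets of a fixed size."""
    return count * bytes_per_tweet


def tweets_for_size(total_bytes, bytes_per_tweet=BYTES_PER_TWEET):
    """How many whole fixed-size tweets fit in `total_bytes`."""
    return total_bytes // bytes_per_tweet


print(size_of_tweets(10**9) / GB)   # 140.0 (GB for one billion tweets)
print(tweets_for_size(280 * TB))    # 2000000000000 (two trillion)
print(tweets_for_size(300 * TB))    # 2142857142857
```

So the math holds under those assumptions: at 140 bytes each, 280 TB works out to two trillion tweets. An archive that large with far fewer tweets would only follow if each stored record is much bigger than the bare text, for example if metadata is kept alongside it.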

  • Re:Why? (Score:5, Insightful)

    by Hatta (162192) on Monday January 07, 2013 @07:49PM (#42512219) Journal

    Because Twitter is a great model for the spread of ideas. If you study the spread of ideas, you can begin to understand it and use that understanding to affect it. That has enormous value.
