Library of Congress Offers Update On Huge Twitter Archive Project

Posted by samzenpus on Monday January 07, 2013 @06:38PM from the 140-little-problems dept.

Nerval's Lobster writes "Back in April 2010, the Library of Congress agreed to archive four years' worth of public Tweets. Even by the standards of the nation's most famous research library, the goal was an ambitious one. The librarians needed to build a sustainable system for receiving and preserving an enormous number of Tweets, then organize that dataset by date. At the time, Twitter also agreed to provide future public Tweets to the Library under the same terms, meaning any system would need the ability to scale up to epic size. The resulting archive is around 300 TB in size. But there's still a huge challenge: the Library needs to make that huge dataset accessible to researchers in a way they can actually use. Right now, even a single query of the 2006-2010 archive takes as many as 24 hours to execute, which limits researchers' ability to do work in a timely way."

Stuck in a loop here.... (Score:4, Funny)

by rts008 ( 812749 ) on Monday January 07, 2013 @07:08PM (#42511753)

So, just how many 'Libraries of Congress' are there in 300TB?
Does this mean that as the archives swell, the metric does also?
Where does this madness end? ;-)
