Bringing the Library of Congress Newspapers Online

Posted by CmdrTaco on Thursday November 18, 2004 @04:30PM from the thats-a-whole-lotta-bits-and-bites dept.

smooth wombat writes "If you want to read a newspaper article from sometime in the past (say, 1920), your only options right now are to go to your local library and hope it has that paper on microfiche, or pay a visit to Washington, DC and the Library of Congress. That may soon change. CNN is reporting that by 2006 the government will have the first of 30 million digitized pages from papers published from 1836 through 1922, which will be available to anyone who has a connection to the net. The project is a joint effort of the National Endowment for the Humanities and the Library of Congress. The span of the joint project is limited because type faces of printers used before 1836 are too difficult for optical scanners to read, and copyright restrictions are in force on papers published after 1923."

This discussion has been archived. No new comments can be posted.

Copyright limits (Score:5, Insightful)

by hey ( 83763 ) writes: on Thursday November 18, 2004 @04:31PM (#10858097) Journal

Yet another good reason for copyrights to expire after a reasonable number of years.

- Re:Copyright limits (Score:5, Interesting)
  
  by sapped ( 208174 ) writes: <mlangenhovenNO@SPAMyahoo.com> on Thursday November 18, 2004 @04:42PM (#10858227)
  
  I was wondering if anybody could justify news being under copyright for that long. What is there for a newspaper to gain by holding such long copyrights?
  
  - Re:Copyright limits (Score:2)
    
    by CreatureComfort ( 741652 ) * writes:
    
    Paid access to its historical archives?
    
    Advertising money for the ads you would see while browsing such archives?
  - Re:Copyright limits (Score:3, Informative)
    
    by iabervon ( 1971 ) writes:
    
    Owning historical documents must be at least potentially lucrative. The public record has some information, but there's a lot of explanation and commentary that only news articles have. Of course, people tend not to cite sources more extensively than is covered by fair use, and they can go to the LoC to look things up, so they don't really have to buy back-issues (assuming that they even could for most 1923 newspapers at this point), but it's still a possibility.
    
    The New York times has free registration (an
  - Re:Copyright limits (Score:3, Insightful)
    
    by jc42 ( 318812 ) writes:
    
    This might be a good place to bring up the old suggestion that anything out of print for a year become public domain. A newspaper publisher could then maintain their copyright by setting up a method of reprinting old issues. But most of them wouldn't find this lucrative enough, and would just let the copyright expire. Then the LoC could include most newspapers after a year.
    
    One of the very real problems with copyright law is that it allows publishers to "capture" our history and prevent access to some of
    - Re:Copyright limits (Score:3, Insightful)
      
      by Selanit ( 192811 ) writes:
      
      This might be a good place to bring up the old suggestion that anything out of print for a year become public domain. A newspaper publisher could then maintain their copyright by setting up a method of reprinting old issues. But most of them wouldn't find this lucrative enough, and would just let the copyright expire. Then the LoC could include most newspapers after a year.
      
      While I approve the impulse, I think this would be a nightmare to maintain, particularly if the "expire after a year" idea was appli
      - Re:Copyright limits (Score:3, Informative)
        
        by Yartrebo ( 690383 ) writes:
        
        Plagiarism is taking credit for others' work. Copyright actually encourages plagiarism, as the odds of being caught are much lower if you plagiarise.
        
        Plagiarism can occur with or without there being copyright, and with or without permission from the author. If copyright determined plagiarism, students who copied papers would be all fine and kosher because they had permission to copy the paper from the copyright holder.
        
        Also, plagiarism is legal with regards to the copyright code and people who hire ghostwriters d
This is a great idea... (Score:2, Informative)

by Anonymous Coward writes:

I'm surprised they haven't done this sooner... But supposedly, MIT is working on a thing to scan in every document ever in the LOC, for internet access. A monumental task.
This sucks (Score:5, Funny)

by Anonymous Coward writes: on Thursday November 18, 2004 @04:32PM (#10858107)

If the Library of Congress is entirely digitized, that's going to totally screw up the "burning Libraries of Congress" measurement of energy output.

- Re:This sucks (Score:5, Funny)
  
  by quamaretto ( 666270 ) writes: on Thursday November 18, 2004 @04:39PM (#10858172) Homepage
  
  How about "burning Libraries of Congress to CD"?
  
- Re:This sucks (Score:2)
  
  by micromoog ( 206608 ) writes:
  
  It should allow us to refine the precision of "Libraries of Congress" as a unit of digital storage measure, though (and by corollary, "LOC's per fortnight" as a bandwidth measure).
- Re:This sucks (Score:2)
  
  by burns210 ( 572621 ) writes:
  
  But it makes "Library of Congresses per second" measurement extremely accurate.
  
  Besides, who said a hard drive array can't burn? It just takes more work than paper.
Copyright restrictions (Score:2, Interesting)

by stratjakt ( 596332 ) writes:

What is the law regarding an online library? I guess not even the government can do it.

The local library has every edition of the local papers on microfilm, and I suppose they could put it all on DVD too.. When does it become a copyright issue?
- Re:Copyright restrictions (Score:3, Interesting)
  
  by bsartist ( 550317 ) writes:
  
  What is the law regarding an online library?
  
  The same as the law regarding any other form of duplication and distribution. Why would it be any different just because it's online?
  
  The local library has every edition of the local papers on microfilm, and I suppose they could put it all on DVD too.. When does it become a copyright issue?
  
  Assuming the microfilm was legally purchased, they're entitled to show it to as many people as they'd like. It doesn't become a copyright issue until they start making
  - Re:Copyright restrictions (Score:5, Insightful)
    
    by DunbarTheInept ( 764 ) writes: on Thursday November 18, 2004 @05:06PM (#10858518) Homepage
    
    Why would it be any different just because it's online?
    
    In the online world, it is completely impossible to show somebody something without simultaneously giving them a copy of that same something. If the library shows you an HTML version of the copyrighted work, then it had to do so by sending you the contents of that work as a second digital copy, independent of the copy that's on their hard drive. If the library shows you a GIF image of the copyrighted work, then it had to do so by sending you the contents of that work. No matter what scheme is used, no matter what technique for encryption is used, the fact of the matter is that at some point, even if just temporarily, your computer has to have its own copy in one way or another.
    
    On the other hand, if I show you a physical book, this doesn't cause two separate copies of the book to appear.
    
    Unless the online library is willing to delete their copy (even from backups and from the hard drive) while you have your copy (and then trust you to send it back to them when you are done or pay them for it if you lose it), then there cannot be a working analogy between online and physical libraries as far as copyright law goes. Even someone not intending to make use of their copy is still technically breaking copyright law every time they look at a copyrighted work. Your browser's cache is filled with copyright violations if you've ever visited any website with any copyrighted content recently (which is most people who surf the web, probably).
    
    The problem is that the original law was not written with this technology in mind, and the attempts to update it are written by people who just don't understand what they're doing, don't understand how the technology works, and aren't listening to those who do, and instead are listening to those with a vested interest in lying to them about the issue. Hence we get laws that if interpreted literally would outlaw the entire world wide web, but then get enforced selectively. (ALWAYS a bad situation to be in, where it is nearly impossible to avoid violating a law - then the law becomes a means to randomly smack-down on people for whatever you wish to discriminate against them for.)
    
    - Re:Copyright restrictions (Score:3, Interesting)
      
      by geoffspear ( 692508 ) writes:
      
      Your browser's cache is filled with copyright violations if you've ever visited any website with any copyrighted content recently
      Umm, no. If the site you were browsing had the right to distribute the materials, they're not violations. If the site's TOS doesn't allow caching, and they make use of HTTP headers that are supposed to forbid caching, and you knowingly modified your browser to ignore those headers, it might possibly be a violation. Even then it would most likely fall under Fair Use. I doubt
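For anyone wondering what "HTTP headers that are supposed to forbid caching" looks like in practice, here is a minimal sketch in Python. It assumes the standard `Cache-Control` response header and its `no-store` directive; the function name `may_store` is made up for illustration, and it ignores everything else a real browser cache would take into account.

```python
def may_store(headers):
    """Return False if the response's Cache-Control header carries no-store.

    `headers` is a plain dict of response header names to values
    (a hypothetical input shape chosen for this sketch).
    """
    value = ""
    for name, header_value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "cache-control":
            value = header_value
    # Split "max-age=60, no-store" into bare directive names.
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in value.split(",")
        if part.strip()
    }
    return "no-store" not in directives


print(may_store({"Cache-Control": "no-store"}))
print(may_store({"Cache-Control": "max-age=3600, public"}))
```

Note that `no-cache`, despite the name, still lets a cache keep a copy; it only requires revalidating with the server before reuse. That is why the sketch checks `no-store` alone.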
      - Re:Copyright restrictions (Score:3)
        
        by Sai Babu ( 827212 ) writes:
        
        I'm inclined to agree with geoffspear on the caching angle.
        
        Fair use seems pretty permissive in practice. Law being statute plus interpretation plus enforcement, copyright as actually applied allows much more than a conservative interpretation of the statute would suggest. Especially considering that there has never been a 'photocopier at the library' war.
        There are practical matters too. We don't see people trading books over the P2P nets. It's a PITA to read a book on line and the cost of printing a copy while
    - Re:Copyright restrictions (Score:3, Interesting)
      
      by Qzukk ( 229616 ) writes:
      
      The problem is that the original law was not written with this technology in mind
      
      Therein lies the problem: Copyright law starts by making every living being a criminal, then has poorly defined grey areas of vague exemptions like "fair use" that more often than not have been defined through court cases that cost people money and livelihoods. It wasn't made with any technology in mind, the authors were lawyers who realized they could make money by making sure that every new advance and situation would requ
    - Re:Copyright restrictions (Score:3, Insightful)
      
      by odin53 ( 207172 ) writes:
      
      The problem is that the original law was not written with this technology in mind, and the attempts to update it are written by people who just don't understand what they're doing....
      
      Just because you're not aware of the legal history of copyright law doesn't mean the issues you raise haven't been considered.
      
      We can analogize, for example, to the issue you mention above with copyright law-making from almost 30 years ago. It's been long realized that using a computer program almost always requires making a
    - Re:Copyright restrictions (Score:3, Informative)
      
      by networkBoy ( 774728 ) writes:
      
      In the online world, it is completely impossible to show somebody something without simultaneously giving them a copy of that same something.
      
      No, it's not. These guys (http://www.authentica.com/ [authentica.com]) have done quite a good job of document control and management. I can show you whatever I want and you can't see it once you're done and I revoke access. (requires a plug in for acrobat to use).
      
      We use this system to control restricted access and above documents at my office. Not even a screen capture works!
      
      -
- - Re:Copyright restrictions (Score:2)
    
    by BobPaul ( 710574 ) * writes:
    
    I think you missed the point of his question...
    
    And I quote: "The local library has every edition of the local papers on microfilm, and I suppose they could put it all on DVD too.. When does it become a copyright issue?"
    
    which really is quite interesting. Why is it not a copyright issue for the local library to do it, while it is for the national library?
    - Re:Copyright restrictions (Score:2)
      
      by bsartist ( 550317 ) writes:
      
      Why is it not a copyright issue for the local library to do it
      
      The LOC is making and distributing copies of the articles. The local library is not. What's so hard to understand?
      - Re:Copyright restrictions (Score:2)
        
        by BobPaul ( 710574 ) * writes:
        
        My local library has an extensive selection of Newspaper prints, magazine periodicals, etc in searchable online formats that can be accessed from home. The only limitation is that you can only view 2 at a time, you can't view them if too many other people are also looking at it, and you can't copy/paste the articles.
        
        Why couldn't the LoC just do this with things after 1923?
        
        - Re:Copyright restrictions (Score:2)
          
          by dvdeug ( 5033 ) writes:
          
          Why couldn't the LoC just do this with things after 1923?
          
          Because your local library is paying thousands of dollars to the copyright holders to do that.
Fees (Score:2, Interesting)

by Anonymous Coward writes:

Many newspapers charge obscene fees to access articles more than a week old, yet give library patrons free electronic access to their entire archive.
Typeface ? (Score:4, Interesting)

by EpsCylonB ( 307640 ) writes: <eps@epscyl[ ].com ['onb' in gap]> on Thursday November 18, 2004 @04:35PM (#10858136) Homepage

The span of the joint project is limited because type faces of printers used before 1836 are too difficult for optical scanners to read

Surely the OCR process could be recalibrated to identify a different typeface ?

- Re:Typeface ? (Score:5, Funny)
  
  by PhilipOfOregon ( 771069 ) writes: on Thursday November 18, 2004 @04:41PM (#10858205)
  
  Yef, we could recalibrate the OCR for the early fontf, but the text ftartf to look ftrange.
  "Purfuit of Happineff"
  
  - Re:Typeface ? (Score:4, Insightful)
    
    by dvdeug ( 5033 ) writes: <dvdeug@[ ]il.ro ['ema' in gap]> on Thursday November 18, 2004 @05:43PM (#10858993)
    
    Yef, we could recalibrate the OCR for the early fontf, but the text ftartf to look ftrange.
    
    That's not hard. It would be easy to get the OCR to recognize the long-s (which does in fact look different from the f); even if you don't, post-processing (dictionary lookups to see if f or s is valid at a point) can clear up many cases, and for those it doesn't, well, you're going to have to check and fix the OCR anyway.
    
    (This is not theory; Distributed Proofreaders (http://www.pgdp.net/ [pgdp.net]) has and uses such a post-processor.)
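The dictionary-lookup idea can be sketched in a few lines of Python. To be clear, this is only an illustration of the approach, not the Distributed Proofreaders tool: the word list `WORDS` and the function `fix_long_s` are hypothetical, and a real post-processor would load a full dictionary.

```python
from itertools import product

# Tiny illustrative word list (an assumption for this sketch).
WORDS = {"pursuit", "of", "happiness", "fonts", "starts", "strange", "fit", "sit"}


def fix_long_s(token, words=WORDS):
    """Return the dictionary words reachable by reading some 'f's as 's'."""
    lower = token.lower()
    positions = [i for i, ch in enumerate(lower) if ch == "f"]
    candidates = set()
    # Try every combination of keeping each 'f' or swapping it for 's'.
    for choice in product("fs", repeat=len(positions)):
        chars = list(lower)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        word = "".join(chars)
        if word in words:
            candidates.add(word)
    return sorted(candidates)


for token in ["Purfuit", "of", "Happineff", "fit"]:
    print(token, "->", fix_long_s(token))
```

A token with exactly one candidate can be corrected automatically. A token with several candidates (like "fit", which could be "fit" or "sit") or none is the kind of case that still has to go to a human proofreader. The brute-force search doubles with every 'f' in the token, which is fine for ordinary words but worth knowing about.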
    
- And even so... (Score:3, Insightful)
  
  by BobPaul ( 710574 ) * writes:
  
  And if not, couldn't they still post a picturized version? Even if it's essentially digitized microfilm, there's still a lot more you can do with a digital copy than with a microfilm (such as save where you left off, bookmark, backup in case of fire etc.)
  
  I don't understand why the text HAS to be selectable... That's cooler, but it shouldn't need to be a requirement.
  - Re:And even so... (Score:3, Insightful)
    
    by UWC ( 664779 ) writes:
    
    I'd imagine the text of a newspaper would take up much less storage space and bandwidth than would a picture of the newspaper. Plus the ability to be searched.
    - - Re:And even so... (Score:2)
        
        by UWC ( 664779 ) writes:
        
        Fair enough, though even then there will be significant costs involved with making the scans available, including the obvious bandwidth, servers, backup storage, etc. That said, though, if it's not in their plans now, I do hope it will be sometime in the near future.
- Re:Typeface ? (Score:3, Funny)
  
  by chaffed ( 672859 ) writes:
  
  American English has come a long way since 1836. When attempting to scan older material, the OCR was probably rendering text that read akin to l33t.
  - Re:Typeface ? (Score:2)
    
    by ViolentGreen ( 704134 ) writes:
    
    American English has come a long way since 1836
    
    Perhaps but I don't think the actual letters have. I suspect that the document quality is too poor to make out the characters. Smudges, tears and faded characters probably have more to do with it than the language.
    - Re:Typeface ? (Score:2)
      
      by chaffed ( 672859 ) writes:
      
      I was more concerned with the lack of standardization of the language. Those were wild times. People placing punctuation at will and capitalizing letters willy nilly. How did we ever survive ;)
  - Re:Typeface ? (Score:2)
    
    by dvdeug ( 5033 ) writes:
    
    American English has come a long way since 1836. When attempting to scan older material, the OCR was probably rendering text that read akin to l33t.
    
    Not really.
    
    "But what Plutarch can this age produce, to immortalize a life so noble? May some excellent historian at length be found, some writer not unworthy of his subject; but may his employment be long deferred!"
    
    It's not American, but that's from a book published in 1808 in Britain, edited from text written in the 17th century. The grammar may be a lit
- Re:Typeface ? (Score:5, Informative)
  
  by DunbarTheInept ( 764 ) writes: on Thursday November 18, 2004 @05:16PM (#10858644) Homepage
  
  The fonts were not as uniformly rendered then as they are today.
  
  1. Even with the same exact font (blocks of type) being used, one letter 'A' and the next letter 'A' could look different enough to confuse an OCR program, due to blotchy ink or blotchy paper, like so:
  
  XX XX X X XX X XXXXXX XXX XX X X X X X X X XX
  
  2. Also, the spacing between letters was not as uniform, which would con fuse an OCR pro gram into B reaking words at in con vein ientplaces.
  
  3. And, as the other pofter mentioned, theref the ditterent ftyle ot fymbolf they ufed to ufe.
  
See... (Score:3, Interesting)

by Blue-Footed Boobie ( 799209 ) writes: on Thursday November 18, 2004 @04:37PM (#10858157)

This is the type of thing I like my government doing.
Now, where is the open source OCR software that they can use to read the old wonky typefaces?

Newspaperarchive.com (Score:4, Interesting)

by skenfrith ( 173060 ) writes: <skenfrith&yahoo,com> on Thursday November 18, 2004 @04:39PM (#10858186)

We already have 20 million.

- Re:Newspaperarchive.com (Score:2)
  
  by sapped ( 208174 ) writes:
  
  I used your birthday newspaper [newspaperarchive.com] to look up my birthday in 1967 and then looked up my children for 1997 and 2002. For my birthdate they must have had 40 or more papers listed. For my children only 3.
  
  As a side note. I have been collecting a newspaper every year on my children's birthdays and will put this together for them in a scrapbook when they are about 20 so that they can see what was relevant in the areas they were living in at the time.
- Re:Newspaperarchive.com (Score:3, Informative)
  
  by CreatureComfort ( 741652 ) * writes:
  
  7) Q: How much is a membership?
  
  A: Currently our monthly membership is $17.95 and our yearly membership is $99.95. The yearly membership provides a savings of $115.45 over the monthly rate.
  
  Yes, but you charge for it. This will be free. If I were you, I'd start looking for a new business model... or start donating to Disney's lobbying fund.
- Re:Newspaperarchive.com (Score:2)
  
  by CerebusUS ( 21051 ) writes:
  
  heh, and here's the standard OCR problem. I looked up my birthday and here was the excerpt:
  
  AT THE HOSPITAL Mlltlrad 115 Clav Mrs Ronald 1 Mrs Cora 1020 and Tom Bradli'v Hamilton ucra admlltrd yo'lerdav lo the Chilli colhe hospital AT WASSMANN RESIDENCE Mr pnd frccl of this n'y iveie iliuner yupsls lasl evening of Mr intl Mrs R O of Ulica lliE occasion
  
  There is a very nice picture I could download if I wanted to, though.
- Re:Newspaperarchive.com (Score:2)
  
  by Em Adespoton ( 792954 ) writes:
  
  And if you go there, you see that the OCR accuracy is even questionable on articles as recent as 1996 -- but they do have the Edinburgh Intelligencer from 1776 :)
You know it's coming (Score:5, Funny)

by daeley ( 126313 ) * writes: on Thursday November 18, 2004 @04:41PM (#10858204) Homepage

(From the digitized 1844 paper...)

Howdy, pardner! To read about that scalliwag Black Bart's shootout with Arizona Jack last week, you'll need to pay two bits per article or buy a subscription for a gold dollar or its equivalent in salt pork or live chickens.

Not entirely accurate (Score:3, Informative)

by aengblom ( 123492 ) writes: on Thursday November 18, 2004 @04:41PM (#10858207) Homepage

This is not entirely accurate. The Washington Post's archive is available from 1877 to present day if you're willing to pay.

From 1877-1986, the Post offers full-page scans of the articles as they appeared in the newspaper. Beginning in 1987, the full text versions of articles (without photos) are available.

Oxford University tried something similar (Score:5, Informative)

by Derling Whirvish ( 636322 ) writes: on Thursday November 18, 2004 @04:41PM (#10858213) Journal

Oxford University did a trial project [ox.ac.uk] to see how difficult it would be to place some 18th and 19th Century journals online. Here is the final report [ox.ac.uk] giving some of the difficulties they had. The journals are available here [ox.ac.uk] and make for some very interesting browsing.

Half the fun of old papers is... (Score:5, Insightful)

by MarkEst1973 ( 769601 ) writes: on Thursday November 18, 2004 @04:41PM (#10858216)

seeing the old typesets, how they laid the papers out, the ancient advertisements.
These, to me, were always half the fun whenever I perused old microfiche in the library.
There is a bar in NYC called McSorley's, which has been in continuous existence since 1846 or so. They have framed newspaper articles on the wall from over a hundred years ago, 130 year old pictures, political campaign buttons from McKinley's run. Talk about a neat experience.
Actually seeing the old print would mean more to me. I rather hope that they serve images of the old papers, not just the computer-read text. But hey, that's just me.

- Re:Half the fun of old papers is... (Score:2)
  
  by Realistic_Dragon ( 655151 ) writes:
  
  You get a lot more LoCs per gigabyte as plain text, and it's a lot more useful if you can search it.
  
  Perhaps later on an image archive would be useful, but until they can get several terabytes of bandwidth for free and image->text (on the fly) systems are perfected, text is probably a better idea.
  
  Grep! The only way to search 100 years of data for a misspelled word so you can poke fun at the foolish writer.
- - Re:Half the fun of old papers is... (Score:2)
    
    by ConceptJunkie ( 24823 ) writes:
    
    That's OK, I worked for a financial firm on an app that generated an Excel spreadsheet with the output from the program, which at one point was being sent to Bangalore, printed out and keyed back in. Of course, maybe they printed it out and then shipped it... I wouldn't be surprised.
    
    This was for real.
The newspapers need to step up (Score:3, Insightful)

by mogrify ( 828588 ) writes: on Thursday November 18, 2004 @04:42PM (#10858226) Homepage

Newspapers need to waive their copyright restrictions for this particular project. They have a right to control their content, but copyright should not be an impediment to archiving this information. Maybe there's a way to apply the copyright to the end user (i.e. whoever is viewing this content online) without completely excluding the stuff from being indexed. An 80-year blind spot practically ensures irrelevance.

- Re:The newspapers need to step up (Score:2)
  
  by drinkypoo ( 153816 ) writes:
  
  The newer material will probably be archived in digital form, but simply not made available to the public until the copyright expires. This is the way it should be, but with a much shorter copyright term. I would also argue that news should have a shorter copyright than art, but maybe that's just me.
- Re:The newspapers need to step up (Score:2)
  
  by dvdeug ( 5033 ) writes:
  
  An 80-year blind spot practically ensures irrelevance.
  
  Only to someone who's forgetting history. Some of the issues will be surprisingly relevant; some will only matter to those who study history. It doesn't mean that once something is a hundred years old, it doesn't matter; the roots of current events are in matters more than a century old, often many centuries old.
copyright insanity (Score:5, Insightful)

by drDugan ( 219551 ) writes: on Thursday November 18, 2004 @04:45PM (#10858268) Homepage

and copyright restrictions are in force on papers published after 1923

in case anyone was still left who thought copyright laws were reasonable....

- Re:copyright insanity (Score:3, Insightful)
  
  by rewt66 ( 738525 ) writes:
  
  Um, why exactly is this moderated "troll"? I'd call it "insightful", myself, but I've burned all my moderation points for the day...
  
  The point is that this is a perfect illustration of why the current copyright length is insane. It's something you can use to explain it to your neighbors, and they might get it. It's even something you might be able to use to explain to your legislator in terms they can understand ("hey, look, long copyrights even get in the way of this perfectly reasonable government proje
This is old news (Score:5, Interesting)

by RealProgrammer ( 723725 ) writes: on Thursday November 18, 2004 @04:45PM (#10858270) Homepage Journal

... and for once, it's interesting.

To most Americans, the period from 1790 to 1915 is kind of a mystery except for Gettysburg and Ford's Theatre.

There was tremendous growth in the number of newspapers during that period, starting at a handful in 1790 to thousands in the 1920's. They fell on hard times with the advent of radio.

During that time, everyone with a spare nickel and a desire to publish something put out their own rag. They would trade stories, publish letters to each other, have flame wars, etc. I think it must have looked a lot like the blogosphere, with a bit more latency.

The more things change, the more they stay the same. Sometimes, we need to see the old news to recall that.

- Re:This is old news (Score:2)
  
  by burns210 ( 572621 ) writes:
  
  "During that time, everyone with a spare nickel and a desire to publish something put out their own rag. They would trade stories, publish letters to each other, have flame wars, etc. I think it must have looked a lot like the blogosphere, with a bit more latency."
  
  What an interesting example of history repeating itself. Here we have a 19th century implementation of Usenet. With the LoC(Library of Congress, that is) and the Gutenberg Project(which has a sizeable but not LoC-sized collection already), we wil
- Re:This is old news (Score:2)
  
  by ViolentGreen ( 704134 ) writes:
  
  To most Americans, the period from 1790 to 1915 is kind of a mystery except for Gettysburg and Ford's Theatre
  
  Exactly. I include myself in that. I got very little history in high school apart from the early history of our country. I got very little recent history (post civil war.) It's the one part of my high school experience that I consider lacking. I took a History of Western Civ. since 1600 at one of the Universities I attended and while very interesting, it focused mainly on European history.
  - Re:This is old news (Score:2)
    
    by geoffspear ( 692508 ) writes:
    
    Umm... you could always have taken a more specialized American History course instead (or in addition).
Want earlier papers? (Score:2)

by Realistic_Dragon ( 655151 ) writes:

Pay college students $0.03 an hour to type them in. Monkey see monkey do monkey buy coffee with proceeds.

Presumably papers after 1923 will be added one year at a time as the copyright expires? Or will the mouse protection league keep them locked away for ever?*

*On a related note a BBC radio broadcast about a hitch hiking trip had a comment from a Fat Woman in her slightly deranged middle age who was on her way to Disney World in Florida. She said that America would be a much better place if Disney ran it
- Re:Want earlier papers? (Score:4, Interesting)
  
  by Misch ( 158807 ) writes: on Thursday November 18, 2004 @05:12PM (#10858594) Homepage
  
  Presumably papers after 1923 will be added one year at a time as the copyright expires?
  
  The Mickey Mouse Protection Act (aka the Sonny Bono Copyright Term Extension Act [slashdot.org]) tacked on an immediate and retroactive 20 years to copyright length. So, don't look for anything to be entering the public domain until 1/1/2019. And that's not even considering the likelihood of Congress extending the length of copyrights again.
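The 1/1/2019 date falls out of simple arithmetic, sketched below in Python. This is a simplification that assumes a work published in the US between 1923 and 1977 whose copyright was kept in force, so that the earlier 75-year term plus the 20-year extension applies; the constant and function names are made up for illustration.

```python
from datetime import date

OLD_TERM_YEARS = 75    # term for these works before the 1998 extension
EXTENSION_YEARS = 20   # the retroactive 20 years mentioned above


def public_domain_date(publication_year):
    """Return the date a work published in `publication_year` becomes public domain."""
    term = OLD_TERM_YEARS + EXTENSION_YEARS
    # Terms run through the end of the calendar year, so the work
    # enters the public domain on the following January 1.
    return date(publication_year + term + 1, 1, 1)


print(public_domain_date(1923))
```

For 1923 that is 1923 + 95 = 2018 as the last protected year, hence January 1, 2019, with each later publication year following one year behind, barring another extension.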
  
Kind of a bummer... (Score:3, Interesting)

by chrisgeleven ( 514645 ) writes: on Thursday November 18, 2004 @04:48PM (#10858307) Homepage

Too bad they aren't scanning newspapers from, say, the Revolutionary War period. I think it would have been really interesting to read about the war and the general thoughts about it at the time.

I'm sure OCR technology will advance quickly enough to allow the scanning of these newspapers.

SWEET! (Score:2)

by suso ( 153703 ) writes:

This is really cool. IMHO this was a major medium that was lacking a web interface. There were a lot of times when I would search for a piece of information that I knew was in a paper, but wasn't archived on the net anywhere.

My first time in the paper: Front page of Times Union on February 19th/20th, 1989.
Please Simplify (Score:2)

by pipingguy ( 566974 ) writes:

How many Volkswagen Beetles would be needed to contain this?
Why not pass it through Project Gutenberg? (Score:3, Insightful)

by t0qer ( 230538 ) writes: on Thursday November 18, 2004 @04:49PM (#10858320) Homepage Journal

With all the OCR problems, I'm sure the folks down at Project Gutenberg [promo.net] wouldn't mind taking this on.

- Re:Why not pass it through Project Gutenberg? (Score:2, Interesting)
  
  by rbenech ( 97413 ) writes:
  
  Technically, it's us folks at Distributed Proofreaders [pgdp.net] that do the dirty work of fixing OCR problems.
  I've done over a thousand pages since it started... It's gotten really easy for me to pump out pages, and I've been turned on to a lot of different information that I'd normally not expose myself to... It's quite enriching -- so you should try it if you've got time! ;-)
Time Travel (Score:2)

by blueZhift ( 652272 ) writes:

W00T! Time travel in my own lifetime! I think it's going to be a lot of fun reading those old papers and getting a flavor for life during those times. This is also going to be a great resource for historians and genealogists.
can't scan? (Score:2, Insightful)

by toby ( 759 ) writes:

type faces of printers used before 1836 are too difficult for optical scanners to read

Bollocks. Even if they are trying to OCR this stuff, it's critical that the original page bitmaps remain available, anyway.
I'm amazed they still have these archives. One of my favourite people, Nicholson Baker [wikipedia.org], has made a personal crusade [gwi.net], written books [salon.com] on the subject, and put enormous amounts of his own cash into preserving newspapers that government archives are hellbent on destroying. In particular he attacks two
- Re:can't scan? (Score:2)
  
  by drinkypoo ( 153816 ) writes:
  
  Data has the capacity to last forever - since it's digital, every successful copy is an exact copy. As long as you refresh your media periodically, data can last forever once it's digitized. Paper will not last forever no matter what you do to it, unless you can find a way to remove all the water from it and then reduce its temperature to absolute zero.
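
A minimal sketch of how the "every successful copy is an exact copy" claim can be checked when media are refreshed, assuming the archive compares SHA-256 checksums of the original and the copy. The function names here are hypothetical and are not taken from any archive's actual tooling.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def copy_is_exact(original: bytes, copy: bytes) -> bool:
    """A copy counts as exact when its checksum matches the original's."""
    return sha256_of(original) == sha256_of(copy)


# Illustrative data only: a scanned page and two candidate copies.
page = b"scanned page bitmap bytes"
print(copy_is_exact(page, bytes(page)))        # identical bytes
print(copy_is_exact(page, page[:-1] + b"X"))   # one byte changed
```

In practice the stored checksum would be kept alongside the file, so a later refresh can be verified even after the original medium is gone.
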
Check out Dec. 7, 1941 some time (Score:2, Interesting)

by Please tell me why ( 829412 ) writes:

My library had the NY Times on microfilm, so I decided it would be interesting to look up famous dates. I checked Dec 7 1941 but there was no article on Perl Harbor. Figuring that with the time difference and printing times it didn't make it, I checked Dec. 8th. Still nothing. Gradually over the next few days the story began to trickle out that "yes, something happened", "a few ships were damaged", "quite a few ships were damaged". It was a week later before the story was consistent with what we now believe
- Re:Check out Dec. 7, 1941 some time (Score:4, Funny)
  
  by CJ Hooknose ( 51258 ) writes: on Thursday November 18, 2004 @05:58PM (#10859171) Homepage
  
  Please tell me why: I checked Dec 7 1941 but there was no article on Perl Harbor
  
  Now I can't shake this mental image of Japanese Zeroes dropping extremely large regular expressions on the USS Arizona....
  
Already been done for journals (Score:2, Informative)

by Anonymous Coward writes:

This has already been done for journals by the Making of America Project, so wouldn't the process be similar for newspapers? But newspapers are printed on lower quality paper and possibly with lower quality printing technology.

Making of America (MOA)
http://cdl.library.cornell.edu/moa/ [cornell.edu] (Cornell U)

http://www.hti.umich.edu/m/moagrp/ [umich.edu] (U Michigan)
National Geographic online (Score:2)

by lawpoop ( 604919 ) writes:

I was thinking about digging up old National Geographics, scanning the text and photos, and posting that online. It would make for a great distributed project. However, Nats from before 1923 are rare and expensive. I wonder if I can find them at libraries...
- Re:National Geographic online (Score:2)
  
  by cmpalmer ( 234347 ) writes:
  
  I'm not sure if you were serious or just poking fun at copyright, but in case you were, you evidently haven't seen this:
  
  http://www.nationalgeographic.com/cdrom/
  - Re:National Geographic online (Score:2)
    
    by lawpoop ( 604919 ) writes:
    
    I'm serious, and I'm not just poking fun at copyright. See, mine would be for free, on a website. No need to purchase CDROMS, and it's dynamic. New issues are posted regularly, as they fall into the public domain.
- Re:National Geographic online (Score:2)
  
  by dvdeug ( 5033 ) writes:
  
  I was thinking about digging up old National Geographics, scanning the text and photos, and posting that online. It would make for a great distributed project.
  
  If you're willing to scan them, Distributed Proofreaders (http://www.pgdp.net/ [pgdp.net]) is willing to correct the OCR and even have people assemble them into HTML, provided you're willing to let Project Gutenberg (http://www.gutenberg.org/ [gutenberg.org]) post them.
Goverment and the american history. (Score:2)

by blanks ( 108019 ) writes:

All digitally enhanced and edited to give you a better happier feeling of your government, and America as seen through the eyes of censorship. After what has happened over the last few years the last thing I want to depend on is the government telling me what has happened in history, or telling me WHAT parts of American history (aka news) I can have access to. Yes this is something they are going to be offering, and there will be other areas where you could get this information, but I can see a lot of p
- Re:Goverment and the american history. (Score:5, Insightful)
  
  by dvdeug ( 5033 ) writes: <dvdeug@[ ]il.ro ['ema' in gap]> on Thursday November 18, 2004 @06:08PM (#10859293)
  
  All digitally enhanced and edited to give you a better happier feeling of your government
  
  The LoC would have their reputation destroyed among the librarian and researcher communities if they were caught doing that; and they would be, because hard-core researchers would notice any significant changes in the text and go back to the microfilm and original text copies.
  
  Librarians tend to be among the strongest anti-censorship groups in America. There's never been any insinuation that the Library of Congress was having its strings pulled by the forces in power. I trust the Library of Congress to be a neutral provider of information much more than, say, the Washington Post or the Encyclopedia Britannica.
  
  I can see a lot of places (libraries primary example) that will no longer carry or supply this type of information, because the government will supply it to us.
  
  Most libraries are part of government. Why should you trust your home-town library more than the Library of Congress?
  
I can't wait to read the old ads (Score:2, Interesting)

by yorkpaddy ( 830859 ) writes:

Ads today are complete rubbish. Even looking back at ads from the '80s in PC Magazine, they were a lot better then. Back then they would tell you the actual benefits and features of a product. Now you get a picture of the sky, with a window and a question, "where do you want to go today?". I want to know what I'm buying, and I don't think it's an artist's rendition of utopia; it's a computer program.
- Re:I can't wait to read the old ads (Score:2, Interesting)
  
  by DoomedPhil ( 156374 ) writes:
  
  Check out older ads, say from the 20s. You could learn all about the benefits of Dr. Smith's Radium Tablets or the PATENTED Electrification Machine! You know, all the stuff that really works that the FDA is keeping from us.
Disney's fault (Score:5, Interesting)

by eison ( 56778 ) writes: <pkteison.hotmail@com> on Thursday November 18, 2004 @05:17PM (#10858653) Homepage

Mickey Mouse is keeping us from reading newspapers from the great depression? How powerful should one rat be?

Lawyers and legal researchers (Score:2)

by grolaw ( 670747 ) writes:

will make hay with this archive.

What were the jury members saying after the trial? Who were the witnesses and what was their standing in the community? How did the decedent's estate fare where the bastards claimed that they were not bastards?

Aside from the births and deaths, the property records will be very valuable.

Many of these documents are available in microform, but the actual value of the documents will be increased exponentially where the full text is searchable. At present the vast majority a
Connecticut Offers Something Similar (Score:2)

by Mean_Nishka ( 543399 ) writes:

Connecticut has offered access [iconn.org] to some of the Pro-Quest databases to any resident with a CT library card.
They have archives for the NY Times, Hartford Courant, LA Times, Wall Street Journal, Washington Post, etc. While the archives don't go back too far (twenty years for some papers, six for the NY Times) it is nice to see governments offering citizens access to this information free of charge. I use it quite frequently, and with hope they can get funding for the historical New York Times service (whic
you don't have to go to local library or LOC (Score:5, Informative)

by Squeezer ( 132342 ) writes: <awilliam@md a h . s t a t e . ms.us> on Thursday November 18, 2004 @06:31PM (#10859557) Homepage

Each state has an archives + history department (or something similar to archive all state history). You can go to your state's archives and history dept and pull just about any state newspaper from any time period that you want. We go from the present (well, a couple of weeks before present; it takes us a few days to convert the newspaper to microfilm). Our oldest newspaper on microfilm is from 1736.

Yes, it's not online. We don't have the staff or money to put it online at present, but we are trying to put as many of our records online as we can right now.

Anyway, you can check out the one I work for, and if you live in Mississippi, please come by and check us out. We are open 6 days a week and are totally free.

http://www.mdah.state.ms.us/ [state.ms.us]

Finally (Score:4, Funny)

by jonnystiph ( 192687 ) writes: on Thursday November 18, 2004 @08:00PM (#10860258) Homepage

A complete resource for all those Call of Cthulhu campaigns.

- Re:Google (Score:5, Insightful)
  
  by bsartist ( 550317 ) writes: on Thursday November 18, 2004 @04:38PM (#10858166) Homepage
  
  I'm not so sure about the significance of the content, what did they write/read in 19th Century?
  
  Obituaries and marriage announcements, for one thing. This stuff will be a gold mine for genealogists.
  
  - Re:Google (Score:5, Interesting)
    
    by fumblebruschi ( 831320 ) writes: on Thursday November 18, 2004 @04:57PM (#10858412)
    
    It'll be a big help to me personally!
    I work as a research assistant, which involves a great deal of time going through libraries and copying old journal articles (and I get paid, too, can you believe that?)
    Eight or nine months ago I was looking stuff up for my professor's book on the history of the death penalty in the United States, and she had me track down an article from the Hattiesburg (Miss.) American on an outlaw named John Long, who was hanged in Mississippi in 1870. No library in New England archives the Hattiesburg American--not even Harvard or the Athenaeum--so in the end I had to call the Hattiesburg Public Library and ask the librarian to make me a photocopy of that article.
    (We had a hard time understanding each other--I had to spell out the name "John Long" because my Boston accent confused her. I had the same problem in South Carolina when I asked the gas station attendant what town I was in. It was Summerton, which she pronounced something like "Suhhhn't'n"--eventually she had to point to it on a map.)
    Believe me, this project could save me a lot of backache and eyestrain. Looking through six months of the New York Times from 1899 on microfilm because some footnoter wasn't more specific than "late 1899" is no joke.
    
- Re:Google (Score:5, Insightful)
  
  by Scoria ( 264473 ) writes: <slashmail AT initialized DOT org> on Thursday November 18, 2004 @04:39PM (#10858174) Homepage
  
  The span of the joint project is limited because type faces of printers used before 1836 are too difficult for optical scanners to read
  
  That excerpt strongly implies the use of OCR, in which case the search engines probably won't require a substantial amount of time to index the archive.
  
  On a related note, many historically memorable events occurred during the timeframe mentioned. These include the American Civil War, the Titanic disaster, and many others.
  
  - Re:Google (Score:2)
    
    by lukewarmfusion ( 726141 ) writes:
    
    World's Largest Metaphor Sinks!
    
    (courtesy The Onion [theonion.com], in case you haven't seen "Our Dumb Century")
  - - Re:Google (Score:3, Insightful)
      
      by OECD ( 639690 ) writes:
      
      WWI!
      Isn't it amazing that reporting on WWII is still under copyright?
      - Re:Google (Score:3)
        
        by IWannaBeAnAC ( 653701 ) writes:
        
        Yeah, and most likely, it will never become public domain now. It is quite likely a lot of post-1922 newspapers will simply vanish, because even making a copy for private use is an infringement; no way could a museum (for example) do this systematically.
        I believe a lot of old films have already been lost, because tracking the current copyright holder is too expensive or simply cannot be done, but without their permission it is illegal to copy the old & decaying prints onto new media.
- Re:Google (Score:5, Insightful)
  
  by 44BSD ( 701309 ) writes: on Thursday November 18, 2004 @04:40PM (#10858198)
  
  The fact that you have no idea what people wrote or read about shows the importance of making the materials more accessible.
  
- Re:Google (Score:2)
  
  by wankledot ( 712148 ) writes:
  
  "but I'm not so sure about the significance of the content, what did they write/read in 19th Century?"
  I'm tempted to mod you funny, but sadly I think you're serious. Obviously nothing important happened between the late 1800s and 1920, we should probably just ignore it all. I guarantee those 30M pages are more significant than half of Google's 8B. Unless you think a person's blog with pictures of their cat and a review of the latest Dashboard Confessional album is important.
- Re:Google (Score:4, Insightful)
  
  by c0p0n ( 770852 ) writes: <copong@@@gmail...com> on Thursday November 18, 2004 @04:43PM (#10858245)
  
  ...I'm not so sure about the significance of the content, what did they write/read in 19th Century?...
  
  What they named news at their time is what we call history right now.
  
  - Re:Google (Score:2)
    
    by TheGavster ( 774657 ) writes:
    
    Whereas what we term 'news' right now was what they called 'court theatre' in the 19th century ...
- Re:Google (Score:2)
  
  by Realistic_Dragon ( 655151 ) writes:
  
  what did they write/read in 19th Century?
  
  Words. They had moved on from hieroglyphics and runes by then.
- What Did They Write About In the 19th Century? (Score:3, Informative)
  
  by reallocate ( 142797 ) writes:
  
  >>"...I'm not so sure about the significance of the content, what did they write/read in 19th Century?"
  
  Presumably, everything you missed by not taking history.
  
  In that timespan, the U.S. expanded to the Pacific; fought wars with Mexico and Spain; participated in World War One; prompted the formation of the League of Nations; built the world's largest railway network; invented the telegraph, telephone, electric light, and the airplane; developed mass production and the auto industry; produced inumera
- - Re:Google (Score:5, Insightful)
    
    by k98sven ( 324383 ) writes: on Thursday November 18, 2004 @04:57PM (#10858408) Journal
    
    Actually, (having done a little historical research myself) those kinds of things are relatively easy to find. (church and public records)
    
    In general, the most interesting stuff is often the stuff which was the least interesting when the newspaper was published, such as advertisements, expressions and figures of speech in the articles, opinion pieces, the style of reporting, the biases.
    
    All these little things that generally convey the atmosphere and mindset of an age. It's easy to find out facts, like the construction date of a factory. It's more difficult to find out what people were thinking about the new factory.
    
- Re:Thank you (C)opyrights law (Score:3, Funny)
  
  by rainman_bc ( 735332 ) writes:
  
  In 7 years we'll be able to read about black Monday.
  Not if someone patents the act of reading historical articles about black Monday!
- Re:Thank you (C)opyrights law (Score:4, Informative)
  
  by dvdeug ( 5033 ) writes: <dvdeug@[ ]il.ro ['ema' in gap]> on Thursday November 18, 2004 @05:55PM (#10859134)
  
  In 7 years we'll be able to read about black Monday.
  
  Nope; everything from 1909 to 1922 is only in the public domain because it was grandfathered in by the Sonny Bono Copyright Term Extension Act. Newspapers that were published in 1929 will not be in the public domain until 1929+95 years. So in 2025 you'll be able to read about Black Monday.
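
The arithmetic in the comment above can be sketched as follows, assuming a flat 95-year term counted from the publication year, with protection running through December 31 of the final year. The function name is hypothetical, and this ignores renewal and notice requirements that could have put some works in the public domain earlier.

```python
def public_domain_year(publication_year: int, term_years: int = 95) -> int:
    """First calendar year in which the work is in the public domain.

    Assumes a flat term from publication that lasts through the end of
    year (publication_year + term_years), so the work is free on the
    following January 1.
    """
    return publication_year + term_years + 1


print(public_domain_year(1929))  # 2025
print(public_domain_year(1923))  # 2019
```

The extra year comes from the term running to the end of the calendar year: 1929 + 95 = 2024 is the last protected year, so the papers become readable in 2025.
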
  
- - - Re:Scan it (Score:3, Informative)
      
      by dvdeug ( 5033 ) writes:
      
      they put effort into changing the format from paper to Jpeg or whatever.
      
      Feist [cornell.edu] says that just effort doesn't a copyright make; it requires creative input.
      
      Project guttenberg has their small print because of editing
      
      Reread the small print. It's not a copyright license, it's a trademark license. If you remove the Project Gutenberg trademark from the etext, you can do whatever you want with it. (Assuming it's not one of the rare ones that's still under copyright, but the author gave the right to post it.)
- Re:some old newspapers are available online (Score:2)
  
  by CJ Hooknose ( 51258 ) writes:
  
  rosy1280: Proquest (a database vendor) has something called historical New York Times, Washington Post, and a few other historical newspapers.
  Also the Christian Science Monitor and the Wall Street Journal. Boston Globe, Chicago Tribune, and Atlanta Journal-Constitution in a few months assuming they can get off their butts. All those materials are scanned from microfilm, split up, OCRed, put through some stuff I can't talk about, then shoveled into Proquest's database. You can search for words in a dat
