Google To Digitize Millions of Old Newspaper Pages

Posted by kdawson on Tuesday September 09, @02:37AM
from the all-the-news-that-fits-we-print dept.
hhavensteincw writes "On Monday Google detailed new plans to digitize millions of newspaper pages with articles, photographs, and headlines intact so they can be accessed and searched online. 'Around the globe, we estimate that there are billions of news pages containing every story ever written,' Google said in a blog post. 'It's our goal to help readers find all of them, from the smallest local weekly paper up to the largest national daily.' For example, Google noted the availability of an original article from the Pittsburgh Post-Gazette from 1969 about the landing on the moon." When you search the news archive for, e.g., "Chicago fire" or "Rosenberg trial," a significant fraction of the result pages cost money to view.

  • Paydirt! (Score:5, Funny)

    by QuantumG (50515) * <qg@biodome.org> on Tuesday September 09, @02:38AM (#24929815) Homepage Journal

    http://news.google.com/archivesearch?q=%22armadillo+aerospace%22&scoring=t [google.com]

    Fuck I wish Carmack would stop using his Time Machine to get 1957 publicity.

  • Great! (Score:5, Insightful)

    by Anonymous Coward on Tuesday September 09, @02:54AM (#24929869)

    Now, all those guys/girls who streaked during Woodstock are going to repent (more).

    But seriously...

    1. Guy/girl does something goofy in 70s as a teenager.
    2. Gets covered by local news (at that time).
    3. Google digitises that news.
    4. Now CEO (then guy/girl) is suddenly let go.

    Who hasn't done something goofy and, in retrospect, wished they hadn't done it (not necessarily something criminal)? Google might make their "second chance" disappear.

    ps. Carly F. might have seen this coming ;-)

    • Re:Great! (Score:5, Insightful)

      by houstonbofh (602064) on Tuesday September 09, @03:02AM (#24929903)

      Who hasn't done something goofy and, in retrospect, wished they hadn't done it (not necessarily something criminal)? Google might make their "second chance" disappear.

      Or it might finally make people realize that we are all human, and a stupid act at 18 doesn't equate to judgment post 30. Naaahhh...

    • Re:Great! (Score:5, Funny)

      by n dot l (1099033) on Tuesday September 09, @04:39AM (#24930311)

      Who hasn't done something goofy and, in retrospect, wished they hadn't done it (not necessarily something criminal)?

      Those that didn't get caught?

      • by stranger_to_himself (1132241) on Tuesday September 09, @04:05AM (#24930163) Journal

        Guy/girl does something goofy in 70s as a teenager. Gets covered by local news (at that time).

        I've seen that already. I looked up an executive, and Google returned a hit from a student newspaper from the 1960s that they'd digitized from microfilm. The story mentioned the guy being a member of the Socialist Workers Alliance.

        Oh no! Exec dabbled with left wing ideology in youth! By the way I was a member of the Socialist Worker Student Society when I was a student because I was trying to impress a girl. Why would anybody care?

        The people that freak me out are Young Conservatives. Those guys are creepy.

        • by YourExperiment (1081089) on Tuesday September 09, @04:59AM (#24930367)

          Oh no! Exec dabbled with left wing ideology in youth! By the way I was a member of the Socialist Worker Student Society when I was a student because I was trying to impress a girl. Why would anybody care?

          I can see why this would be harmful to his career. As soon as word got out that, at some point in his past, he actually cared about people, his reputation as a business executive would be ruined. He might never get another six-figure salaried job again.

  • At last! (Score:5, Interesting)

    by telchine (719345) on Tuesday September 09, @02:58AM (#24929883)

    I welcome this news. For too long, research on the Internet has been a frustrating task. For any events after about 1997, there's oodles of information. However, there's a giant hole in the amount of information available for events before then. Google Books went some way towards addressing this, but it was still an intense task because, a lot of the time, you still have to find and buy the books (or find them in a library).

    I really hope they plan to go as far as putting local, regional newspapers online as well.

      • Re:At last! (Score:5, Informative)

        by stranger_to_himself (1132241) on Tuesday September 09, @04:09AM (#24930185) Journal
        Google Scholar is also date-searchable for obvious reasons. It wouldn't be too hard to implement this for regular Google going forwards, since it would only have to remember when it indexed everything. I vaguely remember when every web page had a 'last-updated' line at the bottom. You don't see that much anymore, maybe because it made people look bad.
  • by G3ckoG33k (647276) on Tuesday September 09, @03:01AM (#24929895)

    At last, something that looks really GOOD, from Google! With free access, this will really change the world, even more.

    History revisionists will find it even more difficult to dupe.

    Maybe there are serious drawbacks, but for the time being I cannot see anything but the positive aspects.

    • Maybe there are serious drawbacks

      There are serious drawbacks, but mostly they aren't actually Google's fault.
       
      The problem is, this kind of preservation costs serious money - so it's only done once from one master. Then that one master is distributed widely.
       
      An anecdote from the early 90's, when moving newspaper archives onto microfiche really got started in a serious way. A friend was doing research for a college thesis, and the microfiche copy at his university of an obscure and long defunct western paper was missing a page (a page of the newspaper had been lost sometime in the past and thus was not in the microfiche copy) - the precise page he needed in fact. So he called around and got photocopies (real photocopies back then) from other universities whose libraries held microfiche copies of that newspaper.
       
        Each and every one of them was missing the same page.
       
      Turns out one library had paid to have their archives copied onto microfiche - and then recouped their costs by selling copies. Each and every library that had held dead tree copies had replaced them with this microfiche and then heaved the hardcopies into the dumpster.
       
      That page is now forever lost to history.

      • by sanjosanjo (804469) on Tuesday September 09, @06:27AM (#24930701)

        Gather enough newspapers from all around the country and pretty much anything you find will be almost as reliable as finding something written by a random blogger on the web.

        I find this comparison a little shaky. Major newspapers have long used professional (paid) journalists who are overseen by professional (paid) editors - both with reputations to protect. I don't see this type of control from a random blogger.

  • I hope they aren't restricting it to just newspapers. I've saved tons of interesting web articles from official news websites that have mysteriously disappeared over the years. They're not even in the Google cache. Hopefully, most of them will be in the Google News archive.
  • Uh-oh! (Score:5, Funny)

    by zmollusc (763634) on Tuesday September 09, @03:13AM (#24929947)

    I hope to god that they edit out the advertising, otherwise all us consumers will be frantic with longing for products that are no longer available, what with advertising not being a huge sham and all!

    • Re:Uh-oh! (Score:5, Interesting)

      by SeaFox (739806) on Tuesday September 09, @03:36AM (#24930059)

      Funny enough, I checked out the example just to see the advertising in the paper. We all know enough about the moon landing that I really don't need to see a 1969 paper for the info. I wanted to 1) see how big the headline is (you'll notice that you don't see the old 200+pt headlines on papers now that we used to see for things like wars ending, man on the moon, etc.), and 2) get a kick out of the old-school graphic design and ads in the paper. I was zoomed in reading the movie listing on the opposite page (I guess the back) from the moon-landing story. I didn't see any prices for admission (something to raise my ire at the current $7 "matinee"), but I didn't see any evidence they had removed them either.

  • by plen246 (1195843) on Tuesday September 09, @03:19AM (#24929971)
    My thirty-year, $50-billion plan to consolidate the microfiche market may well be in the shitter.
  • Just buy databases? (Score:5, Interesting)

    by TFer_Atvar (857303) on Tuesday September 09, @03:24AM (#24930003) Homepage
    Why doesn't Google just purchase some of the better newspaper archive databases, such as NewsBank, and simply release all the stories for free? It'd likely be a lot cheaper than duplicating effort, and would help information be released more quickly.

    Incidentally, if you're close to a university or a good library, many of these places already hold subscriptions to such services and offer the use of them for free. I'd love to see Google expand upon this already-good base rather than duplicating effort.
  • by frenchbedroom (936100) on Tuesday September 09, @03:31AM (#24930045)

    You can already access the archives of The Times online:

    http://archive.timesonline.co.uk/ [timesonline.co.uk]

    It's quite interesting to read about Marie-Antoinette's execution or Jack the Ripper's crimes. I especially like the writing style :)

  • Hardly the first... (Score:5, Informative)

    by Catmeat (20653) <mtm AT sys DOT uea DOT ac DOT uk> on Tuesday September 09, @04:12AM (#24930201)
    So... just like the London Gazette [gazettes-online.co.uk] has already been digitized. The difference is, the Gazette began publishing in 1665 [wikipedia.org]. Sod the moon landings! You can read the front-line reports about the American Revolution.
    • by MrMr (219533) on Tuesday September 09, @04:51AM (#24930343)
      Just checking the 28 September 1776 issue. It appears that parliament has forbidden any dealing with the colonies of New Hampfhire, Maffachufett's Bay, Rhode Ifland, Connecticut, New York, New Jerfey, Penfylvania...
      I am curious about OCR fearch engine refults on this publication.
  • by yogibaer (757010) on Tuesday September 09, @06:08AM (#24930649)
    and we are all going to regret it. Remember the public library system? Or the archival organizations? A bunch of highly trained people with literally centuries of experience in classifying and cataloging information, preserving the originals, and investing heavily in digitization to help with that task and to make them more accessible? Most of their services are free or at a minimal cost, especially for students and researchers. And completely ad-free (at least here in Europe). Sure, their marketing sucks, and they do not have the latest Web x.0 gimmicks. They tend to be a bit stuffier, old-fashioned and not as flashy as our bubble heroes of the "do no evil" (but don't do anything good either) kind, but then they, on average, tend to think in decades and not in quarterly results. Data (even massive amounts of it) is not information, and Google is not a research tool. Google will always tweak search results towards higher advertising revenues. It is at best a brute-force instrument with a very low signal-to-noise ratio. It is a pest because it leads people to believe that keyword search is a solid method for research, and it adds to the funding problems for libraries, because who needs a library when you can "google" everything? Google sucks up all it can get and leaves behind a desert without structure, significance or context. Support and use your local (national) library while you still have it.