Google To Digitize Millions of Old Newspaper Pages
hhavensteincw writes "On Monday Google detailed new plans to digitize millions of newspaper pages with articles, photographs, and headlines intact so they can be accessed and searched online. 'Around the globe, we estimate that there are billions of news pages containing every story ever written,' Google said in a blog post. 'It's our goal to help readers find all of them, from the smallest local weekly paper up to the largest national daily.' For example, Google noted the availability of an original article from the Pittsburgh Post-Gazette from 1969 about the landing on the moon." When you search the news archive for, e.g., "Chicago fire" or "Rosenberg trial," a significant fraction of the result pages cost money to view.
Paydirt! (Score:5, Funny)
http://news.google.com/archivesearch?q=%22armadillo+aerospace%22&scoring=t [google.com]
Fuck I wish Carmack would stop using his Time Machine to get 1957 publicity.
Re:Paydirt! (Score:5, Funny)
I'd like to contribute.
Where can I mail in newspaper clippings?
Re: (Score:3, Funny)
Google (Score:3, Funny)
ALL HAIL GOOGLE. ALL HAIL GOOGLE. ALL HAIL GOOGLE.
Re: (Score:3, Informative)
You must be new here. Here's how you should write it up:
I, for one, welcome our new truly great Google overlord.
You are welcome.
No, I'm New Here (Score:3, Funny)
No, I'm New Here
Re: (Score:2)
Man those blinkers fit you so well.
Re: (Score:2)
Whoosh...
Gerry
Re:Google (Score:5, Insightful)
"Don't be evil" is just an advertising slogan, like "At Pontiac we build excitement" (bad brakes, crappy handling), "Chevy - Like A Rock" (damned thing won't start), "At Ford, Quality is job 1" (Got their work cut out for them).
Don't BE evil is a lot different than don't DO evil. They have certainly done evil; look at China, look at their doubleclick purchase, look at that Chrome snafu last week that they quickly rectified (kudos to them for that). Evil can be done mistakenly. And they're a corporation, beholden to no one but their stockholders.
That said, this certainly is Good. I'm hopeful that their archives will go back to the 1870s, because I may be able to find out what my name is/was.
My late uncle did genealogical research, and could not find out anything earlier than his own grandfather (although he found a wealth of information on his mother). My great grandfather, Harry McGrew, wasn't born McGrew. His parents died in a train wreck some time in the 1870s when he was a small child and he was raised by a man named McGrew in Indiana. Indiana law forbids release of adoption records, even that old.
When I first got on the internet I searched for train wrecks in the 1870s but found little to nothing. I haven't really looked since then. But if these archives go back that far, there should be newspaper accounts of train wrecks during that decade.
At any rate, this should be an incredibly valuable resource for a whole lot of people. I salute and thank the people at Google for this.
Historically, history has been written by the victors of conflicts. Recently (the last few hundred years) history has been written by the newspapers. Interestingly, since the newspapers are owned by the corporations that really rule the world, history has STILL been written by the victors.
For example, judging by newspaper accounts only, the US has only two political parties, when in fact we have five parties on the ballot in enough states to win - were the newspapers honest enough to report on them. We're lucky that the newspapers no longer have a lock on what is perceived as reality, and the "third party" parties' web sites should leave records for the future.
American Cars (Score:3, Informative)
"Don't be evil" is just an advertising slogan, like "At Pontiac we build excitement" (bad brakes, crappy handling), "Chevy - Like A Rock" (damned thing won't start), "At Ford, Quality is job 1" (Got their work cut out for them).
Pontiac's handling has gotten a lot better. The GTO was a bit squishy but the new G8 is said to be a worthy challenger to the M5. If that's not good brakes and good handling, then I do not know what is.
Similarly, Ford is now routinely winning various quality rankings in it car off
Re: (Score:2, Funny)
Great! (Score:5, Insightful)
Now, all those guys/girls who streaked during Woodstock are going to repent (more).
But seriously...
1. Guy/girl does something goofy in 70s as a teenager.
2. Gets covered by local news (at that time).
3. Google digitises that news.
4. Now CEO (then guy/girl) is suddenly let go.
Who hasn't done something goofy and thought in retrospect wished they hadn't done it (not necessarily something criminal). Google might make their "second chance" disappear.
ps. Carly F. might have seen this coming ;-)
Re: (Score:3, Funny)
[quote]Who hasn't done something goofy and thought in retrospect wished they hadn't done it (not necessarily something criminal). Google might make their "second chance" disappear.[/quote]
If only finding out about these youthful misdemeanours could end someone's career...
http://www.dba-oracle.com/images/bill_gates_albuquerque.jpg [dba-oracle.com]
Re: (Score:3, Interesting)
Re: (Score:3, Funny)
Just look at it this way, the next time a prospective employer is judging you based on what's on your facebook page, you can whip out a photocopy of his naked hairy hippie ass and say, "What, sorry, didn't hear you?"
Re: (Score:3, Insightful)
Gates started the company. More germane would be President Bush and Vice President Cheney's drunk driving convictions. I'd say something that could result in people getting killed is a lot more serious than streaking.
That said, I found some of my own writings from the 1970s. I'm glad we didn't have the internet, you think my stuff NOW is weird...
Re:Great! (Score:5, Insightful)
Who hasn't done something goofy and thought in retrospect wished they hadn't done it (not necessarily something criminal). Google might make their "second chance" disappear.
Or it might finally make people realize that we are all human, and a stupid act at 18 doesn't equate to judgment post 30. Naaahhh...
Re: (Score:3, Insightful)
"Or it might finally make people realize that we are all human, and a stupid act at 18 doesn't equate to judgment post 30. Naaahhh..."
The truth is people are immature, we live short lives and don't get to reflect much on anything because most people are making a living. I forget which author commented upon the stupidity of the working classes due to lack of time, anyone know?
The problems stem from ignorance and false behaving based on false understanding, we let people have their animal prejudices not based
Re: (Score:2)
The problems stem from ignorance and false behaving based on false understanding, we let people have their animal prejudices not based on anything, other than personal distaste. I think that has to change in the future personally.
How would you like to dictate what people think? Should we impose laws that make it a crime to believe something different than the laws dictate? Maybe we should take it even further and make it illegal to look like a person doesn't think the way the Party wants everyone to think.
--sarcasm-- Obviously we can't let people continue to formulate their own thoughts or else they may continue to form their opinions, good or evil, based on their life experiences. We must instead force people to think the way th
Welcome... (Score:5, Funny)
Or it might finally make people realize that we are all human, and a stupid act at 18 doesn't equate to judgment post 30. Naaahhh...
You must be new here. Welcome to Earth. We're a little strange here, but you will find that some of us can be relaxed and groovy. Enjoy your stay.
P.S. Please take me with you when you leave the planet
Re: (Score:2)
"Groovy?" I'm 56 and never once heard anyone not on a stage use that incredibly stupid, media-coined word. So I agree, please find a way off this rock!
I've seen that happen (Score:3, Informative)
Guy/girl does something goofy in 70s as a teenager. Gets covered by local news (at that time).
I've seen that already. I looked up an executive, and Google returned a hit from a student newspaper from the 1960s that they'd digitized from microfilm. The story mentioned the guy being a member of the Socialist Workers Alliance.
Re:I've seen that happen (Score:5, Insightful)
Guy/girl does something goofy in 70s as a teenager. Gets covered by local news (at that time).
I've seen that already. I looked up an executive, and Google returned a hit from a student newspaper from the 1960s that they'd digitized from microfilm. The story mentioned the guy being a member of the Socialist Workers Alliance.
Oh no! Exec dabbled with left wing ideology in youth! By the way I was a member of the Socialist Worker Student Society when I was a student because I was trying to impress a girl. Why would anybody care?
The people that freak me out are Young Conservatives. Those guys are creepy.
Re: (Score:3, Insightful)
> By the way I was a member of the Socialist Worker Student Society when I was a student because I was trying to impress a girl. Why would anybody care?
A new right-wing McCarthy gov might prevent you from working in Schools, Universities and government jobs, you might even be barred from Hollywood.
Re: (Score:2)
Re:I've seen that happen (Score:5, Funny)
Oh no! Exec dabbled with left wing ideology in youth! By the way I was a member of the Socialist Worker Student Society when I was a student because I was trying to impress a girl. Why would anybody care?
I can see why this would be harmful to his career. As soon as word got out that, at some point in his past, he actually cared about people, his reputation as a business executive would be ruined. He might never get another six-figure salaried job again.
Re:I've seen that happen (Score:4, Funny)
The people that freak me out are Young Conservatives. Those guys are creepy.
They're in it for the women.
Re:I've seen that happen (Score:5, Informative)
And the post-coital "I voted for George W" reveal is awesome.
Re:New universal explanation (Score:2)
Re: (Score:2, Interesting)
"Any man who is under 30, and is not a liberal, has no heart; and any man who is over 30, and is not a conservative, has no brains." Attrib. various, including Churchill.
Re: (Score:3, Funny)
No - I have a monopoly on creepy - stop stealing my thunder!
Re:Great! (Score:5, Funny)
Who hasn't done something goofy and thought in retrospect wished they hadn't done it (not necessarily something criminal).
Those that didn't get caught?
Re: (Score:3, Insightful)
Better than the current system where every old story is a scandal. A corollary would be the production of artificial sugars. The first one out was relatively safe (cancerous, but less so than all subsequent sugars), but it was the on
Re: (Score:2)
Oh, so it's not good enough that in the past 5-10 years people are having their lives ruined by spontaneous (and stupid) acts captured by cell phone cameras and put up on the web. Now, we have to go back in time and ruin the lives of people who thought they were home free.
I agree. Fuck 'em.
At last! (Score:5, Interesting)
I welcome this news. For too long, research on the Internet has been a frustrating task. For any events after about 1997, there's oodles of information. However there's a giant hole in the amount of information available for events before then. Google Books went some way towards addressing this, but it was still an intense task because a lot of the time, you still have to find and buy the books (or find them in a Library).
I really hope they plan to go as far as putting local, regional newspapers online as well.
Re: (Score:3, Interesting)
Google News is much more functional in this regard, obviously, but it would be nice if a normal Google search were date sensitive. Yes, I know that that would require proper metadata tagging of the entire Inte
Re:At last! (Score:5, Informative)
Re: (Score:3, Funny)
stardates would go a long way towards fixing this.
Re: (Score:3, Informative)
They're working on it [google.com]
Re: (Score:2)
Forget about stuff before 1997.
I was recently researching a local event that occurred in July 2007 and was on the front page of the local paper at the time.
I have to use the paper's pay-per-view to get a digitized copy or paper reproduction of the article or find a paper copy in my local library's archive (if they even have a copy).
And this newspaper has a pretty nice web page with search and everything.
At last, something GOOD, from Google! (Score:5, Interesting)
At last, something that looks really GOOD, from Google! With free access, this will really change the world, even more.
History revisionists will find it even more difficult to dupe.
Maybe there are serious drawbacks, but, for the time I cannot see anything but the positive aspects.
Re:At last, something GOOD, from Google! (Score:4, Insightful)
Actually history revisionists will not be affected by this at all. Remember many, if not most, of the "news" in the newspapers are (and have been) editorialized by various degrees. To make it worse, if you go back long enough you hit times where communication was so difficult between different countries that the news was basically "We heard that he heard that she heard that this is true".
Gather enough newspapers from all around the country and pretty much anything you find will be almost as reliable as finding something written by a random blogger on the web.
Re:At last, something GOOD, from Google! (Score:5, Insightful)
Gather enough newspapers from all around the country and pretty much anything you find will be almost as reliable as finding something written by a random blogger on the web.
I find this comparison a little shaky. Major newspapers have long used professional (paid) journalists who are overseen by professional (paid) editors - both with reputations to protect. I don't see this type of control from a random blogger.
Re:At last, something GOOD, from Google! (Score:5, Insightful)
News agencies. (Score:4, Insightful)
Do you know anybody who works in the news media? I do, several guys both in TV and paper news who have been placed all over the spectrum from editing room floors to the administrative level and even teaching positions at media and public relations colleges. They ALL report (privately) that the whole game is a giant crock of malarkey. The most interesting aspect is when the news teams don't even realize they're doing it, but simply re-broadcast biases and falsehoods because they are part of a form of non-deliberate groupthink. But it's worst when suggested stories are simply struck from the record because they don't match up with whatever political beliefs the owner happens to hold.
One of the big problems is the AP Newswire, to which so many large journals subscribe and pull feeds from word for word. --One thin little bottleneck through which major breaking news passes, meaning entire nations uniformly learn about events which are filtered by only a very small number of people.
The intriguing thing about bloggers is that they don't do this; they represent a broad and varied non-uniform message. This does not mean all bloggers are accurate or that there isn't the internet 'echo chamber' effect going on, but it does mean that there is actually a higher probability of actual news coming through the system. Have you ever clicked into democracynow.com? Some of the more prolific blogger sites have their own journalists covering stories and you generally get broader coverage, and people being interviewed in a non-soundbite kind of way.
-FL
Re:At last, something GOOD, from Google! (Score:5, Interesting)
There are serious drawbacks, but mostly they aren't actually Google's fault.
The problem is, this kind of preservation costs serious money - so it's only done once from one master. Then that one master is distributed widely.
An anecdote from the early 90's, when moving newspaper archives onto microfiche really got started in a serious way. A friend was doing research for a college thesis, and the microfiche copy at his university of an obscure and long defunct western paper was missing a page (a page of the newspaper had been lost sometime in the past and thus was not in the microfiche copy) - the precise page he needed in fact. So he called around and got photocopies (real photocopies back then) from other universities whose libraries held microfiche copies of that newspaper.
Each and every one of them was missing the same page.
Turns out one library had paid to have their archives copied onto microfiche - and then recouped their costs by selling copies. Each and every library that had held dead tree copies had replaced them with this microfiche and then heaved the hardcopies into the dumpster.
That page is now forever lost to history.
Re: (Score:3, Interesting)
Here's a not so funny story along the same vein. Back in 1921 there was a little race war [wikipedia.org] in Tulsa, OK. Being less numerous, the blacks lost and their part of town was burned to the ground. Nobody to this day knows how many died in its defense and the ensuing carnage.
One of the immediate causes was said to be an article in one of the Tulsa papers. In the ensuing coverup, all copies of that article seem to have disappeared. You can go try to look it up in your local library today if you want. Any copy of the
Should be great for armchair historians... (Score:5, Interesting)
Re:Should be great for armchair historians... (Score:5, Informative)
Re: (Score:2)
Re: (Score:2)
The wayback machine is a wonderful resource, but woefully incomplete. My old Quake site is there, but not completely there. A lot of the graphics are missing. And Neil Harriot's Yello There, a hilarious parody of Blue's News, is completely missing, except one page of his site that I had posted at my site.
Neil was a Brit who had MS and I haven't heard from him in years, the last email I got from him he was in a wheelchair. I fear he has left the planet.
They even kept the peep show ad (Score:2)
Uh-oh! (Score:5, Funny)
I hope to god that they edit out the advertising otherwise all us consumers will be frantic with longing for products that are no longer available, what with advertising not being a huge sham and all!
Re:Uh-oh! (Score:5, Interesting)
Funny enough, I checked out the example just to see the advertising on the paper. We all know enough about the moon landing I really don't need to see a 1969 paper of the info. I wanted to see 1) How big the headline is (you notice that you don't see the old 200+pt size headlines on papers now that we used to see for things like wars ending, man on the moon, etc.), and 2) Getting a kick out of the old school graphic design and ads in the paper. I was zoomed in reading the movie listing on the opposite page (I guess the back) from the moon-landing story. I didn't see any prices for admission (something to raise my ire at the current $7 "matinee") but I didn't see any evidence they had removed it either.
Re: (Score:2)
A few years ago when I was a student I could get a cinema ticket for £2.95, adult tickets were possibly in the region of £4. That was around 5-6 years ago. Now adult tickets are over £7! Thankfully at one chain of cinemas you can get an unlimited pass for £12 a month, so unless you only see one film a month there's no reason not to get the pass!
Actually come to think of it, what with all the advertising before the films, the prices should be at least staying c
screw the kennedys (Score:2)
Interesting, considering pay-for NYT archives (Score:2)
I recently did some research that had me looking in the NYT's article archive. Thankfully, it was in the 1900-1920 period, so all the articles I wanted to access were free.
However, if I had been doing research in a later time period, say 1930-1940, I would have had to pay for access to the articles (well, probably not me - I'm sure we have institutional access, but someone would have had to pay).
Google seems to be offering this free of charge to viewers, at least initially. It sounds like they've obtained t
Feeling a bit ill (Score:5, Funny)
Re:Feeling a bit ill (Score:5, Funny)
Quick! Run to Congress and buy some laws to protect your ailing business model!
There's no time to waste!
Re: (Score:2, Funny)
Just buy databases? (Score:5, Interesting)
Incidentally, if you're close to a university or a good library, many of these places already hold subscriptions to such services and offer the use of them for free. I'd love to see Google expand upon this already-good base rather than duplicating effort.
yes, and while they're at it (Score:2)
Perhaps Google could just send some money directly to me.
Don't get me wrong, I would love to see this happen, but I'm not sure google would conclude that there's a lot in it for them to do this.
News cartels... (Score:2, Interesting)
The Times are already out there (Score:5, Informative)
You can already access the archives of The Times online:
http://archive.timesonline.co.uk/ [timesonline.co.uk]
It's quite interesting to read about Marie-Antoinette's execution or Jack the Ripper's crimes, I especially like the writing style :)
Re: (Score:2)
Distributed computing? (Score:4, Funny)
From Google's perspective it makes perfect sense to use idle cycles on Aunt Harriet's aging Dell to serve googlicious applications to an eager populace. Why shouldn't she host your gmail account?
The whole concept can even be justified from an environmental point of view: scaling is naturally proportional to demand and load-spreading is extremely efficient. In the long term, Google won't need any of its own hardware other than expensive corporate buildings equipped with limitless executive toys and a few dumb terminals. Hell, we're beginning to see that already. Everyone benefits.
As for the spam emanating from botnets, this is a mere smoke-screen (or should I say cloud-screen?) designed to keep us off the scent.
I, for one, salute our new Gotnet overlord.
As the old saying goes... (Score:4, Funny)
"The Google makes work for idle scanners."
I can see it already: (Score:2)
Let's hope they'll not be too selective in which articles they publish.
A cure for seasickness? (Score:2, Funny)
I don't care if they take over the world, just so long as I don't have to scroll through years of microfilmed newspapers ever again - it makes me feel seasick!
nail in the coffin (Score:4, Funny)
Hardly the first... (Score:5, Informative)
Re:Hardly the first... (Score:5, Funny)
I am curious about OCR fearch engine refults on this publication.
Re: (Score:3, Funny)
They also reported that the colonists' Declaration included a statement about "Life, Liberty, and the Purfuit of Happineff."
Re: (Score:2)
This actually brings up a great point. When I worked for a government agency converting their paper forms to digital we were required to accept the paper forms and enter them manually. I set up an OCR process for the data entry team but every single page scanned in had to be checked for correctness when compared to the original. It was a very tedious process and showed the limitations of OCR software. This was just a few years ago.
So scanning them in using OCR is just one step, proof-reading the results
Re: (Score:2)
I see you typed your comment on an old Underwood and OCRed it!
Dude, keyboards are cheap these days...
Paperspast (Score:2, Informative)
http://paperspast.natlib.govt.nz
(already being done in New Zealand for some years thanks to the work of the National Library of New Zealand) papers available back to 1839. With text search too! Cool!
Now if only the book police... (Score:4, Interesting)
... would allow google to do the same thing. There have been so many times when something interesting came up in a book google searched, only to have pages blanked out. Sometimes I wonder if they should just put advertising on the book itself and pay the owners/authors directly (for the hits/adclicks/being read, etc).
Re: (Score:2)
Re: (Score:2)
Speaking of police, now your 1979 DUI/assault/possession/check fraud arrest will be known by everyone who knows you.
I personally don't like the world that Google is creating, and I don't think they should have a right to transform society so much without public oversight.
Re: (Score:2)
One thing that annoys me about Google (and all the other search engines as well) is if you search for a book that is in the public domain (Mark Twain, Shakespeare, etc) it usually has Amazon.com as its first result.
For those wondering... (Score:2)
...the first known publication of Duke Nukem Forever is dated November 1997. http://en.wikipedia.org/wiki/Image:Dnf-lol.jpg [wikipedia.org]
So when is google going to..... (Score:2, Funny)
Google kills the library star... (Score:5, Insightful)
Re:Google kills the library star... (Score:5, Interesting)
Libraries will adapt.
Maybe google will sell pre-filled servers to libraries that contain a terabyte of the news archive and a way to update directly from google.com for a nominal fee.
Maybe libraries will just use the google archive and save all the expense and space of the microfilm archive and put it to better use.
Re: (Score:2)
Automobiles killed the buggy whip industry. The incandescent light bulb killed the candle, kerosine, and gas lamp industries. The computer killed the typewriter.
You wish for these things to return?
Re:No but (Score:3, Insightful)
Ever hear of a slow news day? (Score:2)
Compared to the summer of '69, this is a slow news year. Yes, I'm old enough to remember all that stuff. I don't remember it happening all in the same day, but it sure is interesting.
Oh shit.... (Score:2)
...now I know what Google really is! [wikipedia.org]
Digital Newspapers are great... (Score:2)
OCR tool? (Score:2)
So what is the best OCR package that runs on Linux?
Why must it cost money to view them? (Score:2)
Having to pay to view these old articles is irritating.
I realize it costs money to scan and archive them, but perhaps these costs can be covered by putting Google Adwords on the sides and using advertising?
This sort of resource is invaluable. I can go to the library right now and go through newspaper archives on microfilm; Google should find a way to offer the same online without charging.
What a beautiful way to look into history, by reading the news articles of the day.
I hope they can make this happen for
"One small step for man" (Score:2)
Re:Awesome (Score:4, Interesting)
Now I can find out everyone I knew who's died with Google archiving the obituaries.
I'm not sure why this was modded "Funny". If Google really is doing regional and local papers, given enough time and effort on Google's part, I may well be able to find stories and obits detailing the lives of relatives and grandparents with whom I never had the opportunity to talk.
Now, if Facebook gets in on this action, things could get a little bit creepy. I don't look forward to being cyber-stalked by the dead.
Re: (Score:2)
Now, if Facebook gets in on this action, things could get a little bit creepy. I don't look forward to being cyber-stalked by the dead.
What? I loved playing online Quake!