Google Book Scanning Efforts Not Open Enough?
An anonymous reader writes to mention the Washington Post is reporting that the Open Content Alliance is taking the latest shot at Google's book scanning program. Complaining that having all of the books under the "control" of one corporation wouldn't be open enough, the New York-based foundation is planning on announcing a $1 million grant to the Internet Archive to achieve the same end. From the article: "A splinter group called the Open Content Alliance favors a less restrictive approach to prevent mankind's accumulated knowledge from being controlled by a commercial entity, even if it's a company like Google that has embraced 'Don't Be Evil' as its creed. 'You are talking about the fruits of our civilization and culture. You want to keep it open and certainly don't want any company to enclose it,' said Doron Weber, program director of public understanding of science and technology for the Alfred P. Sloan Foundation."
Good! (Score:5, Insightful)
Ideally we could set up a few hundred digital libraries that would all hold some percentage of the catalog, so that any 5 would be able to duplicate the entire catalog. That way, in the event of a catastrophe or some kind of weird global event, it would be more likely that an uncorrupted copy could be found.
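To make the "any 5 would be able to duplicate the entire catalog" idea concrete, here is a minimal sketch in Python. It assumes plain replication (whole books, no clever encoding), and the function names and the small numbers in the usage note are made up for illustration. The guarantee holds if every book is missing from at most 4 libraries, because any group of 5 must then include at least one holder.

```python
from itertools import combinations

def place_books(num_books, num_libraries, k):
    """Give each library a set of book ids so that every book is
    missing from exactly k-1 libraries (assumes k-1 < num_libraries)."""
    holdings = [set() for _ in range(num_libraries)]
    for book in range(num_books):
        # Libraries that do NOT receive this book: k-1 consecutive
        # libraries, rotating with the book id to spread the load.
        skipped = {(book + i) % num_libraries for i in range(k - 1)}
        for lib in range(num_libraries):
            if lib not in skipped:
                holdings[lib].add(book)
    return holdings

def any_k_cover(holdings, num_books, k):
    """Brute-force check: does every group of k libraries hold the
    whole catalog between them? Only practical for small examples."""
    full = set(range(num_books))
    return all(set().union(*group) == full
               for group in combinations(holdings, k))
```

With 40 books, 8 libraries and k = 5, each library ends up holding 20 books, any 5 libraries cover everything, and some groups of 4 do not. The catch is that with a few hundred libraries this scheme has every library holding nearly the whole catalog (296 out of every 300 books for 300 libraries). An erasure code such as Reed-Solomon could in principle give the same any-5 guarantee with each library storing about a fifth of the data, but that is not what the sketch implements.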
I'd definitely like to see some not-for-profits get involved.
Google's goof (Score:4, Insightful)
Re:Just Open Source It? (Score:3, Insightful)
Well, the source of the code running the project wouldn't be that helpful, it's the content we're after.
And presuming you meant Google opening the content.... well I doubt it... they want to sell ads on the content after all!
Don't forget, Google, nice though they are, haven't given out code, content, etc. for any of their "crown jewels".
Scanning a book is easy... (Score:5, Insightful)
Scanning a book is easy; it simply involves taking pictures. You can slice the spine off and take pictures of each page, or use one of the panoply of non-destructive machines that correct the page-warping effects of an open book. This is not particularly hard or expensive.
Damn straight. The OCR process is the hardest part, so of course they wouldn't allow others access to the highly valuable text. They might have a million books "scanned" this year, but each page has to be OCRed. Most people don't decouple those operations and assume that after scanning the hard part is over. Say each book has 300 pages, so we're talking about running 300 million pages of text through OCR. Now you've got a real problem. How does one know if a page of a book is OCRed correctly? You can pay a human or even a large team of humans to QA the text, but even then you can only spot check here and there. A 99.99% correct OCR program will mess up on the equivalent of 150,000 pages of text a year (spread out more or less uniformly across the 300 million). Also, not all pages of books are scannable (pictures, weird fonts, weird page layouts), and then there are headaches with keeping track of related and multiple editions of a book, displaying pictures in the reader you don't have copyright to (which I think always gets glossed over with these sorts of articles), 10 digit to 13 digit ISBNs, etc. So yes, they aren't going to allow others access to the text, because it's hard and expensive to produce: you can only automate so much if you want to ensure the accuracy of the text itself (I think Google does). If they opened the text up, what stops the competitors from simply adding the data into their search engines after the difficult part is over? Google does no evil but they aren't stupid.
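For anyone who wants to check the back-of-the-envelope numbers, here is a rough Python sketch. It assumes the accuracy figure is a per-page rate (the comment doesn't say whether it means per page or per character), and it also shows the standard ISBN-10 to ISBN-13 conversion mentioned in passing: prefix 978, keep the first nine digits, recompute the check digit. The function names are made up for illustration.

```python
def expected_bad_pages(total_pages, page_accuracy):
    """Expected number of wrongly OCRed pages, treating accuracy as the
    fraction of pages that come out right (an assumption)."""
    return round(total_pages * (1 - page_accuracy))

def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 to ISBN-13 using the 978 prefix."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10 or not digits[:9].isdigit():
        raise ValueError("expected a 10-character ISBN")
    # Drop the old check digit (which may be 'X') and add the prefix.
    core = "978" + digits[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)
```

Under the per-page reading, 99.99% accuracy over 300 million pages works out to about 30,000 bad pages; a figure of 150,000 would correspond to roughly 99.95% accuracy. Either way, it is far too many pages to proofread by hand. As a sanity check on the ISBN side, 0-306-40615-2 converts to 9780306406157.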
Re: Google 'Do No Evil' ... (Score:4, Insightful)
Oh, do calm down... They never claimed "we do absolutely no evil whatsoever", it's more like - the founders happen to think that "evil should not be done". What's a lie about that? Also, how does inflated stock make them evil?
And how, pray, are they supposed to survive without the adverts? Never mind the fact that Google didn't actually come up with online advertising but were pretty much the first ones to run targeted, non-offensive (as in, no flashing banners, pop-ups, etc.) ads.
I'm no Google fanboy, although I happily use many of their services. But I don't think there's anything inherently wrong with them, and I find it somewhat sad to see this paranoid drivel modded up to +3 Insightful.
the books aren't going anywhere... (Score:5, Insightful)
You folks do realise that Google returns the books after they scan them so they'll still be in the libraries afterwards right? So how does this reduce their availability?
Re:Good! (Score:3, Insightful)
I can read data from ten years ago on my home computer with no problems.
If we have a 100-year disruption, well, then we are probably throwing rocks at one another and rebuilding civilization.
Did someone break their legs? (Score:3, Insightful)
Did someone break their legs?
See that big building downtown with all the books in it?
Oh wait, get up from your desk, go outside (yes I know, it burns...), get on the bus and go downtown.
OK, now see the big building with the strange letters "LIBRARY" on the front? OK, that's the one, go inside... see all the books?
Now go up to the attendant at the desk and tell them your name and address and show a piece of photo ID. The nice person will give you a card that you can use to borrow books.
What's a book? OK, it's many pages of paper bound together, usually with glue and string. On each of these "pages" you will find ink (a dye) in the pattern of letters that form words and sentences and paragraphs.
Usually, these "books" tell a story or provide organised information.
Now go ahead, pick one out - they'll even let you take it home for a week or two so you can read it. For free!
You can browse the stacks (a colloquialism for those big shelves with books on them) which are organised according to a system known as the Dewey Decimal System. You can use a revolutionary piece of technology known as a "card catalog" to indicate the position of the title you seek on the stacks (though many libraries have this same catalog searchable from computer terminals).
It's revolutionary, I know. But there you have it, free information and entertainment, enough to last a lifetime, with a "less restrictive approach".
Enjoy.
Re:Good! (Score:4, Insightful)
I've been using computers for well more than 10 years, and ASCII is still just as readable as ever.
Mark-up languages like HTML, XML, or RTF may die off eventually (several hundred years at least), but you can always strip the markup (either with code, or mentally by ignoring it). Plus, with the formats being so simple, and book layout being so obvious, it should take 5 minutes to write a new parser for any of them.
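As a rough illustration of the "strip the markup with code" point, here is a minimal Python sketch using only the standard library's html.parser. It handles HTML-style tags only (RTF would need a different approach), and the class name, function name and the list of block-level tags are my own choices, not part of any standard.

```python
from html.parser import HTMLParser

# Tags after which a space is inserted so that adjacent blocks of
# text do not run together. An illustrative list, not a complete one.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

class TextOnly(HTMLParser):
    """Collect the text between tags and discard the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

def strip_markup(markup):
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    # Join the pieces and collapse runs of whitespace.
    return " ".join("".join(parser.chunks).split())
```

For example, strip_markup("<h1>Chapter 1</h1><p>It was a <b>dark</b> night.</p>") gives "Chapter 1 It was a dark night." Not quite a five-minute job once you care about layout, but the text itself survives with very little effort.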
Both of the above would be unreadable by the standard pick-up mechanism, but manually reading it, bit-by-bit, with something like an electron microscope should be possible for many, many more years after that. Just as technology has made it possible to read previously erased text on paper, so too will it be easier, in the future, to read physically decaying digital media.
It takes many thousands of years for even uncommon languages to disappear. And if they were even remotely similar to our own, they can be deciphered without any advanced knowledge. So, I'd be worried about the long-term chances of a complex language like Chinese to be preserved, but anything with Latin roots, that uses a small alphabet should do fine.
Besides that, you can ensure the language survives by having multiple language translations, side-by-side. If any one of them is understood in the distant future, they can use it to learn all the rest. See: The Rosetta Stone.
Enclose what? (Score:3, Insightful)