Google Book Scanning Efforts Not Open Enough?
An anonymous reader writes to mention that the Washington Post is reporting that the Open Content Alliance is taking the latest shot at Google's book scanning program. Complaining that having all of the books under the "control" of one corporation wouldn't be open enough, the New York-based Alfred P. Sloan Foundation is planning to announce a $1 million grant to the Internet Archive to achieve the same end. From the article: "A splinter group called the Open Content Alliance favors a less restrictive approach to prevent mankind's accumulated knowledge from being controlled by a commercial entity, even if it's a company like Google that has embraced 'Don't Be Evil' as its creed. 'You are talking about the fruits of our civilization and culture. You want to keep it open and certainly don't want any company to enclose it,' said Doron Weber, program director of public understanding of science and technology for the Alfred P. Sloan Foundation."
Re:Google's got a long way to go . . . (Score:5, Informative)
Please do a better job, not just a bigger job (Score:3, Informative)
Scanning with a flat-bed scanner basically wrecks the binding, so the books probably need to be rebound afterwards, or else discarded.
There are photography setups (e.g. Phase One has one), but the resolution is too low, even with a 40-megapixel medium-format camera (yes, they are used for this). A little high-school mathematics (e.g. Nyquist) and the back of an envelope, combined with some measurements, will show that if you scan engravings at under 1200dpi, you will lose a lot of detail. Compare, for example, the Alice in Wonderland pictures [fromoldbooks.org] on my own site with the Project Gutenberg ones: you can read the engraver's signature on most of the ones I have. Yes, the bandwidth needed to host higher-resolution images is greater (which is why I have ads, sorry). But it's worth it.
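For anyone who wants to redo the back-of-the-envelope sum, here is a rough Python sketch. Every number in it is an illustrative assumption rather than a measurement from this post: a 40-megapixel sensor taken as 7300 x 5500 pixels, a 9 x 6 inch page that exactly fills the frame, and an engraved line 0.001 inch wide with gaps of the same width. The function names are made up for the example.

```python
# Back-of-envelope check of camera resolution against a Nyquist-style minimum.
# All numbers below are illustrative assumptions, not measurements.

def effective_dpi(px_long, px_short, page_long_in, page_short_in):
    """Pixels per inch when the page exactly fills the frame (best case)."""
    return min(px_long / page_long_in, px_short / page_short_in)

def nyquist_min_dpi(line_width_in):
    """Minimum samples per inch to resolve alternating lines and gaps of one width.

    One line plus one gap is a cycle of 2 * width, so the spatial frequency is
    1 / (2 * width) cycles per inch, and Nyquist asks for at least twice that.
    """
    cycles_per_inch = 1.0 / (2.0 * line_width_in)
    return 2.0 * cycles_per_inch

camera_dpi = effective_dpi(7300, 5500, 9.0, 6.0)  # assumed 40 MP sensor, 9 x 6 in page
needed_dpi = nyquist_min_dpi(0.001)               # assumed 0.001 in engraved line

print(f"camera: about {camera_dpi:.0f} dpi, needed: at least {needed_dpi:.0f} dpi")
print("enough" if camera_dpi >= needed_dpi else "not enough")
```

Under those assumptions the camera comes up short, and that is the best case: margins around the page and lens softness only push the real figure lower. Plug in your own line-width measurements to see where your material lands.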
Some of these books will never be scanned again. Even for OCR, 400dpi grayscale seems a minimum for footnotes and other small text even in English.
I'd also like to see more interfaces like the Project Gutenberg Distributed Proofreaders' site, where people can submit corrections. Maybe use a wiki for the transcription?
Liam
Re:Good! (Score:2, Informative)
money (Score:1, Informative)
Don't know about you, but I would pop for a yearly subscription to a *good quality* search engine that had a toggle for "with adverts" or "no adverts". Not sure how much I would spend; that would depend on how good they were at filtering out link farms, etc., but I'd pay some reasonable fee to have the option of no ads. And then websites might have an inducement to restrict ads to the interior pages and not the main public-facing page. Ads there just suck.
Right now I would classify the free Google search with ads as being of medium quality until you get good at it, with a lot of -restrict-this-and-that words added to your query, and learn wild cards, domain restrictions, etc. In fact, I wish Google had one simple option on their main page: split the search bar in two by default, where one side is for words/phrases you are looking for and the other side is for what you want to immediately filter out. For example, if you add -sale, you eliminate a lot of commercial sites. Dogsquat simple, yet hardly anyone does it.
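The two-box idea is easy to mock up. Below is a minimal Python sketch, assuming only the ordinary minus-prefix exclusion syntax mentioned above; `build_query` is a hypothetical helper written for this example, not part of any Google API, and the search terms are made up.

```python
# Sketch of a "two-box" search form: one list of wanted terms, one of unwanted.
# build_query is a hypothetical helper, not part of any Google API.

def build_query(wanted, unwanted):
    """Join wanted terms and minus-prefixed unwanted terms into one query string."""
    def quote(term):
        # Multi-word phrases are wrapped in double quotes so they stay together.
        return f'"{term}"' if " " in term else term

    parts = [quote(t) for t in wanted]
    parts += ["-" + quote(t) for t in unwanted]
    return " ".join(parts)

query = build_query(["flatbed scanner", "review"], ["sale", "coupon"])
print(query)
```

The left box feeds `wanted`, the right box feeds `unwanted`, and the result is the same string you would otherwise type by hand.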
Google is good once you learn to use it, but used by default, the way most people use it, it's just a fancy yellow pages.
Re:Good! (Score:3, Informative)
Re:Just Open Source It? (Score:2, Informative)