Book-Digitizing Robots
Makarand writes "Robotic digitization systems are the new help available to complete
voluminous scanning tasks.
Robots that can turn the pages of books and
newspaper volumes and attain scanning speeds of more than 1000 pages/hour
are now available. They even use puffs of compressed air to separate sticky pages!"
Re:Scanned pages (Score:5, Informative)
I was the lead developer for the software side that actually does the crunching on the images. However, I'm not sure exactly how much I am allowed to talk about it, so I won't. Basically, the software side of it produces PDFs, JPGs and TXT files from the OCR performed on the images.
Re:Scanned pages (Score:3, Informative)
Re:Freedom 'Bots (Score:5, Informative)
Regarding your vinyl recording idea, couldn't you just hook up a record changer (yes, they do make these; they have a big spindle and an arm) to a DAT or similar digital recording device, and then use some audio software to cut tracks at blank space?
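To make the "cut tracks at blank space" step concrete, here is a minimal sketch of how such a cut could work. It assumes the recording has already been loaded as a sequence of samples normalized to the range -1.0 to 1.0. The function name `split_on_silence`, the `threshold` value and the `min_gap_sec` value are all made up for illustration and are not taken from any particular audio package. Real vinyl has surface noise between tracks, so the threshold would need tuning by ear.

```python
from typing import List, Sequence, Tuple


def split_on_silence(
    samples: Sequence[float],
    sample_rate: int,
    threshold: float = 0.01,   # assumed loudness cutoff, tune for surface noise
    min_gap_sec: float = 2.0,  # assumed minimum length of a between-track gap
) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index ranges, end exclusive, one per track."""
    min_gap = int(min_gap_sec * sample_rate)
    tracks: List[Tuple[int, int]] = []
    start = None      # index where the current track began
    last_loud = None  # index of the most recent sample above the threshold

    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            if start is None:
                start = i  # first loud sample opens a new track
            last_loud = i
        elif start is not None and i - last_loud >= min_gap:
            # The quiet stretch is long enough: close the track at the
            # last loud sample and wait for the next one to begin.
            tracks.append((start, last_loud + 1))
            start = None

    if start is not None:
        # The recording ended while a track was still open.
        tracks.append((start, last_loud + 1))
    return tracks
```

Each returned range could then be written out as its own file. Quiet passages inside a song that last longer than `min_gap_sec` would be split by mistake, so the result would still want a quick check by a human.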
Digitization (Score:1, Informative)
That was in 1999.
Digitizing was the easy part, actually, since the pages were conveniently on A4 paper, but the OCR, oh mighty Cthulhu! I was young and inexperienced in those days, and OCR software really wasn't up to the task (we didn't have the money to proofread all that text).
I don't have to tell you how disappointing it was trying to index 1.2 GB of garbled text.
I miss being naive. =)
Re:Project Gutenberg (Score:4, Informative)
There is a large body of great 20th century works that will not enter the public domain for many years. Stuff by F. Scott Fitzgerald, Joseph Conrad, Arthur Conan Doyle, Rudyard Kipling, Willa Cather, Wallace Stevens, Yeats, Virginia Woolf, et al.
It's a shame. I actually enjoy reading literature, and I am forced to go to the library for anything newer than 1923.
It's already starting to hit home... my experience (Score:3, Informative)
There is a lot of information ALREADY converted from text and audio sources at your fingertips, to a degree that was unfathomable a few years ago. And all of this is free through the public library's website (and its links to other sources). Talk about your one-stop shop.
Heidelberg press (Score:4, Informative)
Been around the spook community since 70s/80s (Score:3, Informative)
The hardware has been hard at work since the late 70s/early 80s when PDP-8s and PDP-11s were used to control the hardware and store the results.
The first scanners had very small CCD arrays, which had to be pulled across the page horizontally as well as vertically, AND they had vacuum "bars" on robot-arm "page turners".
Distributed Proofreaders (Score:3, Informative)
Once books are digitized and OCR'd, they need to be proofread by humans. The people who can afford this machine might do it another way, but Project Gutenberg has volunteers at Distributed Proofreaders [archive.org].
There was a Slashdot Article about it last year but there have been a lot of changes since then (many due to Slashdotters). If you haven't seen the project in a while you should check it out.
Re:Current Books? (Score:2, Informative)
Not sure who has uncrippled digital versions in the nonfiction line of books, but in fiction, Baen leads the way, between their Webscriptions service, free library, and the CDs included with some of their recent hardcovers. They provide the books in HTML, RTF, Microsoft Reader, some format that's Palm/Psion/WinCE friendly, and Rocket Ebook.
The first two are more than enough... their HTML setup is quite good actually.
Re:Better for what? (Score:3, Informative)