Books Media Technology

Book-Digitizing Robots (240 comments)

Makarand writes "Robotic digitization systems are the new help available to complete voluminous scanning tasks. Robots that can turn the pages of books and newspaper volumes and attain scanning speeds of more than 1000 pages/hour are now available. They even use puffs of compressed air to separate sticky pages!"
This discussion has been archived. No new comments can be posted.

  • Re:Scanned pages (Score:5, Informative)

    by tempestdata ( 457317 ) on Wednesday May 21, 2003 @10:20AM (#6007062)
    Actually, I've seen this robot operate in person and it is a work of art. The way the arms move makes you think it's going to rip the book to pieces, yet somehow it manages to pick up exactly one page (it detects if it's picked up two pages and drops the extra page) and flip it.

    I was the lead developer for the software side that actually does the crunching on the images. However, I'm not sure exactly how much I am allowed to talk about it, so I won't. Basically, the software side of it produces PDFs, JPGs and TXT files from the OCR performed on the images.
  • Re:Scanned pages (Score:3, Informative)

    by tempestdata ( 457317 ) on Wednesday May 21, 2003 @10:22AM (#6007080)
    Oh... and no, unfortunately, it's not open source.
  • Re:Freedom 'Bots (Score:5, Informative)

    by KrispyKringle ( 672903 ) on Wednesday May 21, 2003 @10:44AM (#6007245)
    Interesting point. However, it's useful to note that there are a lot of charitable and commercial corporations which currently fund (perhaps for the PR value rather than their own good intentions, and because the US dollar goes so far in most parts of Africa) technology initiatives and other educational programs. I've posted in the past about a program I'm involved in, funded by a couple of US corporations, to put computers and networks in a West African university.

    In regards to your vinyl recording idea, couldn't you just hook up a record changer (yes, they do make these; they have a big spindle and an arm) to a DAT or similar digital recording device, and then use some audio software to cut tracks at blank space?
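
A minimal sketch of the "cut tracks at blank space" step, assuming the recording is already loaded as a sequence of samples normalized to the range -1.0 to 1.0. The function name `find_tracks`, the `threshold` level, and the `min_gap_s` gap length are all illustrative assumptions, not taken from any particular audio package; vinyl surface noise means the threshold would need tuning per record.

```python
from typing import List, Sequence, Tuple


def find_tracks(samples: Sequence[float], rate: int,
                threshold: float = 0.02,
                min_gap_s: float = 2.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of tracks separated by quiet gaps.

    A gap is a run of at least min_gap_s seconds in which every sample
    has |value| < threshold. Shorter quiet passages stay inside a track.
    The end index is exclusive.
    """
    min_gap = int(min_gap_s * rate)
    tracks: List[Tuple[int, int]] = []
    start = None      # index where the current track began, or None
    last_loud = None  # index of the most recent loud sample
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            elif i - last_loud > min_gap:
                # The quiet run was long enough: close the previous
                # track and open a new one at this sample.
                tracks.append((start, last_loud + 1))
                start = i
            last_loud = i
    if start is not None:
        tracks.append((start, last_loud + 1))
    return tracks


if __name__ == "__main__":
    # Toy signal at 10 samples/second: 2 s loud, 1.5 s quiet, 3 s loud,
    # 0.5 s quiet (too short to count as a gap), 1 s loud.
    signal = [0.5] * 20 + [0.0] * 15 + [0.5] * 30 + [0.0] * 5 + [0.5] * 10
    print(find_tracks(signal, rate=10, min_gap_s=1.0))
```

Each returned pair could then be used to slice the sample buffer and write one file per track.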

  • Digitization (Score:1, Informative)

    by kriox ( 630423 ) on Wednesday May 21, 2003 @11:00AM (#6007359)
    I once took part in a project that intended to digitize millions of newspaper clips, some of them copies of originals more than 125 years old.

    That was in 1999.

    Digitizing was the easy part, actually, since the pages were conveniently on A4 paper, but the OCR, oh mighty Cthulhu! I was young and inexperienced in those days, and OCR software really wasn't up to the task (we didn't have the money to proofread all that text).

    I don't have to tell you how disappointing it was trying to index 1.2Gb of garbled text.

    I miss being naive. =)
  • Re:Project Gutenberg (Score:4, Informative)

    by Musashi Miyamoto ( 662091 ) on Wednesday May 21, 2003 @11:06AM (#6007399)
    Actually, the primary thing holding up Project Gutenberg is the Sonny Bono Copyright Term Extension Act. Copyright terms were recently extended so that nothing published from 1923 onward is going into the public domain.

    There is a large body of great 20th century works that will not enter the public domain for many years. Stuff by F. Scott Fitzgerald, Joseph Conrad, Arthur Conan Doyle, Rudyard Kipling, Willa Cather, Wallace Stevens, Yeats, Virginia Woolf, et al.

    It's a shame. I actually enjoy reading literature, and I am forced to go to the library for anything newer than 1923.
  • Not too long ago I had to do a research paper for a college class. No big deal, I've done many of them, and I was not looking forward to this one. Well, I went to the Houston Public Library downtown (which I hadn't been to in many, many, many, you get the idea, years). I got the library card that gave me access to some computer terminals and the computer card catalogue. I was amazed at how much they had converted to electronic form, and at the links to other sites that had dictated material. I was also amazed that I could get all this same access from home using the information printed on the library card. So I went home (I have a Road Runner cable modem) and got to work on my research instead of being trapped in the library. I found electronic versions of lots and lots of textbooks, magazines, government docs, and much more. What put me a notch or two down from my high horse was that they even had radio talk shows transcribed (which I used in my research paper), and that helped a lot!

    There is a lot of information ALREADY converted from text and audio sources at your fingertips that was unfathomable a few years ago. And all of this is free from the website (and links to other sources) from the public library. Talk about your one stop shop.
  • Heidelberg press (Score:4, Informative)

    by Ars-Fartsica ( 166957 ) on Wednesday May 21, 2003 @11:14AM (#6007460)
    Using air to separate and move paper is not new. Heidelberg platen presses (you may remember them from high school graphic arts classes) have had this feature for about fifty years.
  • by crovira ( 10242 ) on Wednesday May 21, 2003 @11:31AM (#6007581) Homepage
    This is not new.

    The hardware has been hard at work since the late 70s/early 80s when PDP-8s and PDP-11s were used to control the hardware and store the results.

    The first scanners had very small CCD arrays, which had to be pulled across the page horizontally as well as vertically, and they had vacuum "bars" on robot-arm "page turners".
  • by Ugmo ( 36922 ) on Wednesday May 21, 2003 @11:36AM (#6007622)

    Once books are digitized and OCR'd they need to be proofread by humans. The people who can afford this machine might do it another way but Project Gutenberg has volunteers at Distributed Proofreaders [archive.org].

    There was a Slashdot Article about it last year but there have been a lot of changes since then (many due to Slashdotters). If you haven't seen the project in a while you should check it out.

  • Re:Current Books? (Score:2, Informative)

    by Drakin ( 415182 ) on Wednesday May 21, 2003 @11:57AM (#6007771)
    I believe that up until recently most contracts between publishers and authors didn't include rights to publish digital versions.

    I'm not sure who in nonfiction has uncrippled digital versions, but in fiction Baen leads the way, between their Webscriptions service, free library, and the CDs included with some of their recent hardcovers. They provide the books in HTML, RTF, Microsoft Reader, some format that's Palm/Psion/WinCE friendly, and Rocket Ebook.

    The first two are more than enough... their HTML setup is quite good actually.
  • Re:Better for what? (Score:3, Informative)

    by sketerpot ( 454020 ) <sketerpot&gmail,com> on Wednesday May 21, 2003 @05:54PM (#6011172)
    Which is why you use forward error correction. Have you ever scratched a CD? It can still play, thanks to FEC (Cross-Interleaved Reed-Solomon Code, to be specific, which is good at correcting fairly small numbers of errors, like somebody drilling a 1 mm hole in a CD).
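
To illustrate why interleaving plus an error-correcting code survives a localized burst of damage, here is a toy sketch. It does not implement the Reed-Solomon code that CDs use; it swaps in the much simpler Hamming(7,4) code, which fixes one flipped bit per 7-bit codeword, combined with a basic block interleaver. All function names are made up for this example.

```python
from typing import List


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]


def hamming74_decode(c: List[int]) -> List[int]:
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def interleave(codewords: List[List[int]]) -> List[int]:
    """Emit bit 0 of every codeword, then bit 1 of every codeword, etc."""
    return [cw[i] for i in range(7) for cw in codewords]


def deinterleave(stream: List[int], n: int) -> List[List[int]]:
    """Rebuild n codewords from a stream produced by interleave()."""
    return [[stream[i * n + k] for i in range(7)] for k in range(n)]


def roundtrip_with_burst(nibbles: List[List[int]],
                         burst_start: int, burst_len: int) -> List[List[int]]:
    """Encode, interleave, flip a run of consecutive bits, then decode."""
    stream = interleave([hamming74_encode(n) for n in nibbles])
    for j in range(burst_start, burst_start + burst_len):
        stream[j] ^= 1  # simulated scratch: consecutive bits are damaged
    return [hamming74_decode(cw) for cw in deinterleave(stream, len(nibbles))]


if __name__ == "__main__":
    data = [[(v >> b) & 1 for b in range(4)] for v in range(8)]
    # A burst as long as the number of codewords hits each codeword once,
    # so every codeword stays within the one-error limit.
    print(roundtrip_with_burst(data, burst_start=5, burst_len=8) == data)
```

Without the interleaver, the same eight-bit burst would land in one or two adjacent codewords and exceed what the code can correct. Spreading the damage thinly across many codewords is the same idea the "cross-interleaved" part of the CD scheme relies on.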
