Book-Digitizing Robots
Makarand writes "Robotic digitization systems are the new help available to complete voluminous scanning tasks. Robots that can turn the pages of books and newspaper volumes and attain scanning speeds of more than 1000 pages/hour are now available. They even use puffs of compressed air to separate sticky pages!"
Freedom 'Bots (Score:5, Insightful)
I am not sure it would. It might turn them on to the idea of thinking for themselves, though. That could have interesting consequences. Unfortunately, just this very possibility is threatening to those who are now profiting from their ignorance. These people are likely in a position to be gatekeepers for the dissemination of information.
But having a robot do something which is enhanced by mindless repetition is a natural robotic application. Then having that application be something that could enable political liberation is an interesting twist on the old "robots in service to humanity" ideals. I'm not so sure that those holding the reins are going to be so interested in this--call me cynical.
What I would like to see is a similar device for converting analog recordings, in whatever form, be it tape, vinyl, or wax cylinders, to an open digitized format, and then have those recordings made available in like fashion. It might be just as interesting to turn those kids in Africa on to Mozart, or oral arguments from the Supreme Court.
Re:Freedom 'Bots (Score:2, Funny)
Re:Freedom 'Bots (Score:5, Funny)
Are you talking about the US Government here?
Re:Freedom 'Bots (Score:2)
I used to wish a third political party could develop in the USA.
Now I'd just like t
Re:Freedom 'Bots (Score:2, Insightful)
Think about the power of bringing food and water to little communities in the middle of Africa. Now that's powerful.
Make them free (Score:2, Interesting)
Ezra Taft Benson
Make them free, and they'll bring the food and water into their villages themselves.
Re:Freedom 'Bots (Score:5, Informative)
In regards to your vinyl recording idea, couldn't you just hook up a record changer (yes, they do make these; they have a big spindle and an arm) to a DAT or similar digital recording device, and then use some audio software to cut tracks at blank space?
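For what it's worth, the "cut tracks at blank space" step is easy to sketch. This is only a rough illustration, assuming the recording is already loaded as a list of samples normalized to the range -1.0 to 1.0; the function name `split_on_silence`, the threshold, and the minimum gap length are all made up for the example, and real vinyl surface noise would need a more forgiving threshold.

```python
from typing import List, Tuple


def split_on_silence(samples: List[float], rate: int,
                     threshold: float = 0.01,
                     min_gap_s: float = 2.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges for tracks separated by quiet gaps.

    `end` is exclusive. A gap counts as a track boundary only if at least
    `min_gap_s` seconds of samples stay at or below `threshold`.
    """
    min_gap = int(min_gap_s * rate)   # gap length in samples
    tracks: List[Tuple[int, int]] = []
    start = None                      # index where the current track began
    quiet_run = 0                     # consecutive quiet samples seen so far

    for i, s in enumerate(samples):
        if abs(s) > threshold:
            if start is None:
                start = i             # first loud sample opens a track
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap:
                # Close the track at the first sample of the quiet run.
                tracks.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0

    if start is not None:
        # Recording ended mid-track; trim any trailing quiet samples.
        tracks.append((start, len(samples) - quiet_run))
    return tracks
```

Each returned range could then be written out as its own file. Short pauses inside a piece of music stay in one track as long as they are shorter than `min_gap_s`.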
Re:Freedom 'Bots (Score:5, Insightful)
Re:Freedom 'Bots (Score:3, Funny)
It might turn them on to the idea of thinking for themselves, though.
Mbutu: Whoa. Plato sez this is all a shadow of some higher plane of existence.
Kwasa: Die Hutu scum!
Unfortunately, just this very possibility is threatening to those who are now profiting from their ignorance.
Mbutu: Whoa. Marx sez the capitalists exploit the surplus wealth from their employees. Adam Smith sez each person has the ability to trade freely in the marketplace to maximize his
Re:Freedom 'Bots (Score:5, Insightful)
Analog by definition is ALWAYS readable. It is the SINGLE format that is by definition OPEN, can always be understood by anyone, and can stand the test of time. Aliens could discover an analog recording 50 billion years from now and decode it without knowing ANYTHING else about our culture. But right now, data encoded 25 years ago in an open digital format is often incredibly hard to translate to a usable form.
Digital requires people to understand the digital format. The ONLY advantage to it is quality via the suppression of unintended noises. But if we are copying something that started out as Analog, then the quality improvement is minimal at best.
Do not blindly use Digital for things at which Analog is far better.
Better for what? (Score:2, Interesting)
Re:Better for what? (Score:3, Informative)
Re:Freedom 'Bots (Score:5, Insightful)
Hey Glortzotnik! Check this out! These humans, they used lasers to inscribe little hills and valleys in aluminum discs 12" in diameter for video, then smaller hills and valleys in aluminum discs 5" in diameter for audio, and then they used lasers to start chemical reactions that changed the color of a dye layer in big sloppy round holes with lots of fuzziness around the edges for video again.
Okay, nothing wrong with that, but the funny part - get this - they called the laser paintings and the chemical dyes "digital", as if it were somehow different from scratching clay with a stick or a wax cylinder with a needle. Laugh riot, these humans!
To a DSP engineer, everything is analog.
Re:Freedom 'Bots (Score:2)
Naivete might be a little soft (Score:3, Interesting)
Having traveled in subsaharan Africa a bit, I can safely say that people I met there aren't "closed to the idea of democracy." (They're sometimes consciously "closed" to the idea of allowing mammoth, conscience-free American-based multinational corporations to subvert the democratic institutions they do have, though.)
I bet tha
Re:Freedom 'Bots (Score:4, Insightful)
Re:Freedom 'Bots (Score:3, Funny)
Word to the wise--since the invasion of Iraq is over now, we're allowed to call them French Bots again.
Yeah but... (Score:3, Funny)
Digitizing Pr0n? (Score:2, Funny)
Whoa! I guess some pr0n really does have decent articles.
Re:Digitizing Pr0n? (Score:5, Funny)
M@
Re:Digitizing Pr0n? (Score:2)
Perhaps. But a few good puffs of air could have prevented the pages from sticking.
Re:Digitizing Pr0n? (Score:2)
Heh, when I went to Brazil I brought back a porn mag in Portuguese. A year or so later my gf found it and asked me what that was about. I told her I just read the articles. She opened it up and with a quizzical look on her face just put it down and dropped the topic.
Hard to read on a screen. (Score:3, Insightful)
Re:Hard to read on a screen. (Score:2)
Re:Hard to read on a screen. (Score:2)
When did you start using computers? Did you read a lot as a child? What color eyes do you have? Do you wear sunglasses? How well do you see in the dark? What brand/model of monitor do you have? What's your brightness setting? Contrast setting? Is your screen gamma-corrected and/or color-corrected?
I prefer paper to monitor. I believe it is because it is much less busy/distracting and my eyes are sensitive to bright light.
Re:Hard to read on a screen. (Score:2, Interesting)
I've made no special adaptations for purity of screen color or gamma.
I have excellent low light vision and wear sunglasses only on the brightest of days or in special circumstances like spending time in high glare situations (on the water, bright sand, snow, etc.).
I've even read entire novels on the comp
Monitor? (Score:2)
Thanks
Re:Hard to read on a screen. (Score:2)
(I'm sure not ALL older people print a lot, but all the prolific printers I know are older).
Re:Hard to read on a screen. (Score:2)
Re:Hard to read on a screen. (Score:2, Funny)
Am I the only person reading Slashdot who gets amused by someone who says that?
You won't get first post that way, anyways...
Application (Score:2)
Short Circuit (Score:5, Funny)
Current Books? (Score:2, Interesting)
Re:Current Books? (Score:2)
Re:Current Books? (Score:2, Informative)
Not sure who in the nonfiction line of books has uncrippled digital versions, but in fiction, Baen leads the way, between their Webscriptions service, free library, and the CDs included with some of their recent hardcovers. They provide the books in HTML, RTF, Microsoft Reader, some format that's Palm/Psion/WinCE friendly, and Rocket Ebook.
The first two are more than enough...
Scanned pages (Score:5, Interesting)
Stuart Inglis's tic98 [waikato.ac.nz] is a lossless compressor designed for black-and-white scanned documents. It achieves better compression ratios than anything else, or at least it did a couple of years ago. If you have scanned documents to make available online, it's fairly simple to write a CGI script to convert tic98 on the fly to PDF.
Hopefully someone else will reply to this comment with a recommendation of good free OCR software.
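To show why black-and-white scans compress so well without loss, here is a toy run-length encoder. To be clear, this is NOT how tic98 works (I'm not describing its algorithm here); it is a much simpler stand-in technique, and the function names `rle_encode` and `rle_decode` are invented for the example. It assumes a scanline is a list of 0 (white) and 1 (black) pixels.

```python
from typing import List, Tuple


def rle_encode(row: List[int]) -> List[Tuple[int, int]]:
    """Collapse a scanline into (pixel_value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            # Same colour as the previous pixel: extend the current run.
            runs[-1] = (pixel, runs[-1][1] + 1)
        else:
            # Colour changed (or first pixel): start a new run.
            runs.append((pixel, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (pixel_value, run_length) pairs back into the exact scanline."""
    row: List[int] = []
    for pixel, count in runs:
        row.extend([pixel] * count)
    return row
```

A mostly white page turns into a handful of long runs per line, and decoding gives back exactly the original pixels, which is what "lossless" means here. Purpose-built document compressors do considerably better than this.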
Re:Scanned pages (Score:5, Informative)
I was the lead developer for the software side that actually does the crunching on the images. However, I'm not sure exactly how much I am allowed to talk about it, so I won't. Basically, the software side of it does produce PDFs, JPGs and TXT files from the OCR performed on the images.
Re:Scanned pages (Score:3, Informative)
Re:Scanned pages (Score:2)
Book Ripping and Burning! (Score:5, Funny)
Time for a change in terminology.
Re:Scanned pages (Score:2)
If you are taking the OCRed text and reformatting it, that's a different problem entirely. It is of course essential to OCR the books so that they can be put on the web, grepped and so on. But with storage space being cheap, I think it would be good to preserve the raw scanned images themselves, so that people will be able to stu
Machine Lust: I could use that at work! (Score:2)
I can also think of a few non-work uses for the thing, too. Dare I say, avariciously, "I want one!"
Alternative to flipping (Score:3, Interesting)
This assumes two things: that the ink makes a difference to X-ray penetration compared to just paper, and that the resolution of the scanner is high enough to pick out individual pages. But typical medical scanners are pretty high-res I think. Has anyone tried this?
Re:Scanned pages (Score:2)
The images are originally uncompressed grayscale or color TIFFs. The software has an option to produce JPGs from those images, because it was required for certain projects. The TXT files are created from OCR performed on the TIFFs. The PDFs can be just plain image PDFs, or image PDFs with searchable text. In the case of the latter, OCR is performed on the images before they are inserted into the PDF, and the text is overlaid on the image to allow searching of text.
I ho
Great, but.. (Score:2)
Re:Great, but.. (Score:2, Insightful)
Re:Great, but.. (Score:2)
It can't be _that_ hard to make, surely? The OCR software is already done (unless they made large improvements there?)
As for the trick of turning pages... well, go for something simple - a static electric rod or something.
Just make something that works on say 90% of the pages.
Then you hire someone to sit there and fix it when it goes wrong.
The whole solution would be a hell of a lot cheaper..
Re:Great, but.. (Score:4, Insightful)
I didn't RTFA, but this could be useful not only for developing countries, but as a "force-multiplier" of sorts for smaller community libraries. En masse digitizing of published works would allow smaller libraries to compete on a more even footing with larger ones, without having to invest loads of money into their collections and facilities to hold them.
Any well-heeled library patrons out there want to donate some money earmarked for one of these things to the large library of your choice?
DMCA smack down (Score:4, Funny)
Hmm... (Score:3, Interesting)
Now the magazine rack at 7-11 will show up on Kazoom and all that.
I mean, comic books or "graphic novels" as the nerds call 'em already get traded freely, but that's because some joker with no life takes a day out of his life to scan and crop each page.
But if you could just take the magazines, stick 'em in this robot, then share 'em, it could hurt the publishing industry the way it's hurt the recording industry.
And everyone will justify it by saying "why should I buy a magazine when it only has one good article and the rest is crap!"
So what measures can we expect to see? Lighter inks, crazier fonts to screw with the robot's OCR? Funny paper that makes it hard to flip pages?
Re:Hmm... (Score:5, Funny)
I think you just described a typical issue of Wired. Are they worried about people copying?
Bob.
Re:Hmm... (Score:3, Insightful)
The music industry hasn't been hurt by filesharing, it has been helped.
People want the CD case, the inside jacket filled with graphics and lyrics.
Similarly, most people hate reading off of a computer monitor. Lots of magazines give away some (or all) of their articles on their webpage already. If anything this'll inspire more subscriptions.
Of cou
I'm all for democracy, of course... (Score:5, Funny)
"Think about the power of bringing our library to little schools in the middle of Africa," Keller said. "Would it make a difference for those who now have their minds closed to the idea of democracy?"
I'm not sure I get the connection:
Mbutu: Hey, Kwasa, check out this copy of "The Horse Whisperer" on my Palm Pilot.
Kwasa: Incredible! We must hold free elections immediately!
Buzzwords without a clue. (Score:2, Funny)
Re:I'm all for democracy, of course... (Score:2)
Especially not electronic books!
Typical shortsighted response (Score:3, Insightful)
Yeah, but if they don't learn to read, they're going to be stuck with the same subsistence agriculture that hasn't worked too fucking well for them recently. That or UN or NGO handouts that only serve to strengthen the oppressive regimes that are torturing these people, because little of the aid that reaches the docks reaches the people thanks to rampant corruption.
Here's the current process:
1. Africa has crappy food production
2. West sends food
3
Re:I'm all for democracy, of course... (Score:2)
Maybe, before you do that, you should inform yourself a little more about Michael Moore and his book [spinsanity.org].
Re:I'm all for democracy, of course... (Score:2)
It must be hard for the South African border guards, keeping a vigilant watch on the western shores for the American boat people, drifting lazily across the ocean. Pity even more the American refugee, who's only [hrw.org] seeking [allafrica.com] a [bbc.co.uk] better [bbc.co.uk] life [wsws.org] for them and their families.
Sorry for the tone, but you had it coming.
Project Gutenberg (Score:5, Insightful)
Mechanik
Re:Project Gutenberg (Score:5, Interesting)
So, you may at one point see those books freely available for download, provided they can get those copyright issues ironed out.
Re:Project Gutenberg (Score:2)
Re:Project Gutenberg (Score:4, Informative)
There is a large body of great 20th century works that will not enter the public domain for many years. Stuff by F. Scott Fitzgerald, Joseph Conrad, Arthur Conan Doyle, Rudyard Kipling, Willa Cather, Wallace Stevens, Yeats, Virginia Woolf, et al.
It's a shame. I actually enjoy reading literature, and I am forced to go to the library for anything newer than 1923.
Archival Projects (Score:5, Insightful)
If he could drag this robot along to a courthouse and scan the records over a couple of weeks, it would allow him to digitize that information quickly. Not only would the digital copies be easier to search, they would be easier to preserve. One courthouse, where the file room was in the basement, nearly lost all of its old records to a flood.
Re:Archival Projects (Score:2)
I highly doubt that Podunk, NJ's courthouse made sure that all records were typed on a correctly adjusted typewriter in a normal font. From what I remember of shuffling through small-town records, 90% of them are handwritten, and no computer on this planet can reliably read that.
No, what is needed in those cases are 3 temp employees who do nothin
Re:Archival Projects (Score:2)
Digital media in a closet are the wrong way to store information. The right way is probably something like OceanStore (http://oceanstore.cs.berkeley.edu/info/overview.
Finger lickin good (Score:4, Funny)
I'm glad they didn't go with the design where it licked its thumb before turning each page. I hate that!
Re:Finger lickin good (Score:2)
They even use puffs of compressed air to separate sticky pages!
I'm glad they didn't go with the design where it licked its thumb before turning each page. I hate that!
Actually, I was thinking this would be a godsend for those who spend their free time scanning in pictures from porno mags!
GMD
I can't wait for digital textbooks (Score:2, Interesting)
LORD - Dont you people see what's happening here?! (Score:5, Funny)
For the love of GOD, someone check this!!
blakespot
Re:LORD - Dont you people see what's happening her (Score:2)
And then some guy named Keanu will save us! And think he can act too!
They might even call it a Second Renaissance [intothematrix.com]!
Why the DMCA really does apply (Score:2, Funny)
Expect to see these outlawed real soon. Either that, or expect a "Stephen King" model to be available this fall.
--
Slashdolt
Does it cost that much? (Score:4, Insightful)
Actually . . . (Score:2)
I thought students were PAYING to do this. Just give them some extra credit for finding mistakes;)
Great and now they'll sit idle (Score:2)
It's already starting to hit home... my experience (Score:3, Informative)
There is a lot of information ALREADY converted from text and audio sources at your fingertips that was unfathomable a few years ago. And all of this is free from the website (and links to other sources) from the public library. Talk about your one stop shop.
Heidelberg press (Score:4, Informative)
hm (Score:2)
Use in colleges (Score:2)
Sure there are IP issues to iron out and *gasp* cutting out the middle man (paper publishing) might help make college textbooks actually affordable.
Actually, I'm in the middle of using an ultra-fast scanner at work just to see how this exact setup p
Re:Use in colleges (Score:2)
I'm drooling over the idea alone.
Then buy a sub-notebook, or a PDA (and put the textbook on external storage, like a CompactFlash card) and kiss those backaches goodbye!
Alas, it's unfeasib
This is exactly the technology we need... (Score:2)
-JDF
Destroying books to save them (Score:4, Interesting)
The more traditional way to preserve the contents of the old books is to destroy them in the process. Actually cutting the page out of the book lets you get a much higher quality scan because the page is then really truly flat. (Yes, there are correction techniques for turning scans of non-flat pages into flat "projections" but they aren't nearly as good as just ripping the page out and scanning it.)
Re:Destroying books to save them (Score:2)
They both trash the book, but this should only be a problem for really rare books.
Been around the spook community since 70s/80s (Score:3, Informative)
The hardware has been hard at work since the late 70s/early 80s when PDP-8s and PDP-11s were used to control the hardware and store the results.
The first scanners had very small CCD arrays and these had to be pulled across the page horizontally as well as vertically AND it had vacuum "bars" on robot-arm "page turners".
Re:Been around the spook community since 70s/80s (Score:2)
Distributed Proofreaders (Score:3, Informative)
Once books are digitized and OCR'd they need to be proofread by humans. The people who can afford this machine might do it another way but Project Gutenberg has volunteers at Distributed Proofreaders [archive.org].
There was a Slashdot Article about it last year but there have been a lot of changes since then (many due to Slashdotters). If you haven't seen the project in a while you should check it out.
puffs... (Score:2)
Sometimes, I need a puff of compressed gas to separate my cheeks...
The NWAA responds to this threat (Score:3, Funny)
"These reading Bots will put the book publishing business under within months...," their congressional representative said.
"There hasn't been this strong of an attack against the goodness of books and authors since that evil man Gutenberg created that evil printing press." Word on the street is that Hilary Rosen is going to be hired as their spokesperson to help outlaw this evil that will undermine American life as we know it.
save your pr0n! (Score:2)
useful when archiving all those old Hustlers...
Sticky pages (Score:2)
Oh good, that means these robots can digitize my porn magazine collection!
Obligatory Short Circuit Reference... (Score:2)
journals (Score:2)
And the revolutionary technology of the century... (Score:2)
75% of the world's population may finally get telephone access in the 21st century, thanks to the relatively inexpensive infrastructure requirements of cellular phones.
The bicycle, the internal combustion engine, the telephone, the light bulb, the AC generator. 19th century technologies whose impact is yet to be felt in much of the world.
I don't think folks in villages in Africa will be reading about "freedom" on their web browsers any time soon.
1000 pages per hour? (Score:2)
Bandwidth Cap (Score:2)
so when is stanford going to put these books up on the gnutella network? i'd be happy to mirror their collection (or as much as i can) of digitized books on my gnutella node.
What else can it blow... (Score:2)
how expensive? (Score:2)
Finally! (Score:2)
Re:Interesting, but... (Score:2)
Now you know where all the pr0n came from.
Re:But can they also (Score:3, Funny)
Re:But can they also (Score:2)