Google's Book Scanning Technology Revealed
blee37 writes "Last March we discussed Google's patent for a rapid book scanning system. This article describes and provides pictures of how the system works in practice. Google is secretive, but the system's inner workings were apparently divulged by University of Tokyo researchers who wrote a research article on essentially identical technology. There are also videos of robotic page flippers and information about how Google wants to use music to help humans flip pages."
Now my PC (Score:5, Funny)
Re: (Score:2)
NO, but i can scan it for you if you like...
MRI technology? (Score:5, Interesting)
I often wondered if it would be possible for a book to be scanned while closed, using some kind of MRI technology that digitally sliced the book page by page, picking up on the density difference between the ink and the paper slice by slice.
Re: (Score:3, Insightful)
Re: (Score:2)
No, but a book is not 6 feet tall. The issue is, is the MRI, slow as it is, faster than flipping each page manually 300 times and taking individual scans? I agree the tech isn't there yet, but it isn't inherently impossible.
Plus you could scan a whole stack of books. (Score:2)
Plus, I could envision a system where you loaded many books into a cartridge of sorts, say about 6 feet long, with a divider of some kind placed between each book.
As the scanner worked its way down the cartridge, it could detect the dividers, which would delineate one book from the next.
Thus even if the scanner were slow, perhaps it could scan say 50 books in one pass.
Steve
Re: (Score:2)
You could probably set up an extra google book scanner and pay someone to staff it round the clock for about the same price as taking an MRI of a book with some special MRI setup.
Resolution dependent (Score:2)
No, but a book is not 6 feet tall.
The speed of an MRI is proportional to the resolution you want to have.
More voxels = More time.
The issue is, is the MRI, slow as it is, faster than flipping each page manually 300 times and taking individual scans?
In the Z dimension, you'll need an insanely extreme resolution to be able to tell apart 300 pages. And on each page, you need a resolution high enough so the text is clearly visible and not blurry.
For the record, a hidef anatomy-research-grade MRI scan of a brain has only 256x256x200 voxels and can take half an hour. And for that we used special high-speed techniques (3d mprage), which restrict you to 200 voxels
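For a rough sense of scale, here is a back-of-the-envelope sketch of the voxel argument above. The page size (6 x 9 inches), the 300 dpi target, and the one-slice-per-page simplification are illustrative guesses, not figures from the thread. Only the 256x256x200 brain scan and its half-hour duration come from the comment, and the linear voxels-to-time scaling is the comment's own rule of thumb.

```python
# Back-of-the-envelope estimate: how long would an MRI-style scan of a closed
# book take if scan time scaled linearly with voxel count?
# ASSUMPTIONS (not from the thread): 6 x 9 inch pages, 300 dpi in-plane
# resolution, and only one slice per page in the Z dimension.

BRAIN_VOXELS = 256 * 256 * 200   # reference scan quoted in the comment
BRAIN_SCAN_HOURS = 0.5           # "can take half an hour"


def book_voxels(width_in=6.0, height_in=9.0, dpi=300, pages=300):
    """Voxels needed for one slice per page at the given in-plane resolution."""
    return int(width_in * dpi) * int(height_in * dpi) * pages


def estimated_scan_hours(voxels):
    """Scale the reference scan time linearly with voxel count."""
    return BRAIN_SCAN_HOURS * voxels / BRAIN_VOXELS


if __name__ == "__main__":
    v = book_voxels()
    print(f"book voxels: {v:,}")
    print(f"ratio to brain scan: {v / BRAIN_VOXELS:.0f}x")
    print(f"estimated hours: {estimated_scan_hours(v):.0f}")
```

Under those assumptions the book needs roughly a hundred times the voxels of the brain scan, which works out to a couple of days per book, and one slice per page is optimistic.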
Re: (Score:2)
Re: (Score:3, Interesting)
I haven't verified this, but my father-in-law told me the guy that invented the MRI wanted to develop it as a medical scanner to the point where it was cheap enough that everybody could afford it. Then GE et al locked up the idea and turned it into a profit center.
Re: (Score:2)
Re: (Score:1, Troll)
You know how much helium costs, especially the amount you need for an MRI machine?
Yes. Liquid helium in bulk costs about as much as Coca Cola from a vending machine.
Re: (Score:3, Interesting)
The downside of liquid helium is that it's damned expensive, and getting more so by the minute. Running at full capacity I could blow through several thousand liters in a year, and at several dollars a liter minimum plus overhead, that's real money. As a bonus, lately our supplier of helium has become incredibly unreliable, missing orders and generally flaking out, while simultaneously raising prices because of actual production shortages. I just had to read the sales guy the riot act, and if service doesn't improve darn fast, we'll take our business elsewhere, as will the other users on campus. (Helium comes from the radioactive decay of uranium and other alpha emitters deep in the earth, and comes out of natural gas wells.) The long-term solutions are (a) set up as many cryogen-free systems as possible, and (b) get a helium liquifier to recycle the helium that we do use. Unfortunately, (a) requires an upfront cost comparable to about 8 years of a system's helium consumption per system, and (b) also necessitates big capital expenses as well as an ongoing maintenance issue. Of course none of these kinds of costs are the sort of thing that it's easy to convince a funding agency to support. Too boring and pedestrian.
By the way, I spend most of my days on site at the largest US particle accelerator. Let me know if you'd like to chat with the cryo dept. about how much the tankers of liquid helium cost ;)
Re: (Score:3, Interesting)
I never heard the story but you might be confusing MRI with X-Ray machines. You might also remember the stories about X-Rays in shoe stores and why that wasn't a good idea.
But either way, the costs are not unrealistically high: you can pick up a used MRI machine for about 100k. GE doesn't have a monopoly on MRIs; Siemens, Hitachi and a few others make them as well. The simple physics alone, however, would not allow an MRI machine for most people. The magnets involved are just too strong that they be
Re: (Score:2)
Re:MRI technology? (Score:5, Insightful)
When the book is closed, the ink from facing pages will be mashed together, so you will need to be able to tell which page the ink is attached to. Since the ink mostly sits on top of the paper (if it soaks through you wouldn't be able to read the other side very well) it is a very thin layer. Your scanning technology would need to be able to sense very small volumes of ink. I don't think we are anywhere close to the necessary precision yet.
Re: (Score:2)
MRIs have resolution down to 90nm.
Simpler/faster solution would be to insert a piece of paper in between all the pages to be scanned. Then do the MRI. If the OCR turns up 0 hits, mirror the page and run it through again.
Or make recaptcha users keep a mirror at their desk.
Re: (Score:1, Insightful)
If you have to insert a piece of paper in between each page, wouldn't it be easier just to image the pages while they're open?
Re: (Score:1)
In depth? Even if that's so, it's considerably less than the error caused by crinkling and curvature of the pages.
Re: (Score:2, Insightful)
MRIs have resolution down to 90nm.
Simpler/faster solution would be to insert a piece of paper in between all the pages to be scanned...
Wouldn't that defeat the purpose of using the MRI to begin with? Inserting ONE sheet of paper between EVERY page in a book doesn't seem like it would take much more effort than flipping the page and photographing it.
Re: (Score:3, Interesting)
As for finding the boundaries between books in a stack, if
Re: (Score:2, Funny)
which means that if you can't do it now, you'll probably be able to do it on your cell phone in 10 years.
I can't solve the TSP for 1000 cities on my desktop computer today, but I suspect in 10 years time I'll be able to solve it on my mobile phone.
Re: (Score:2)
Maybe not. If you do something like a spiral CT, perhaps at multiple angles, you might be able to build a 3d volumetric model based solely on the statistical interpretation of the data points. Along a given ray you know the ink density, and if there are enough rays you could figure out the real possible solutions.
Re: (Score:1, Insightful)
just filed vague patent for that
Re: (Score:3, Funny)
Brilliant idea!
Make it so!
Re: (Score:2)
you could probably do this with xeroradiography, just set the power setting to high, and come up with a system that allows you to focus accurately per page. xeroradiography uses much simpler and more readily available processes/materials than a modern MRI. just depends on the density of the ink they use relative to the paper.
Re: (Score:2)
Librarian Chantey (Score:5, Informative)
Sea Shanties [wikipedia.org] were sung in association with ship-board tasks (often repetitious in nature). Is Google paving the way for the Librarian chantey?
Re:Librarian Chantey (Score:5, Funny)
Haul away, haul away!
Well they read their stories on robotic Nook®s
and we're bound away for Australia!
Re: (Score:1)
so we're forever scanning books
Sound off
p867, p868
Sound off
p869, p870
Not too surprising (Score:2)
Looks kinda like this guy's machine:
http://hardware.slashdot.org/story/09/12/13/1747201/The-DIY-Book-Scanner?art_pos=3 [slashdot.org]
summary of summary. (Score:5, Funny)
human type book into PC, machine print book on paper, machine binds book ---time goes by--- machine unbind book, robot and human flip pages of book, machine photograph book, machine put book on PC.
Re: (Score:2)
The farmer in the dell, the farmer in the dell... hi-ho the merry-o, the farmer in the dell.
Build your own.... (Score:5, Interesting)
Simply set up a rig with 2 digital cameras and a plexiglass V to photograph 2 pages at a time. It's quite fast and cheap.
http://www.diybookscanner.org/ [diybookscanner.org]
Works great. I built one to turn a couple of rare automotive books into PDF so I don't damage a $180.00 book in the garage.
Re: (Score:2, Funny)
I built one to turn a couple of rare automotive books into PDF so I don't damage a $180.00 book in the garage.
Great, so now you can damage a $1800 laptop instead?
Re: (Score:3, Informative)
Or damage some cheap 8.5x11 that you print out the relevant pages on.
Re:Build your own.... (Score:5, Interesting)
What idiot would use a $1800 laptop in the garage to view a PDF?
Let me guess, you change your oil wearing a cashmere sweater and silk shirts as well.
Nope. I risk my $40.00 Fujitsu tablet PC that views PDFs just fine but doesn't have enough horsepower to do much else. Works awesome as a garage PC to read PDFs and read the engine codes with my RS232-OBDII scanner/logger.
Re: (Score:2)
> my $40.00 fujitsu tablet PC
I'll call you on that.
I just checked and Fujitsu's cheapest tablet PC, the T4310, is $1,149 USD.
So please indicate where I can purchase a $40 tablet PC that can read PDFs.
Re: (Score:2)
www.ebay.com.... I got a Used stylistic 3500.
Are you one of those weird people that must have everything new?
Re: (Score:2)
Judging by ebay's completed sales, it's probably worth two or three times that, if you factor in shipping.
Re: (Score:2)
I got a lot of 3 a year ago and they averaged to $40.00 each. So Sorry you are unable to find the deals I am able to on ebay.
But then I typically find most of my items to buy on ebay at far FAR less than the "average" sale price.
Re: (Score:2)
Fujitsu's cheapest new tablet PC may be $1,149. Why would he use a brand-new machine?
Here's something much closer to his price [ebay.com], and ought to be more than capable of viewing PDFs.
Re: (Score:2)
I agree with your point that for this use going used will be much cheaper than new.
But $189 is still a lot more than $40.
Re: (Score:2)
Ebay? Certainly one wouldn't use a brand spankmenew laptop for something like this.
Re: (Score:2)
Clearly it's an old secondhand laptop, you pedant.
Re: (Score:1)
I built one to turn a couple of rare automotive books into PDF so I don't damage a $180.00 book in the garage.
Great, so now you can damage a $1800 laptop instead?
Since when do you need a new $1800 laptop to view PDFs? There's always old used laptops and even cheap new ones.
Re: (Score:3, Funny)
Maybe he is running Acrobat Reader in Vista
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
We'll put it in the second edition.
In an appendix.
Near the back.
Missing the point? (Score:5, Insightful)
Google is secretive, but the system's inner workings were apparently divulged by University of Tokyo researchers
Surely the whole point of the patent system is to grant exclusive use for a period in return for publishing full details of how whatever it is works? How can you have a patent without divulging the crucial information?
Re:Missing the point? (Score:4, Insightful)
Re: (Score:3, Informative)
While you may be correct in certain circumstances, your wording gives a false impression that this always works. You must disclose the best mode [uspto.gov] when filing a patent application.
Re: (Score:2)
How can you have a patent without divulging the crucial information?
By not playing fair. I'm sure there are lots of companies who deal in IP who can teach you about doing this. In particular, I hear the music, film and software industry players are good at this.
Re:Video of book scanner... (Score:4, Informative)
Elegant, hypnotic, and not what Google uses. Google scans the books lying flat. It projects a grid-like pattern over the pages in IR, photographs the distorted image using 3D cameras, recreates a 3D model of the book, and uses that model to undistort the pages. It uses human slaves to turn the pages, since robots aren't as gentle.
The link isn't slashdotted anymore [nyud.net]
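The "undistort the pages" step above can be illustrated with a toy one-dimensional sketch. This is not Google's algorithm, which the thread does not spell out. It assumes the 3D model has already been reduced to a height profile along one cross-section of the page, and the function name and input format are hypothetical. The idea is that a point's true position on the flat sheet is its distance along the paper surface (arc length), not the horizontal offset the camera sees.

```python
import math

# Toy 1-D version of "undistorting" a curved page from a measured 3D profile.
# ASSUMPTION: we already have the page height z at known horizontal positions
# x along one cross-section (hypothetical input; the thread does not describe
# the data format Google uses).


def flattened_positions(xs, zs):
    """Return each sample's distance along the paper surface (arc length).

    A camera looking straight down sees sample i at xs[i]; on the flattened
    page it sits at the returned arc length instead.
    """
    if len(xs) != len(zs):
        raise ValueError("xs and zs must have the same length")
    if not xs:
        return []
    positions = [0.0]
    for i in range(1, len(xs)):
        # Length of the straight segment between neighbouring samples.
        step = math.hypot(xs[i] - xs[i - 1], zs[i] - zs[i - 1])
        positions.append(positions[-1] + step)
    return positions


if __name__ == "__main__":
    # A page that rises 3 units over the first 4 units, then lies flat.
    xs = [0.0, 4.0, 8.0]
    zs = [0.0, 3.0, 3.0]
    print(flattened_positions(xs, zs))  # [0.0, 5.0, 9.0]
```

In the example the camera sees the page spanning 8 units while the paper itself is 9 units long; resampling the image onto the arc-length positions is what would stretch the squeezed text near the gutter back out.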
Re: (Score:2)
It uses human slaves to turn the pages, since robots aren't as gentle.
Hmm... if I were google, I would use suction power to flip the pages... take a pipe, drill in some holes, and attach a vacuum cleaner... then attach the pipe to some robot arm.
Re: (Score:2)
take a pipe, drill in some holes, and attach a vacuum cleaner... then attach the pipe to some robot arm.
Pervert! Leave our fine robot women alone.
Re: (Score:2)
Re: (Score:2)
And then, to top it all off, they present to us an unreasonably large book that takes it from the scanner.
Large books are some of the hardest to scan-- too many curves. They're also among the best candidates for digital storage because they take up so much space.
Musical page flipping. (Score:5, Funny)
...and information about how Google wants to use music to help humans flip pages.
...you will know it is time to turn the page when Tinkerbell rings her little bells like this...
Yaz.
Executive summary (Score:3, Funny)
Google's book scanning technology? Two guys and an Epson V500.
Executive summary of WWII (Score:1)
The United States invaded half of Europe.
Re: (Score:2)
The United States invaded half of Europe.
... then split it with the Russians.
Re: (Score:1)
So the net difference between 1940 and 1945: Effective Russian border moves a few hundred miles West. Oh, and lots of dead people.
Lemme guess... (Score:3, Funny)
It involves pigeons, doesn't it?
We used to call them "Service Bureaus" (Score:5, Interesting)
Back when we called them "Service Bureaus" book scanning was fast, easy, and cheap, as long as you didn't want the book back.
You deliver your book, magazine, phone book, map, large format document, or whatever to a Service Bureau.
They will then use a paper saw and cut the binding off and the other three sides to make perfectly smooth edges.
Then they put the whole mess into a hopper. The hopper feeds the pages to a scanner.
When it's done, flip the pile over and put it back into the hopper to get the odd-numbered pages into the scanner.
What you get back is your original book (as a pile of pages with no binding) and a CD-ROM of its contents in both original TIFF and OCR'd text files. Now you can get them as PDF/A and DjVu formats.
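The two-pass hopper routine above leaves the scans out of reading order, so the passes have to be merged afterwards. Here is a minimal sketch of that merge, assuming the first pass captures the front of every sheet in order and that flipping the pile as a whole reverses the sheet order for the second pass. Which pass ends up holding the odd pages depends on how the pile is loaded, and the function and variable names are made up for illustration.

```python
# Restoring page order after a two-pass scan through a single-sided feeder.
# ASSUMPTION: pass 1 scans the fronts in order (pages 1, 3, 5, ...); the pile
# is then flipped as a whole, so pass 2 sees the backs in reverse order
# (pages 2n, ..., 4, 2). Function and variable names are hypothetical.


def interleave_passes(fronts, backs_as_scanned):
    """Merge the two scan passes into reading order."""
    if len(fronts) != len(backs_as_scanned):
        raise ValueError("both passes must contain one image per sheet")
    backs = list(reversed(backs_as_scanned))  # undo the pile flip
    ordered = []
    for front, back in zip(fronts, backs):
        ordered.extend([front, back])
    return ordered


if __name__ == "__main__":
    fronts = ["p1", "p3", "p5"]
    backs = ["p6", "p4", "p2"]  # order in which the flipped pile is scanned
    print(interleave_passes(fronts, backs))
```

If the feeder happens to preserve sheet order on the second pass, the `reversed` call should be dropped; checking the first and last few merged pages is a cheap way to tell which case applies.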
I suppose Google's point is that they don't want to ruin the books, or maybe they are proud enough of their 3D scanner to use it at all costs. But think of this: there are usually several thousands, perhaps millions, of copies of the books I've seen in Google's library, so destroying one copy of the book seems fair enough.
Re: (Score:2)
Re: (Score:2)
I think this technology was developed to scan rare books, the kind you can't destroy, you know?
Re: (Score:2)
Damn kids, get off my Syquest cart!
Re:We used to call them "Service Bureaus" (Score:4, Informative)
There are A LOT of books out there which would NOT be suitable for this method. A friend of mine in University for Museum Studies often has to read these books which are incredibly old. I believe the University has a couple that date somewhere around the 1830s, which is older than the books you find in the historical village we have in town.
Yes, the university lets you read books that are old enough to belong in a museum. She showed me one of them one time. It was like a manuscript, Thick leather binding, nothing written on the front, heavy faded pages. I almost couldn't believe it.
Sadly, that was the most exciting part of it. The writing was drier than a desert, and it was on some subject that I had zero interest in. They are supposedly starting to go ALL digital, so I have no idea what they're going to do with those old books and manuscripts they've got sitting around.
I hope they don't destroy them.
Re: (Score:2)
There's no reason you couldn't do this with a book from the 1830s; on http://words.fromoldbooks.org/ [fromoldbooks.org] I have text from 18th century books that have been scanned like this, and on http://www.fromoldbooks.org/ [fromoldbooks.org] some considerably older books.
It turns out that there are interesting old books too, you'll be pleased to know, although the further back you go, the more likely you are to find a book in Latin. Well, until you get far enough back that scrolls are common, and then Greek and Hebrew/Aramaic are common :D
In
Re: (Score:3, Interesting)
That's insufficiently destructive.
They should use the method from Vernor Vinge's novel "Rainbow's End", where the books are fed into what is essentially a giant chipper/shredder. The shredded pages are then blown through a tunnel studded with cameras, swirled around so that every side of every piece of paper is photographed at some point, and then all of the images are reassembled to form complete images of every page. At the end of the tunnel is an incinerator which burns the shredded paper.
The books
Re: (Score:2)
I recall that they kept the "shredda", sterilized and sealed in helium canisters buried deep underground. This was their argument that the shredding-process is actually better for preservation, since the text is rot-free and could in principle be reconstructed later.
Or at least that's what they claim. Is it revealed somewhere that they actually were burning it? I don't remember.
It's a neat idea though and not like you need to destroy every copy of the book anyways. Unique or rare books could be scanned page
Re: (Score:2)
Not "music" -- just a "tone" (Score:1)
There are also videos of robotic page flippers and information about how Google wants to use music to help humans flip pages.
From TFA:
The patent describes how a musical tone can be played from the speakers at regular intervals to give the operator a pace to flip pages to.
Not sure what this means, but what is the difference between a "musical tone" and a "tone"? Probably none, except a pleasant timbre with a pitch. From the description, it likely just means a pleasant-sounding "beep" or "ding" or something that recurs at intervals so people know when it's safe to turn the page.
In any case, hardly "using music" to encourage page-flipping -- which brings up weird images of people "Sweatin' to the Oldies" while turning pages for Google.
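To make the pacing idea concrete, here is a small sketch of a cue schedule and the throughput it implies. The interval and the two-pages-per-flip figure are assumptions for illustration; neither appears in the patent quote above, and the function names are hypothetical.

```python
# Pacing cue schedule: when does the tone sound, and what throughput results?
# ASSUMPTIONS: a fixed interval between tones and one page flip (two pages
# captured) per tone; neither number appears in the thread or the patent quote.


def cue_times(interval_s, flips):
    """Seconds from start at which each tone sounds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return [i * interval_s for i in range(flips)]


def pages_per_hour(interval_s, pages_per_flip=2):
    """Pages captured per hour if the operator keeps pace with the tone."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return 3600.0 / interval_s * pages_per_flip


if __name__ == "__main__":
    print(cue_times(2.5, 4))
    print(pages_per_hour(3))
```

With a tone every 3 seconds, for example, an operator who never misses a beat would capture 2400 pages an hour, which is the kind of number the interval would presumably be tuned against.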
manga... (Score:2)
Short Circuit (Score:2)
Re: (Score:3, Funny)
Secret Patent (Score:2)
Doesn't patent law require Google to disclose the invention in order to get it protected? I mean, I only have a vague idea of how it works, but I thought this was one of the points.
in this economy (Score:1)
I dont care anymore (Score:2)
I was very closely following this project, having known the project team lead and talked to him about different projects he had going for the library deal. I remember 8 years ago talking to him about how he was accomplishing the scanning part of it; he told me they even created their own scanning software.
Today I saw the coolest little gadget that some homebrew tinkerer made, covered on /. a month ago, don't have link sorry....
and he used 2 cheap cameras and a big ass metal frame meant to keep the book open a
Page flipping is hard (Score:2)
It's surprisingly hard to automate page-turning. I saw the first page-turning machine many years ago, at the Census Bureau. It was used for 1970 Census form booklets, and used a vacuum belt to hold the booklet down while a wheel with vacuum holes rolled over the page to turn the page. This only worked for booklets with known dimensions, and it was rather rough on the booklets. But it was fast, doing about two flips a second.
It's such a boring job for humans that they screw up. A hand appears in the