Books Google Technology

Google's Book Scanning Technology Revealed

blee37 writes "Last March we discussed Google's patent for a rapid book scanning system. This article describes and provides pictures of how the system works in practice. Google is secretive, but the system's inner workings were apparently divulged by University of Tokyo researchers who wrote a research article on essentially identical technology. There are also videos of robotic page flippers and information about how Google wants to use music to help humans flip pages."

  • Now my PC (Score:5, Funny)

    by bugs2squash ( 1132591 ) on Thursday January 07, 2010 @02:33PM (#30686328)
    Can RTFA for me
  • MRI technology? (Score:5, Interesting)

    by maillemaker ( 924053 ) on Thursday January 07, 2010 @02:37PM (#30686372)

    I often wondered if it would be possible for a book to be scanned while closed, using some kind of MRI technology that digitally sliced the book page by page, picking up on the density difference between the ink and the paper slice by slice.

    • Re: (Score:3, Insightful)

      Comment removed based on user account deletion
      • No, but a book is not 6 feet tall. The issue is, is the MRI, slow as it is, faster than flipping each page manually 300 times and taking individual scans? I agree the tech isn't there yet, but it isn't inherently impossible.

        • Plus, I could envision a system where you loaded many books into a cartridge of sorts, say about 6 feet long, with a divider of some kind placed between each book.

          As the scanner worked its way down the cartridge, it could detect the dividers, which would delineate one book from the next.

          Thus even if the scanner were slow, perhaps it could scan say 50 books in one pass.

          Steve

        • Even if it was about the same speed...cost is a factor as well as speed...

          You could probably set up an extra google book scanner and pay someone to staff it round the clock for about the same price as taking an MRI of a book with some special MRI setup.

        • No, but a book is not 6 feet tall.

          The speed of an MRI is proportional to the resolution you want to have.
          More voxels = More time.

          The issue is, is the MRI, slow as it is, faster than flipping each page manually 300 times and taking individual scans?

          In the Z dimension, you'll need an insanely extreme resolution to be able to tell apart 300 pages. And on each page, you need a resolution high enough so the pages are clearly visible and not blurry.

          For the record, a hidef anatomy-research-grade MRI scan of a brain has only 256x256x200 voxels and can take half an hour. And for that we used special high-speed techniques (3d mprage), which restrict you to 200 voxels
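To put rough numbers on the voxel argument above, here is a minimal back-of-the-envelope sketch. The page size (6 x 9 inches), the 300 dpi in-plane target, and the two Z slices per page are illustrative assumptions, not figures from the thread; only the 256x256x200 brain scan comes from the comment.

```python
def voxels_for_book(pages, page_w_in, page_h_in, dpi, slices_per_page):
    """Estimate the voxel grid needed to resolve every page of a closed book.

    All inputs are hypothetical: dpi is the in-plane resolution wanted on each
    page, slices_per_page is how many Z slices are assumed per page.
    """
    nx = round(page_w_in * dpi)      # voxels across the page width
    ny = round(page_h_in * dpi)      # voxels across the page height
    nz = pages * slices_per_page     # voxels through the book's thickness
    return nx, ny, nz, nx * ny * nz


# The brain scan quoted in the comment: 256 x 256 x 200 voxels in about half an hour.
BRAIN_SCAN_VOXELS = 256 * 256 * 200

# Assumed book: 300 pages, 6 x 9 inches, 300 dpi, 2 slices per page.
nx, ny, nz, total = voxels_for_book(300, 6, 9, 300, 2)
ratio = total / BRAIN_SCAN_VOXELS

print(f"grid: {nx} x {ny} x {nz} = {total:,} voxels")
print(f"about {ratio:.0f} times the voxels of the quoted brain scan")
```

Under these assumptions the book needs a couple of hundred times as many voxels as the brain scan, which is the gist of the "more voxels = more time" objection; it says nothing about whether an MRI could reach that resolution at all.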

      • MRI machine is only going to need one pass around the book. The rest of the work is the data processing. Costs would have to come down *drastically* though for this to be feasible on a large scale. Probably still much cheaper to cut the bindings off books and run them through a high-speed scanner.
        • Re: (Score:3, Interesting)

          by Thud457 ( 234763 )
          yeah, it'd really suck if Google applied their sizable brainage to solving a problem that would have making MRI's cheap and fast as a side-effect. totally suck.


          I haven't verified this, but my father-in-law told me the guy that invented the MRI wanted to develop it as a medical scanner to the point where it was cheap enough that everybody could afford it. Then GE et al locked up the idea and turned it into a profit center.
          • I'm not saying that wouldn't be great, but based on their patent, I'm fairly certain they believe they've solved the problem for now. You know how much helium costs, especially the amount you need for an MRI machine?
            • Re: (Score:1, Troll)

              by fizzup ( 788545 )

              You know how much helium costs, especially the amount you need for an MRI machine?

              Yes. Liquid helium in bulk costs about as much as Coca Cola from a vending machine.

              • Re: (Score:3, Interesting)

                by TooMuchToDo ( 882796 )
                http://nanoscale.blogspot.com/2007/09/secret-joys-of-running-lab-helium.html [blogspot.com]

                The downside of liquid helium is that it's damned expensive, and getting more so by the minute. Running at full capacity I could blow through several thousand liters in a year, and at several dollars a liter minimum plus overhead, that's real money. As a bonus, lately our supplier of helium has become incredibly unreliable, missing orders and generally flaking out, while simultaneously raising prices because of actual production shortages. I just had to read the sales guy the riot act, and if service doesn't improve darn fast, we'll take our business elsewhere, as will the other users on campus. (Helium comes from the radioactive decay of uranium and other alpha emitters deep in the earth, and comes out of natural gas wells.) The long-term solutions are (a) set up as many cryogen-free systems as possible, and (b) get a helium liquifier to recycle the helium that we do use. Unfortunately, (a) requires an upfront cost comparable to about 8 years of a system's helium consumption per system, and (b) also necessitates big capital expenses as well as an ongoing maintenance issue. Of course none of these kinds of costs are the sort of thing that it's easy to convince a funding agency to support. Too boring and pedestrian.

                By the way, I spend most of my days on site at the largest US particle accelerator. Let me know if you'd like to chat with the cryo dept. about how much the tankers of liquid helium cost ;)

          • Re: (Score:3, Interesting)

            by guruevi ( 827432 )

            I never heard the story but you might be confusing MRI with X-Ray machines. You might also remember the stories about X-Rays in shoe stores and why that wasn't a good idea.

            But either way, the costs are not unrealistically high; you can pick up a used MRI machine for about 100k. GE doesn't have a monopoly on MRIs: Siemens, Hitachi and a few others make them as well. The simple physics alone, however, would not allow an MRI machine for most people. The magnets involved are just too strong that they be

            • I used to work for the largest manufacturer of MRIs - and it wasn't GE. Not even close. Try looking at Philips Medical Systems for the number of units produced per year vs everyone else. Before Philips bought my old employer, they were its largest customer, but that supplier was also the one making the Hitachi systems.
    • Re:MRI technology? (Score:5, Insightful)

      by SnarfQuest ( 469614 ) on Thursday January 07, 2010 @03:04PM (#30686746)

      When the book is closed, the ink from facing pages will be mashed together, so you will need to be able to tell which page the ink is attached to. Since the ink mostly sits on top of the paper (if it soaked through you wouldn't be able to read the other side very well) it is a very thin layer. Your scanning technology would need to be able to sense very small volumes of ink. I don't think we are anywhere close to the necessary precision yet.

      • MRIs have resolution down to 90nm.

        Simpler/faster solution would be to insert a piece of paper in between all the pages to be scanned. Then do the MRI. If the OCR turns up 0 hits, mirror the page and run it through again.

        Or make recaptcha users keep a mirror at their desk.

        • Re: (Score:1, Insightful)

          by Anonymous Coward

          If you have to insert a piece of paper in between each page, wouldn't it be easier just to image the pages while they're open?

        • MRIs have resolution down to 90nm.

          In depth? Even if that's so, it's considerably less than the error caused by crinkling and curvature of the pages.

        • Re: (Score:2, Insightful)

          by Daley_G ( 1592515 )

          MRIs have resolution down to 90nm.

          Simpler/faster solution would be to insert a piece of paper in between all the pages to be scanned...

          Wouldn't that defeat the purpose of using the MRI to begin with? Inserting ONE sheet of paper between EVERY page in a book doesn't seem like it would take much more effort than flipping the page and photographing it.

      • Re: (Score:3, Interesting)

        by trb ( 8509 )
        The patterns generated by 2 pages of text superimposed on each other (with one set in mirror image) are not impossible to read. Take a two-sided page and hold it up to the light and try to read it. It may seem difficult, the symbols may be fully or partially superimposed, but it's not impossible. It may be solvable with sufficient computes, which means that if you can't do it now, you'll probably be able to do it on your cell phone in 10 years.

        As for finding the boundaries between books in a stack, if

        • Re: (Score:2, Funny)

          by cdfh ( 1323079 )

          which means that if you can't do it now, you'll probably be able to do it on your cell phone in 10 years.

          I can't solve the TSP for 1000 cities on my desktop computer today, but I suspect in 10 years time I'll be able to solve it on my mobile phone.

      • Your scanning technology would need to be able to sense very small volumes of ink.

        Maybe not. If you do something like a spiral CT, perhaps at multiple angles, you might be able to build a 3d volumetric model based solely on the statistical interpretation of the data points. Along a given ray you know the ink density, and if there are enough rays you could figure out the real possible solutions.

    • Re: (Score:1, Insightful)

      by Anonymous Coward

      just filed vague patent for that

    • Re: (Score:3, Funny)

      by Snaller ( 147050 )

      Brilliant idea!

      Make it so!

    • by Hadlock ( 143607 )

      you could probably do this with xeroradiography, just set the power setting to high, and come up with a system that allows you to focus accurately per page. xeroradiography uses much simpler and more readily available processes/materials than a modern MRI. just depends on the density of the ink they use relative to the paper.

    • by EdZ ( 755139 )
      You've been reading Inherit The Stars, haven't you.
  • Librarian Chantey (Score:5, Informative)

    by _Sprocket_ ( 42527 ) on Thursday January 07, 2010 @02:38PM (#30686388)

    Sea Shanties [wikipedia.org] were sung in association with ship-board tasks (often repetitious in nature). Is Google paving the way for the Librarian chantey?

  • by Anonymous Coward on Thursday January 07, 2010 @02:42PM (#30686432)

    human type book into PC, machine print book on paper, machine binds book ---time goes by--- machine unbind book, robot and human flip pages of book, machine photograph book, machine put book on PC.

  • Build your own.... (Score:5, Interesting)

    by Lumpy ( 12016 ) on Thursday January 07, 2010 @02:44PM (#30686458) Homepage

    Simply set up a rig with 2 digital cameras and a plexiglass V to photograph 2 pages at a time. It's quite fast and cheap.

    http://www.diybookscanner.org/ [diybookscanner.org]

    Works great. I built one to turn a couple of rare automotive books into PDF so I don't damage a $180.00 book in the garage.

    • Re: (Score:2, Funny)

      by Malard ( 970795 )

      I built one to turn a couple of rare automotive books into PDF so I don't damage a $180.00 book in the garage.

      Great, so now you can damage a $1800 laptop instead?

      • Re: (Score:3, Informative)

        Or damage some cheap 8.5x11 that you print out the relevant pages on.

      • by Lumpy ( 12016 ) on Thursday January 07, 2010 @03:30PM (#30687078) Homepage

        What idiot would use a $1800 laptop in the garage to view a PDF?

        Let me guess, you change your oil wearing a cashmere sweater and silk shirts as well.

        Nope. I risk my $40.00 Fujitsu tablet PC that views PDFs just fine but doesn't have enough horsepower to do much else. Works awesome as a garage PC to read PDFs and read the engine codes with my RS232-OBD-II scanner/logger.

        • by hoggoth ( 414195 )

          > my $40.00 fujitsu tablet PC

          I'll call you on that.
          I just checked and Fujitsu's cheapest tablet PC, the T4310, is $1,149 USD.

          So please indicate where I can purchase a $40 tablet PC that can read PDFs.

          • by Lumpy ( 12016 )

            www.ebay.com.... I got a Used stylistic 3500.

            Are you one of those weird people that must have everything new?

            • Judging by ebay's completed sales, it's probably worth two or three times that, if you factor in shipping.

              • by Lumpy ( 12016 )

                I got a lot of 3 a year ago and they averaged to $40.00 each. So Sorry you are unable to find the deals I am able to on ebay.

                But then I typically find most of my items to buy on ebay at far FAR less than the "average" sale price.

          • Fujitsu's cheapest new tablet PC may be $1,149. Why would he use a brand-new machine?

            Here's something much closer to his price [ebay.com], and ought to be more than capable of viewing PDFs.

            • by hoggoth ( 414195 )

              I agree with your point that for this use going used will be much cheaper than new.

              But $189 is still a lot more than $40.

          • by nizo ( 81281 ) *

            Ebay? Certainly one wouldn't use a brand spankmenew laptop for something like this.

          • Clearly it's an old secondhand laptop, you pedant.

        I built one to turn a couple of rare automotive books into PDF so I don't damage a $180.00 book in the garage.

        Great, so now you can damage a $1800 laptop instead?

        Since when do you need a new $1800 laptop to view PDFs? There's always old used laptops and even cheap new ones.

  • Missing the point? (Score:5, Insightful)

    by johnw ( 3725 ) on Thursday January 07, 2010 @02:53PM (#30686598)

    Google is secretive, but the system's inner workings were apparently divulged by University of Tokyo researchers

    Surely the whole point of the patent system is to grant exclusive use for a period in return for publishing full details of how whatever it is works? How can you have a patent without divulging the crucial information?

    • by pclminion ( 145572 ) on Thursday January 07, 2010 @03:45PM (#30687278)
      I work for a company with a lot of patents. Our products are protected partially by patents and partially by trade secret information. In other words, to recreate our product you would need to license the patents AND figure out how we did the other stuff, that is NOT patented, but is secret. There's no reason you can't mix patented and trade secret technology in one solution.
      • Re: (Score:3, Informative)

        by zavyman ( 32136 )

        While you may be correct in certain circumstances, your wording gives a false impression that this always works. You must disclose the best mode [uspto.gov] when filing a patent application.

        The specification . . . shall set forth the best mode contemplated by the inventor of carrying out his invention.

        "The best mode requirement creates a statutory bargained-for-exchange by which a patentee obtains the right to exclude others from practicing the claimed invention for a certain time period, and the public receives knowl

    • How can you have a patent without divulging the crucial information?

      By not playing fair. I'm sure there are lots of companies who deal in IP who can teach you about doing this. In particular, I hear the music, film and software industry players are good at this.

  • by Yaztromo ( 655250 ) on Thursday January 07, 2010 @03:09PM (#30686804) Homepage Journal

    ...and information about how Google wants to use music to help humans flip pages.

    ...you will know it is time to turn the page when Tinkerbell rings her little bells like this...

    Yaz.

  • by 93 Escort Wagon ( 326346 ) on Thursday January 07, 2010 @03:15PM (#30686886)

    Google's book scanning technology? Two guys and an Epson V500.

  • by Chris Burke ( 6130 ) on Thursday January 07, 2010 @03:29PM (#30687060) Homepage

    It involves pigeons, doesn't it?

  • by kriston ( 7886 ) on Thursday January 07, 2010 @03:30PM (#30687068) Homepage Journal

    Back when we called them "Service Bureaus" book scanning was fast, easy, and cheap, as long as you didn't want the book back.

    You deliver your book, magazine, phone book, map, large format document, or whatever to a Service Bureau.
    They will then use a paper saw and cut the binding off and the other three sides to make perfectly smooth edges.
    Then they put the whole mess into a hopper. The hopper feeds the pages to a scanner.
    When it's done, flip the pile over and put it back into the hopper to get the odd-numbered pages into the scanner.

    What you get back is your original book (as a pile of pages with no binding) and a CD-ROM of its contents in both original TIFF and OCRd text files. Now you can get them in PDF/A and DjVu formats.

    I suppose Google's point is that they don't want to ruin the books, or maybe they are so proud of their 3D-scanner enough to use it at all costs. But think of this: there are usually several thousands, perhaps millions, of copies of the books I've seen in Google's library, so destroying one copy of the book seems fair enough.
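The two-pass hopper workflow described above leaves you with two separate stacks of scans that have to be merged back into reading order. A minimal sketch of that merge follows; it assumes, as an illustration only, that flipping the pile makes the back sides come out in reverse sheet order. The function name and file names are hypothetical.

```python
def interleave_passes(front_scans, back_scans_flipped):
    """Merge two single-sided scan passes into reading order.

    front_scans:        front sides in sheet order (pages 1, 3, 5, ...)
    back_scans_flipped: back sides as scanned after flipping the pile,
                        assumed to arrive last sheet first (..., 6, 4, 2)
    """
    if len(front_scans) != len(back_scans_flipped):
        raise ValueError("both passes must contain one scan per sheet")

    # Undo the reversal introduced by flipping the whole pile over.
    backs_in_order = list(reversed(back_scans_flipped))

    pages = []
    for front, back in zip(front_scans, backs_in_order):
        pages.extend([front, back])  # front side first, then its back side
    return pages


# Three sheets: fronts scanned first, then the flipped pile.
fronts = ["p1.tif", "p3.tif", "p5.tif"]
backs = ["p6.tif", "p4.tif", "p2.tif"]
book = interleave_passes(fronts, backs)
print(book)
```

If a particular feeder preserves sheet order on the second pass, the `reversed` step would simply be dropped.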

    • This method is for books in academic libraries where it isn't feasible to saw the binding and use a traditional scanner.
    • I think this technology was developed to scan rare books. The kind you can't destroy, you know?

    • Back when we called them "Service Bureaus"

      Damn kids, get off my Syquest cart!

    • by Monkeedude1212 ( 1560403 ) on Thursday January 07, 2010 @04:04PM (#30687492) Journal

      There are A LOT of books out there which would NOT be suitable for this method. A friend of mine in University for Museum Studies often has to read these books which are incredibly old. I believe the University has a couple that date somewhere around the 1830s, which is older than the books you find in the historical village we have in town.

      Yes, the university lets you read books that are old enough to belong in a museum. She showed me one of them one time. It was like a manuscript, Thick leather binding, nothing written on the front, heavy faded pages. I almost couldn't believe it.

      Sadly, that was the most exciting part of it. The writing was dryer than a desert, and it was on some subject that I had zero interest in. They are supposedly starting to go ALL digital, so I have no idea what they're going to do with those old books and manuscripts they've got sitting around.

      I hope they don't destroy them.

      • by Ankh ( 19084 )

        There's no reason you couldn't do this with a book from the 1830s; on http://words.fromoldbooks.org/ [fromoldbooks.org] I have text from 18th century books that have been scanned like this, and on http://www.fromoldbooks.org/ [fromoldbooks.org] some considerably older books.

        It turns out that there are interesting old books too, you'll be pleased to know, although the further back you go, the more likely you are to find a book in Latin. Well, until you get far enough back that scrolls are common, and then Greek and Hebrew/Aramaic are common :D

        In

    • Re: (Score:3, Interesting)

      by swillden ( 191260 )

      That's insufficiently destructive.

      They should use the method from Vernor Vinge's novel "Rainbows End", where the books are fed into what is essentially a giant chipper/shredder. The shredded pages are then blown through a tunnel studded with cameras, swirled around so that every side of every piece of paper is photographed at some point, and then all of the images are reassembled to form complete images of every page. At the end of the tunnel is an incinerator which burns the shredded paper.

      The books

      • I recall that they kept the "shredda", sterilized and sealed in helium canisters buried deep underground. This was their argument that the shredding-process is actually better for preservation, since the text is rot-free and could in principle be reconstructed later.

        Or at least that's what they claim. Is it revealed somewhere that they actually were burning it? I don't remember.

        It's a neat idea though and not like you need to destroy every copy of the book anyways. Unique or rare books could be scanned page

        • You may be right about the shredda. I don't remember for sure, and don't have the book handy.
  • There are also videos of robotic page flippers and information about how Google wants to use music to help humans flip pages.

    From TFA:

    The patent describes how a musical tone can be played from the speakers at regular intervals to give the operator a pace to flip pages to.

    Not sure what this means, but what is the difference between a "musical tone" and a "tone"? Probably none, except a pleasant timbre with a pitch. From the description, it likely just means a pleasant-sounding "beep" or "ding" or something that recurs at intervals so people know when it's safe to turn the page.

    In any case, hardly "using music" to encourage page-flipping -- which brings up weird images of people "Sweatin' to the Oldies" while turning pages for Google.
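The pacing idea quoted from the article amounts to a metronome: one tone per page flip at a fixed interval. A minimal sketch of such a schedule is below; the function name and the 2.5-second interval are assumptions for illustration, since neither the article excerpt nor the thread gives the actual timing.

```python
def tone_schedule(num_flips, seconds_per_flip, start=0.0):
    """Return the times (in seconds) at which a pacing tone would sound.

    One tone per page flip at a fixed interval. The interval is a
    hypothetical parameter, not a value taken from the patent.
    """
    if num_flips < 0 or seconds_per_flip <= 0:
        raise ValueError("need a non-negative flip count and a positive interval")
    return [start + i * seconds_per_flip for i in range(num_flips)]


# Four flips paced 2.5 seconds apart.
schedule = tone_schedule(4, 2.5)
print(schedule)
```

A real rig would presumably trigger the camera on the same clock, so that each tone marks the moment a shot has been taken and the page can safely be turned.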

  • The University of Tokyo's version is demo'd using a manga... go figure. The high-speed camera approach is also really cool. Reminds me of that TNG episode (yeah yeah, I know) where the aliens built that casino/hotel based on a book for that astronaut... Picard hands the novel over to Data and asks him to summarize. Data just flips through all the pages in like 3 seconds and spews out the madness.
  • Johnny Five had no problem flipping pages and scanning them back in 1986. I don't see what the big deal is here.
  • Doesn't patent law require Google to disclose the invention in order to get it protected? I mean, I only have a vague idea of how it works, but I thought this was one of the points.

  • I am sure that in this worldwide depression, Google can easily find people willing to carefully place and turn books for $1/day. Sugar cane farmers in S. America work for $1/day. I would think being a book scanner would be a highly sought after position. Si Senor, the room has AC to keep the books comfortable?
  • I was very closely following this project, having known the project team lead and talked to him about different projects he had going for the library deal. I remember 8 years ago talking to him about how he was accomplishing the scanning part of it; he told me they even created their own scanning software.

    Today I saw the coolest little gadget that some homebrew tinkerer made, covered on /. a month ago, don't have link sorry....
    and he used 2 cheap cameras and a big ass metal frame meant to keep the book open a

  • It's surprisingly hard to automate page-turning. I saw the first page-turning machine many years ago, at the Census Bureau. It was used for 1970 Census form booklets, and used a vacuum belt to hold the booklet down while a wheel with vacuum holes rolled over the page to turn the page. This only worked for booklets with known dimensions, and it was rather rough on the booklets. But it was fast, doing about two flips a second.

    It's such a boring job for humans that they screw up. A hand appears in the
