Google Engineers Open Source Book Scanner Design

Posted by timothy on Thursday November 15, 2012 @10:29AM from the your-book-scanner-sucks dept.

c0lo writes "Engineers from Google's Books team have released the design plans for a comparatively reasonably priced (about $1500) book scanner on Google Code. Built using a scanner, a vacuum cleaner and various other components, the Linear Book Scanner was developed by engineers during the '20 percent time' that Google allocates for personal projects. The license is highly permissive, thus it's possible the design and building costs can be improved. Any takers?" Adds reader leighklotz: "The Google Tech Talk Video starts with Jeff Breidenbach of the Google Books team, and moves on to Dany Qumsiyeh showing how simple his design is to build. Could it be that the Google Books team has had enough of destroying the library in order to save it? Or maybe they just want to up-stage the Internet Archive's Scanning Robot. Disclaimer: I worked with Jeff when we were at Xerox (where he did this awesome hack), but this is more awesome because it saves books."

This discussion has been archived. No new comments can be posted.

- Re: (Score:2)
  
  by lord_rob the only on ( 859100 ) writes:
  
  /me thinks you weren't really thinking of the book scanner design when you made that comment ;-).
  Note that there's absolutely no relation between Book Scanners and Phone Design.
  - Re: (Score:2)
    
    by jones_supa ( 887896 ) writes:
    
    They only have to add some feature that stores your scans in the cloud.
    - Re: (Score:2)
      
      by mug funky ( 910186 ) writes:
      
      it's an open design, dumbass. save yourself some time and don't build the spying part.
False economy (Score:5, Insightful)

by srussia ( 884021 ) writes: on Thursday November 15, 2012 @10:37AM (#41991253)

FTFA: For the past eight years, Google has been working on digitizing the world's 130 million or so unique books.

If these books are truly unique, you're taking a big risk subjecting them to this contraption.

- Re:False economy (Score:5, Funny)
  
  by vlm ( 69642 ) writes: on Thursday November 15, 2012 @10:44AM (#41991323)
  
  The proper SQL statement would have been "DISTINCT" not a "UNIQUE" index, true.
  
- Re: (Score:2)
  
  by Hatta ( 162192 ) writes:
  
  You're taking a bigger risk not subjecting them to this contraption.
- Re: (Score:3)
  
  by plover ( 150551 ) writes:
  
  He addresses that in the talk. Yes, this machine can fold or tear pages. But they talked to an archivist, and he said that scanning the books in this machine was less risky than not scanning them at all. If they're scanned, the information is preserved, backed up, spread around, and is then widely available. Any library book is subject to risk from the patron tearing or damaging the book, yet they still accept the risk of making them available.
  Besides, how much worse is the risk of possibly tearing a pa
- Re: (Score:2)
  
  by ryzvonusef ( 1151717 ) writes:
  
  Frankly, I like the idea presented by these guys better:
  http://www.diybookscanner.org/ [diybookscanner.org]
  They have the book lying down on its spine and supported at a nice 45-ish angle that prevents too much of a tear. However they use ordinary cameras instead of the scanning tech used in a...well...scanner. Though I believe cameras tend to work faster than a scanner, so I don't see a downside.
  - Re: (Score:2)
    
    by plover ( 150551 ) writes:
    
    The Google guy mentioned them in the presentation. The primary drawback to the other DIY scanners is manual operation. Setup involves adjusting the lights, the cameras, and the hinge point for the platen; not a big deal. But in operation, the human has to lift the platen, flip the page, set the platen down, trigger the cameras, and then repeat for each page. My understanding is that a person can scan a 500 page book in about 20-30 minutes, so it's of a comparable speed to this new page-turning scanner.
    - Re: (Score:2)
      
      by ryzvonusef ( 1151717 ) writes:
      
      I agree with your points, and I saw the video, but I was actually referring to the OP's point about handling delicate books.
      DIY's system has the book (which is in fragile condition) down, and very properly secured, and the scanning apparatus (which is more able to take the stress from the constant movement) is the one that moves.
      I was actually imagining my dad's big-ass collins dictionary from *his* college time, and comparing the state of that to what I might expect the usual state of affair will be of the
      - Re: (Score:2)
        
        by plover ( 150551 ) writes:
        
        OK, I get what you're saying now. You want to take the mass of the book out of the equation, so that a fragile spine wouldn't be further damaged or even torn in two by the weight of a heavy book straddling a sharp edge, and all the motion of this mechanism. And I agree.
        It looks like the high end commercial book scanners are constructed to take that into account too, where the weight of the book is supported by the covers in a cradle, just like the DIY scanners. They use a vacuum mechanism to draw a singl
        
        Re: (Score:2)
        
        by ryzvonusef ( 1151717 ) writes:
        
        Agree with you.
        On further thought, I think it would be better if the book stayed still and it was the *scanner* that moved back and forth ( in the scanner-top position I described earlier)
        That should eliminate worries about size and weight, since the only weight in question is the scanner itself, rather than the book, and that will remain constant.
        Also, I think, errors could be reduced by *slowing* down the process, to further minimise pages caught/stuck/torn, since slower and steadier push will allow for m
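
For anyone who missed vlm's SQL joke near the top of this thread, here is a minimal sketch of the difference, using Python's built-in sqlite3 module. The table names, index name, and book titles are made up for illustration and are not from the article. DISTINCT collapses duplicates when you query, while a UNIQUE index refuses duplicates when you write.

```python
import sqlite3

def distinct_vs_unique():
    """Contrast SELECT DISTINCT (query time) with a UNIQUE index (write time)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical table: one row per physical copy scanned, duplicates allowed.
    cur.execute("CREATE TABLE scans (title TEXT)")
    cur.executemany(
        "INSERT INTO scans VALUES (?)",
        [("Foundation",), ("Foundation",), ("Rainbows End",)],
    )
    # DISTINCT only de-duplicates the result set; the table still holds 3 rows.
    distinct_titles = [
        row[0]
        for row in cur.execute("SELECT DISTINCT title FROM scans ORDER BY title")
    ]

    # Hypothetical table with a UNIQUE index: the second insert is rejected.
    cur.execute("CREATE TABLE catalog (title TEXT)")
    cur.execute("CREATE UNIQUE INDEX idx_catalog_title ON catalog (title)")
    cur.execute("INSERT INTO catalog VALUES ('Foundation')")
    try:
        cur.execute("INSERT INTO catalog VALUES ('Foundation')")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True

    conn.close()
    return distinct_titles, duplicate_rejected
```

So "130 million distinct books" would be the count you get from a query, not a promise that only one copy of each exists.
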
Very Good Wiki Direction (Score:5, Interesting)

by retroworks ( 652802 ) writes: on Thursday November 15, 2012 @10:39AM (#41991283) Homepage Journal

I work in the tech recycling business, but we get literally hundreds of tons of books turned in for recycling. It pains me to see most of them go to paper recycling recovery, though there is a growing market for shops that scan barcodes for resale. I would think that Google would have problems with copyright law, as would any single entity who is at risk of scanning the wrong book (i.e. the one someone would take time to sue you for, especially if you have deep google-pockets). This direction opens to small scale "wiki-scanning", which could be really ideal since people who have actually read the book would probably be the best ones to figure out if it was worth the time to scan, would tend to prioritize important books (preserving them) and would present a very decentralized system for lawsuits. If I can scan the book for "personal use" like the cassette tape rulings for music, all the better. The problem is the physical space these books take, and it's causing a lot of out of print books to get made into cereal boxboard, and the scale at which 50-100 year old out of print books are getting recycled is scary.

- Re: (Score:2)
  
  by Sqr(twg) ( 2126054 ) writes:
  
  If you're scanning to save physical space, you don't need this contraption. Just cut off the back of the book and put the pages in a regular scanner with a sheet feeder. (You can get an excellent one for about $400, including OCR software.)
  - Re: (Score:2)
    
    by nbauman ( 624611 ) writes:
    
    What's the best way to cut off the back?
    - Re: (Score:1)
      
      by charlesj68 ( 1170655 ) writes:
      
      A band saw works really well.
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
Harvesting knowledge in case of society collapse (Score:4, Insightful)

by concealment ( 2447304 ) writes: on Thursday November 15, 2012 @10:45AM (#41991333) Homepage Journal

We know it can happen. Rome fell, Greece fell, Angkor Wat fell, Easter Island collapsed. Societies die just like we do.
It would be a shame to lose all of the knowledge, art, and literature that we have accumulated during our tenure so far.
Scanning books is a good way to archive much of that information for the next society that can develop digital computing. I suggest we enshrine it all in orbit or on the moon, guaranteeing it relative immortality and making it accessible only to those technologically advanced enough to benefit from it.
For all we know, the ancient Khmer civilization at Angkor Wat [about.com] invented advanced technology, and it was merely lost to time.
We owe it to future generations to make sure our society does not lose as much when it collapses.

- Re:Harvesting knowledge in case of society collaps (Score:5, Insightful)
  
  by bickerdyke ( 670000 ) writes: on Thursday November 15, 2012 @11:02AM (#41991481)
  
  But stone & clay slabs of the Sumerians and papyrus of the Egyptians survived until today, but the original data feeds of the Apollo missions are lost forever because they were trashed when no one had the equipment to read the old data tapes.
  
  - Re: (Score:2)
    
    by PybusJ ( 30549 ) writes:
    
    I'm not sure you're making a valid comparison. If I choose any particular piece of Egyptian recorded information then there's a good chance that it is destroyed. The fact that some material survived several millennia is both impressive and interesting, but very much material survives from the 60s even if some has been lost.
    I mean how many records of the ancient Egyptian space race survive to this day? I rest my case.
    - Re: (Score:3)
      
      by bickerdyke ( 670000 ) writes:
      
      This wasn't meant as a comparison of better and worse. Just as a set of specific risks for digital archives.
      Go and try to read your letters from a 5.25'' floppy disc with your VizaWrite-files from just a few years ago. Wouldn't have happened with paper printouts.
      On the other hand, go to a movie archive and see the first cellulose movies lost due to simply rotting away... wouldn't have happened with DVDs
      Then again, if there's no DVD player left....
      A form of archiving, that needs special knowledge (file format
      - Re: (Score:2)
        
        by HiThere ( 15173 ) writes:
        
        Yes, and you CAN print it out. And you CAN print it on good paper...
        but what about the inks that you are using? I don't think those will survive very long. And getting better inks that will work with an existing printer is a real problem.
        FWIW, I don't really have a much better answer than an improved clay tablet. And preserving anything that way is so expensive that it won't be done...except on a trivial scale. The original CDs were durable things, but that doesn't apply to the ones that you can burn at
        
        Re: (Score:2)
        
        by mug funky ( 910186 ) writes:
        
        optical discs are actually made in a near identical process to microfiche.
        we could simply etch much much smaller using lasers on current replication hardware. you could probably write a small program that translates text files into an ISO file you could burn yourself that results in a human-readable disc.
        hell, i want to try that. that sounds amazing.
        
        Re: (Score:2)
        
        by HiThere ( 15173 ) writes:
        
        IIUC, current consumer CDs and DVDs write using a phase transition process that changes the reflectivity of the metallic layer written upon. Over time this relaxes back into the low energy configuration. It may be good for a decade or two, but I doubt that it's even good over a century.
        
        Re: (Score:2)
        
        by plover ( 150551 ) writes:
        
        It's not that simple. (Nothing ever is.) Preserving information for the future runs into a lot of issues.
        
        The media can degrade over time (plastics degrade and become brittle, adhesives let go, corrosion of moving parts, seizure of old lubricants)
        The media can be lost (labels fall off, disorganized storage, fire or flood, etc.)
        The readers are less available (punched paper tape readers, 9 track tape, cassette tape readers, 8" floppy drives, 5-1/4" floppy drives, DAT, Zip drives, etc.) Even CD-ROMs are on
      - Re: (Score:2)
        
        by mug funky ( 910186 ) writes:
        
        films that old don't necessarily rot. they either get eaten by fungus or burn on their own once exposed to ambient air. Nitrates were not an ideal material for making precious archival materials from...
    - Re: (Score:1)
      
      by operagost ( 62405 ) writes:
      
      This is the survivor bias that leads to conclusions like "they don't make them like they used to," not realizing that the fragile or poorly-constructed crap has largely been destroyed without a trace.
    - Here ya go: (Score:1)
      
      by Medievalist ( 16032 ) writes:
      
      I mean how many records of the ancient Egyptian space race survive to this day?
      https://www.google.com/search?q=helicopter+of+abydos&tbm=isch&source=univ&sa=X [google.com]
      Ancient Egyptian spacecraft & helicopters!
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  Greece fell,
  Oh, come on, Greece is still working on securing more loans, it hasn't fallen yet!
Having looked at the design... (Score:3)

by pongo000 ( 97357 ) writes: on Thursday November 15, 2012 @10:59AM (#41991465)

...I think it's fundamentally flawed in that it would not take much to have a misaligned page sliced right out of the book. Certainly nothing I'd risk a book of any value over. Sorry, this one appears to be a non-starter (although it is rather novel, pun intended).

- Re: (Score:2)
  
  by sexybomber ( 740588 ) writes:
  
  In that case, you could use one of the manual ones at diybookscanner.com [diybookscanner.com] and turn the pages yourself, trading speed for safety.
  - Re: (Score:2)
      
      by leighklotz ( 192300 ) writes:
      
      In point of fact, for individual scanning, the video even mentions that this linear scanner is SLOWER than a manual scanner such as the diybookscanner. The gains come in that since it's automatic, a single person could keep 8 or 10 of them running at a time.
      Yup. Progress in clock speeds has pretty much slowed down, and Google appears to expect future performance enhancements to come in the form of parallelism
      - Re: (Score:2)
        
        by jab ( 9153 ) writes:
        
        Clock speed can be quadrupled by switching to a pipeline architecture. See 24:28 [youtube.com] of the video.
- Re: (Score:2)
  
  by Dishevel ( 1105119 ) writes:
  
  Because the paper itself is more important than the content?
  We need more people in this world who understand value.
- Re: (Score:2)
  
  by Patch86 ( 1465427 ) writes:
  
  If I had a truly unique and special book that must not be damaged, and I wanted to digitize it, I'd bite the bullet and do it very carefully by hand (which you could do, over a long enough time scale, with just about any household USB scanner).
  If, however, I wanted to digitize the contents of my personal book collection, which is several hundred books none of which couldn't be replaced via Amazon or eBay, this would be good for the job. So it shreds my 20 year old copy of Asimov's Foundation- I'd be a bit c
  - Re: (Score:2)
    
    by plover ( 150551 ) writes:
    
    Just as you wouldn't trust your valuable books to this page-turning scanner, you wouldn't scan those same books with the typical household USB scanner, either. Those scanners generally require the books to be opened 180 degrees and pressed flat in order to get the scanning element close enough to the margins, and that can damage the pages and/or the binding.
    The prototypical DIY scanner uses a book rest and platen set at a 90 degree angle, which is safe for most books, and as you're manually turning the pag
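
The speed-versus-parallelism argument in this thread comes down to simple arithmetic, sketched below. The manual figure (a 500 page book in about 20-30 minutes) and the "8 or 10" machines per operator come from comments in this discussion. The per-machine rate of the linear scanner is NOT given anywhere here, so `ASSUMED_LINEAR_PPM` is a purely hypothetical placeholder (half the slower manual rate) chosen only to show the shape of the calculation.

```python
def pages_per_minute(pages, minutes):
    """Average scanning rate for one operator or one machine."""
    return pages / minutes

def fleet_rate(per_machine_ppm, machines):
    """Aggregate rate when one person tends several automatic machines."""
    return per_machine_ppm * machines

# Manual DIY scanner, per the thread: 500 pages in 20 to 30 minutes.
manual_slow = pages_per_minute(500, 30)   # about 16.7 pages/min
manual_fast = pages_per_minute(500, 20)   # 25 pages/min

# HYPOTHETICAL: assume the automatic scanner runs at half the slower manual rate.
ASSUMED_LINEAR_PPM = manual_slow / 2

# One person tending 8 or 10 automatic machines, as suggested in the thread.
fleet_low = fleet_rate(ASSUMED_LINEAR_PPM, 8)
fleet_high = fleet_rate(ASSUMED_LINEAR_PPM, 10)

print(f"manual: {manual_slow:.1f} to {manual_fast:.1f} pages/min per person")
print(f"fleet:  {fleet_low:.1f} to {fleet_high:.1f} pages/min per person")
```

Under that assumption a single machine loses to a person flipping pages, but one person tending several machines comes out ahead. Plug in a measured rate before drawing any real conclusion.
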
The missing link (Score:2)

by Trevelyan ( 535381 ) writes:

I am guessing that this is the Google TechTalk video that is discussed in the summary, but not linked (or more likely edited out): http://www.youtube.com/watch?v=4JuoOaL11bw [youtube.com]
- Re: (Score:2)
  
  by leighklotz ( 192300 ) writes:
  
  Yes, it was in my submission but apparently edited for brevity. TL;DW?
- Re: (Score:2)
  
  by leighklotz ( 192300 ) writes:
  
  I remember thinking the same thing then.
  - Re: (Score:2)
    
    by HiThere ( 15173 ) writes:
    
    And you were right then, just as he is right now.
Google's motivation (Score:5, Insightful)

by swillden ( 191260 ) writes: <shawn-ds@willden.org> on Thursday November 15, 2012 @11:38AM (#41991819) Journal

The summary questions Google's motivations for doing this, but I think it should be clear this isn't a Google project, really. 20% projects can't be totally random, personal things that have no relationship whatsoever with the business or possible business... but the link can be very tenuous, and the cooler the project is, the weaker it can be. All tech managers at Google are engineers themselves and tend to be just as able to geek out about cool stuff as the people they supervise.
Various other bits of obvious Google support for the project are also more incidental than planned. For example, Dany mentions that he built the machine in one of the on-campus workshops. Those workshops are there for "real" work, but they're also available for any employees to use on an as-available basis. Tech talks are also organized by and for the employees for their own interests, with basically zero "corporate" supervision. Most are actually job-related, but far from all. There are plenty of project talks and hobby talks (though this particular hobby/project talk is much cooler than most).
I imagine there was a cursory review required to get permission to publish the talk and the design, but such things tend to be handled on a "is there some really good reason we should say no?" basis. If not... go for it. Publishing cool, geeky things done by Google engineers is pretty positive for Google's brand, and it makes the engineers happy, which is good for employee retention -- especially since the kind of employees who do cool stuff for fun is the kind Google most wants to retain.
Bottom line: It's very unlikely anyone at Google has a corporate strategy built around the release of this information. It's just an engineer doing something he thinks is fun and valuable (to someone) and the company providing generic support for such activities, and otherwise staying out of the way.

- Re: (Score:3, Informative)
  
  by ceoyoyo ( 59147 ) writes:
  
  "Could it be that the Google Books team has had enough of destroying the library in order to save it?"
  The Google Books team is not Google. It's a group of people, some of whom built this non-destructive reader. It's quite likely these people, who probably love books, started by wondering if there was a way they could scan their content without damaging them physically, and decided to use their 20% time to figure it out.
  As for scanning books, that is most definitely a Google-the-company supported project
- Re: (Score:2)
  
  by cruff ( 171569 ) writes:
  
  If you had bothered to watch the video, you would have seen that there are two image sensors that capture two pages a pass.
- Re: (Score:2)
  
  by leighklotz ( 192300 ) writes:
  
  See archive.org...
  Yes, that's in the original submission, as you see above. For the record, Brewster Kahle (who founded Archive.org), Jeff and Dany (who did this project), and I are all MIT alums, and the "Internet Archive scanning robot" is from a company called Kirtas, which also has ties to Xerox.
Shredder scanner (Score:3)

by tnk1 ( 899206 ) writes: on Thursday November 15, 2012 @01:08PM (#41992933)

I'm waiting for a reference to the shredder-scanner to come up from Rainbows End.
http://en.wikipedia.org/wiki/Rainbows_End [wikipedia.org] (although the wiki article doesn't mention that piece of the plot, sadly)

What about the patents? (Score:2)

by Pinky's Brain ( 1158667 ) writes:

Google has a patent on using structured lighting to determine the shape of the page and correct the image ... is that open too?
- Re: (Score:2)
  
  by c0lo ( 1497653 ) writes:
  
  Google has a patent on using structured lighting to determine the shape of the page and correct the image ... is that open too?
  The license section on the googlecode [google.com] page (scroll to the bottom):
  Additional IP Rights Grant (Patents)
  Google hereby grants to you a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, transfer, and otherwise run, modify and propagate this design where such license applies only to those patent claims, both currently owned by Google and acquired in the future, licensable by Google that are necessarily infringed by This design.
  Does this answer your question?
  - Re: (Score:2)
    
    by Taxman415a ( 863020 ) writes:
    
    unfortunately this scanner doesn't incorporate anything that would use google's 3d structured lighting (laser grid, etc) scanning patent, so the patent grant for this scanner does not open up that patent. Google's laser grid patent allows automatic dewarping of a curved page, but this is a moving flatbed scanner. Nothing I've found so far incorporates any of the laser grid stuff.
