Yahoo Competes with Google in Book Scanning

Posted by ScuttleMonkey on Monday October 03, 2005 @05:04PM from the my-literary-collection-is-bigger-than-yours dept.

UltimaGuy writes "A consortium backed by Yahoo has launched an ambitious effort to digitize classic books and technical papers and make them freely available on the Web. The company is partnering with the newly formed Open Content Alliance, which aims to offer PDF documents of books to the public at no charge. Consumers will be able to search the contents of the Open Content Alliance's database and download the entire content of any work, such as a scanned copy of a book."

This discussion has been archived. No new comments can be posted.

RIAA Problems Solved (Score:5, Funny)

by GreggyBUIUC ( 262370 ) writes: on Monday October 03, 2005 @05:07PM (#13707510)

Someone start up an "Open Content Alliance" for music... then we can digitize and share it all we want.

- Re:RIAA Problems Solved (Score:2)
  
  by rf0 ( 159958 ) writes:
  
  Just do a load of covers which sound very very like them and you can legally play them
  
  Rus
  - Re:RIAA Problems Solved (Score:2)
    
    by no reason to be here ( 218628 ) writes:
    
    Ummm...not without ASCAP being paid off first. You think the RIAA is bad, just wait 'til you run into the thuggish tactics of ASCAP.
- Yahoo searches for Creative Commons (Score:2)
  
  by mrklin ( 608689 ) writes:
  
  Yahoo! Advanced Search at http://search.yahoo.com/web/advanced?ei=UTF-8 [yahoo.com] allows one to search for Creative Commons licensed content.
- Re:RIAA Problems Solved (Score:2)
  
  by sik0fewl ( 561285 ) writes:
  
  In PDF format, no less!
- Re:RIAA Problems Solved (Score:2)
  
  by blibbler ( 15793 ) writes:
  
  Isn't that what the original MP3.com was? Independent artists essentially letting everyone download their music for free.
- - - - More expensive books? (Score:3, Interesting)
        
        by Grendel Drago ( 41496 ) writes:
        
        Huh? Where are you from? I worked at a research library at a large state university, and I have no idea what you're talking about. True, libraries pay extortionate rates for journal subscriptions, but when they purchase monographs, they frequently get them off the used book market, just like you or I would. It costs them extra to get it bound in a durable fashion, and to enter it into their Byzantine catalog system, but I've never, ever heard of libraries having to pay extra for books simply because they we
Will Yahoo scan it like they have yahoo.com? (Score:5, Funny)

by Anonymous Coward writes: on Monday October 03, 2005 @05:07PM (#13707512)

I can't wait to read the whole book on one page.

- Re:Will Yahoo scan it like they have yahoo.com? (Score:2)
  
  by m50d ( 797211 ) writes:
  
  You jest, but I'd find that much better than pdfs. There's a perfectly good format for reading things on a screen how you want to, it's called html. I want to have longer lines on my huge monitor, be able to apply my own stylesheets to the document, etc. PDF is for printing.
no mention of project gutenberg (Score:4, Insightful)

by justforaday ( 560408 ) writes: on Monday October 03, 2005 @05:08PM (#13707526)

I find it interesting that in all the articles I've looked at today about this that only one has mentioned Project Gutenberg. Naturally, I can't recall which source it was...

- - Right you are! See TEI. (Score:3, Interesting)
    
    by Grendel Drago ( 41496 ) writes:
    
    Indeed. It's bothered me for some time now that it takes a good deal of doing to make a nice LaTeX edition of the book, so that it's nontrivial to go from the eBook to a really high-quality printed page.
    
    Luckily, someone's decided to do something about it. See PGTEI [gutenberg.org], a very verbose and flexible method for marking up literary works. The full TEI spec [wikipedia.org] is gargantuan, so PGTEI is actually a dialect of a subset called TEI Lite. It's an XML markup scheme which has output filters (it uses XSLT, it seems) for plain
What a concept. (Score:5, Informative)

by Anonymous Coward writes: on Monday October 03, 2005 @05:09PM (#13707539)

I liked the idea the first time I heard it - back when it was called Project Gutenberg. :P

- - Re:Different than Gutenberg (Score:2)
    
    by _Sprocket_ ( 42527 ) writes:
    
    The difference is that they also have an "opt in" program, wherein any publisher can have their works indexed upon request, without being redistributed in full.
    
    Back in the early 90's, that was called the World Wide Web (and search engines). Which puts Yahoo... well... where they began.
  - Gutenberg is more than book-scanning. (Score:2)
    
    by Grendel Drago ( 41496 ) writes:
    
    Project Gutenberg does a lot more than scan books. (Actually, they frequently don't actually scan the books themselves; projects like the Million Book Project do that.) The value that PG provides is in the proofreading and formatting of their eBooks. That said, any massive scanning project which provides page images for PG to pick up is quite a good thing.
What do these guys know... (Score:5, Interesting)

by dada21 ( 163177 ) * writes: <adam.dada@gmail.com> on Monday October 03, 2005 @05:10PM (#13707542) Homepage Journal

...that we don't?

It seems to me that they're throwing money at an unnecessary application. Does Yahoo know something that we don't? I'd venture that they're starting with PD books to shake the bugs out of their platform so the app works well in round 2.

Round 2 (current commercial books) won't occur without a massive copyright law change or support of the Authors Guild.

Hmm.

- Re:What do these guys know... (Score:2)
  
  by Brigadier ( 12956 ) writes:
  
  Well, they know it's all about content. Being advertisement-driven sites, they have to offer content and experiences that will attract people to their portal, i.e. search engine, e-mail, clubs, blogs, etc.
  - Re:What do these guys know... (Score:2)
    
    by dada21 ( 163177 ) * writes:
    
    Yet ancient content isn't a driving element for even tiny groups, is it?
Project Gutenberg (Score:5, Informative)

by timeToy ( 643583 ) writes: on Monday October 03, 2005 @05:11PM (#13707546)

16k ebooks to choose from today, more to come, no Google, no Yahoo.
http://www.gutenberg.org/ [gutenberg.org]

- Re:Project Gutenberg (Score:5, Interesting)
  
  by harmonica ( 29841 ) writes: on Monday October 03, 2005 @05:22PM (#13707630)
  
  More books are a good thing. Having a scanned PDF version includes graphics as well, which are missing from Gutenberg ebooks. So I see this as a very positive development.
  
  - Re:Project Gutenberg (Score:5, Informative)
    
    by timeToy ( 643583 ) writes: on Monday October 03, 2005 @05:49PM (#13707802)
    
    It depends; some books do carry graphics, for instance the Slashdot-friendly "Amusements in Mathematics" by Henry Ernest Dudeney, 1917:
    http://www.gutenberg.org/etext/16713 [gutenberg.org]. The zipped HTML version carries all the original drawings.
    
    - Awesome, indeed! (Score:3, Interesting)
      
      by Grendel Drago ( 41496 ) writes:
      
      I remember seeing some of Dudeney's puzzles referred to before, but I couldn't remember where. Then the book popped up on my RSS feed (it was released within the last month, I think), and indeed, it was full of fun math puzzles. Man, that was nice.
      
      But they don't just have HTML; see various [gutenberg.org] examples of files released with filetype "TEI", including PDF (through LaTeX), TXT (in a variety of encodings, i.e. Latin-1, US-ASCII and UTF-8) and HTML.
    - Re:Project Gutenberg (Score:2)
      
      by sootman ( 158191 ) writes:
      
      They're a great group, but they're using some *really* shitty compression algos. :-)
      
      Format  Encoding    Compression  Size
      HTML    iso-8859-1  none         1.27 MB
      HTML    iso-8859-1  zip          5.95 MB
  - Re:Project Gutenberg (Score:2, Funny)
    
    by Infinityis ( 807294 ) writes:
    
    Well this is a problem waiting to get solved. Why don't they incorporate image-to-ASCIIart software so we can get high-quality images from these books?
  - best format? (Score:3, Interesting)
    
    by j1m+5n0w ( 749199 ) writes:
    
    Actually, I prefer plain txt to pdf if I'm reading from a computer (assuming the book is not illustrated), since I have more control over fonts and colors (and I have read quite a few gutenberg books that way). However, I think the best native format (despite its general user-unfriendliness) would be latex, from which txt, pdf, and html could be generated. On the other hand, I suppose it's much easier to generate txt or pdf from scanned pages than latex.
  - - Different scope. (Score:2)
      
      by Grendel Drago ( 41496 ) writes:
      
      Project Gutenberg does proofreading and postproduction, which requires a lot more human eyeballs than scanning a lot of pages. While this archive may be tremendously useful to PG by providing raw material, it's not a duplication of effort.
- Re:Project Gutenberg (Score:2, Troll)
  
  by Reality Master 101 ( 179095 ) writes:
  
  No images, graphics, no typography, no typesetting...
  Project Gutenberg is great and all, but there's something to be said for some effort made at presentation. Sometimes italics are a good thing.
  - Re:Project Gutenberg (Score:4, Interesting)
    
    by shellbeach ( 610559 ) writes: on Monday October 03, 2005 @06:41PM (#13708129)
    
    Project Gutenberg is great and all, but there's something to be said for some effort made at presentation. Sometimes italics are a good thing.
    
    It's not a great solution, but emphasis _is_ preserved in the etexts, just like that. Or occasionally like THIS ... Pity there's no consistency, but for most texts it works well enough.
    
    Also, the fact that they are plain text, with no markup, formatting, binary code, whatever in them means that they'll always be accessible to anyone, regardless of software or platform. And that's a good thing, too!
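
The underscore convention described in this comment can be turned into markup mechanically. What follows is only a minimal sketch, assuming emphasis is always written as _word_ or _several words_ on a single line; the helper name `emphasis_to_html` and the pattern are illustrative assumptions, not anything Project Gutenberg ships. The ALL CAPS style of emphasis also mentioned above is not handled.

```python
import html
import re

# Hypothetical pattern: an underscore-delimited run that starts and ends on a
# non-space character and is not embedded in a longer word (so snake_case
# identifiers are left alone).
EMPHASIS = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")


def emphasis_to_html(text: str) -> str:
    """Escape the text for HTML, then wrap _emphasis_ spans in <em> tags."""
    escaped = html.escape(text, quote=False)
    return EMPHASIS.sub(r"<em>\1</em>", escaped)


if __name__ == "__main__":
    print(emphasis_to_html("emphasis _is_ preserved in the etexts"))
```

Escaping first and substituting second keeps any literal `<` or `&` in the etext from being read as markup.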
    
    - File format issues (Score:2)
      
      by harmonica ( 29841 ) writes:
      
      Also, the fact that they are plain text, with no markup, formatting, binary code, whatever in them means that they'll always be accessible to anyone, regardless of software or platform. And that's a good thing, too!
      
      I know about the problems that old file formats can cause. However, I doubt that formats like PDF or JPG will ever get "lost". There's just too much information stored in them, and various free libraries available with source code which read and write them.
      
      And if I'm wrong I won't live to see it.
      - Re:File format issues (Score:2)
        
        by shellbeach ( 610559 ) writes:
        
        I know about the problems that old file formats can cause. However, I doubt that formats like PDF or JPG will ever get "lost".
        
        My point was that since the emphasis is included in the file, you could always convert it to a nicely formatted PDF if you wanted to. In fact, I used to do almost exactly that a while back - I wrote some perl script to convert etexts to RTF and peanut markup language, and it worked pretty nicely. Keeping things at the lowest common denominator level isn't always a bad thing...
        
        Perso
    - Re:Project Gutenberg (Score:2)
      
      by m50d ( 797211 ) writes:
      
      Also, the fact that they are plain text, with no markup, formatting, binary code, whatever in them means that they'll always be accessible to anyone, regardless of software or platform. And that's a good thing, too!
      HTML would accomplish the same thing. It's a public standard, implementable by anyone on any platform, and convertible to plain text by a simple regex substitution. You're no more likely to find someone who can't read an html file than someone who can't read an ascii text file.
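
The "simple regex substitution" mentioned here might look roughly like the sketch below. It is deliberately naive and assumes simple, well-formed book markup: it does not treat script or style elements specially, and a `>` inside an attribute value or comment would confuse it. The function name `html_to_text` is an illustrative assumption.

```python
import html
import re

# Naive tag matcher: anything from "<" up to the next ">".
TAG = re.compile(r"<[^>]+>")


def html_to_text(markup: str) -> str:
    """Strip tags with a regex, then decode entities such as &amp;."""
    without_tags = TAG.sub("", markup)
    return html.unescape(without_tags)


if __name__ == "__main__":
    print(html_to_text("<p>Call me <i>Ishmael</i> &amp; co.</p>"))
```

For anything beyond plain book text, a real HTML parser would be the safer choice; the regex only illustrates the point made in the comment.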
      - Re:Project Gutenberg (Score:2)
        
        by shellbeach ( 610559 ) writes:
        
        HTML would accomplish the same thing. It's a public standard, implementable by anyone on any platform, and convertible to plain text by a simple regex substitution. You're no more likely to find someone who can't read an html file than someone who can't read an ascii text file.
        
        I agree, personally. However, you could also argue that _this_ sort of emphasis is convertible to html with a simple regex substitution - my point was simply that the texts haven't lost any information. Ultimately, it doesn't really
- Re:Project Gutenberg (Michael Hart essay) (Score:3, Informative)
  
  by gbnewby ( 74175 ) * writes:
  
  Here's something Michael Hart wrote about this today. He's
  the founder of Project Gutenberg, and inventor of eBooks.
  -- Greg
  
  Yet another consortium of multi-billion dollar institutions
  has thrown its hat into the eBook/eLibrary ring today, just
  9 months before the 35th Anniversary of Project Gutenberg's
  placement on the Internet of the first eLibrary element, on
  July 4th, 1971.
  
  Last December 14th Google used a multi-million dollar blitz
  of television, radio and print media to announce the Google
  Print
Whew! (Score:5, Interesting)

by op12 ( 830015 ) writes: on Monday October 03, 2005 @05:12PM (#13707555) Homepage

I almost panicked after seeing we had gone so long without a Google-related article.

The opt-in rather than opt-out strategy is really what Google probably should have done, but it'll be interesting to see who comes out as a winner, Yahoo or Google, in all of this.

But will they digitize PD works from after 1922? (Score:5, Informative)

by Anonymous Coward writes: on Monday October 03, 2005 @05:12PM (#13707556)

In the US, books published after 1922 can still be public domain if the author was American, it was originally published in the US, and the copyright was not extended at the end of the original copyright period. Google Library does not seem to be making an exception for this, will OCA? Project Gutenberg does.

- Re:But will they digitize PD works from after 1922 (Score:2)
  
  by Shamashmuddamiq ( 588220 ) writes:
  
  I don't understand this. My favorite book was published in 1956, and the author died just 7 years later. He had no offspring and he outlived his wife. Now would someone please explain to me why someone was allowed to extend the copyright and why the work isn't yet in the public domain?
  - Re:But will they digitize PD works from after 1922 (Score:2)
    
    by blibbler ( 15793 ) writes:
    
    I am not specifically familiar with US copyright, but copyright in most jurisdictions extends for 50 or 70 years after the death of the author. In your example, the copyright naturally should extend to 2013, or 2033.
    There are some exceptions to this. Perhaps the best known is Peter Pan, for which the UK has granted a perpetual copyright in favour of the Great Ormond Street Hospital.
    - Re:But will they digitize PD works from after 1922 (Score:3, Informative)
      
      by thisissilly ( 676875 ) writes:
      
      In the US, that is only true of works published after 1978.
      When U.S. works pass into the Public Domain [unc.edu] is a good summary of the U.S. issues.
      Me, I just want 14+14 back.
  - You're in luck! (Score:2)
    
    by Grendel Drago ( 41496 ) writes:
    
    Assuming the work was written only by American citizens:
    
    Actually, if no one renewed the copyright (renewal became automatic for works published in 1964 or later), it may be public domain. Read the new and improved Rule 6 HOWTO [pglaf.org] that the fine folks at Project Gutenberg have put together. You can put together a reasonable case that copyright was not renewed, and heck, maybe you could get PG to pick up the book.
    
    Or you could move to Canada and wait until January 1, 2013, when the author's work will enter the pub
Not really an up-stage (Score:4, Informative)

by ChocoBean ( 890202 ) writes: on Monday October 03, 2005 @05:14PM (#13707569)

Actually this won't "Upstage" google in any way.

FTA:
all the content will be made available so it can be indexed by all the other major search engines, including Google's

Yahoo is just going to scan, scan and scan. We all already prefer google's indexing, searching and cleaner interfaces, so the only thing Yahoo! will accomplish by this is to help google print along, shielding it from all (other) copyright lawsuits. Once the stuff is online, we all know that Google-bots will be all over it "like a fly on a pile of very seductive manure (Zapp)"

Excellent.

I just hope publishers realise that in this case neither google nor yahoo is trying to be their best friend.

- Re:Not really an up-stage (Score:2)
  
  by krunk4ever ( 856261 ) writes:
  
  I don't really think what Google and Yahoo are doing is exactly the same. Yahoo seems to be only digitizing specific books and text (probably the ones that Open Content Alliance has licenses to). In fact, it clearly says so in the article:
  
  Internet powerhouse Yahoo Inc. is setting out to build a vast online library of copyrighted books that pleases publishers -- something that rival Google Inc. hasn't been able to achieve.
  
  The Open Content Alliance, a project that Yahoo is backing with several other partners
- - Re:Not really an up-stage (Score:2)
    
    by op12 ( 830015 ) writes:
    
    If you use Google search to get to Yahoo content, who do you think is getting the bulk of the ad dollars? Hint: Yahoo is fine with this arrangement.
    
    Not necessarily...you are going to see a Google ad related to your search before you see a Yahoo one related to your search. If you didn't care about ads the first time (at Google), why would you when Yahoo hits you with them again? I think that probably Google benefits from someone finding something through them, and Yahoo's benefit is much reduced.
- - Re:Not really an up-stage (Score:2)
    
    by krunk4ever ( 856261 ) writes:
    
    I'm guessing the parent was trying to be funny since a9 uses google results, just with a more personalized interface. i've been using a9 because amazon gives me the pi/2% off all amazon products.
- - Re:Not really an up-stage (Score:2)
    
    by ediron2 ( 246908 ) * writes:
    
    And I just hope that writers realize that the publishers might realize that in this case neither google nor yahoo is trying to be their (the publishers') best friend.
What about China? (Score:4, Interesting)

by DAldredge ( 2353 ) writes: <SlashdotEmail@GMail.Com> on Monday October 03, 2005 @05:14PM (#13707573) Journal

Will Yahoo provide sorted or unsorted lists of the books that China's Internet users view to the thugs that run China?

- Re:What about China? (Score:2)
  
  by m50d ( 797211 ) writes:
  
  It's google that was doing that the last I saw. But slashdotters are strangely quieter about that.
The difference between Google and Yahoo's effort (Score:5, Insightful)

by doctor_no ( 214917 ) writes: on Monday October 03, 2005 @05:17PM (#13707595)

Seems like the crucial difference between Google's efforts and the OCA (Open Content Alliance) is that Google has an "opt-out" policy for copyrighted material, while OCA specifically requires the copyright holder to contact them and essentially allow them to use the material.

The OCA likely won't be sued by the Authors Guild the way Google was. For searching material, however, Google will likely be better, since its search will likely include a massive plethora of copyrighted material, legal or not. Also, it seems that Google itself will be allowed to fold all the material from the OCA into its project as well.

Companies should Get Original (Score:2, Insightful)

by TarrySingh ( 916400 ) writes:

Why can't companies come up with some cooler ideas? Why ape each other? First Google and then Yahoo. Sure, MS will also want to play.
NOT competing (Score:5, Informative)

by daniil ( 775990 ) writes: <evilbj8rn@hotmail.com> on Monday October 03, 2005 @05:23PM (#13707633) Journal

There's a slight difference between an 'Internet-based library' and 'searching inside books'.

Apples and Oranges! This is not Google Print! (Score:5, Informative)

by merreborn ( 853723 ) writes: on Monday October 03, 2005 @05:26PM (#13707655) Journal

Google Print's goal is to allow people to search book content, WITHOUT giving them the content of the book.

For example, searching "Zoroastrianism" would return a list of book titles on the subject, and links to purchase the books in question. You CANNOT download the content of the book!

The OCA (The group Yahoo just joined) is an opt-in, full content hosting project.

Searching "Zoroastrianism" would return a (much smaller) list of books, with the *full* content of the book available for download with the explicit consent of the publisher/author!

- Re:Apples and Oranges! This is not Google Print! (Score:2)
  
  by DJCF ( 805487 ) writes:
  
  Ahh, this is where it gets confusing. Don't worry, a lot of +5 insightful comments on Google in the past few months have made this mistake.
  What will library books in Google look like?
  If you are in the United States and you search for Books and Culture by Hamilton Wright Mabie, for instance, you'll be able to page through as much of it as you like, because its 1896 copyright means it's now in the public domain in the United States. These public domain books look very similar to publisher-submitted books exce
Sad thing about Yahoo though (Score:3, Interesting)

by totallygeek ( 263191 ) writes: <sellis@totallygeek.com> on Monday October 03, 2005 @05:26PM (#13707656) Homepage

You will be reading the content to Moby Dick on Yahoo [yahoo.com] and in the top right it will say, "content provided by Google [google.com]."

- Re:Sad thing about Yahoo though (Score:2)
  
  by WindBourne ( 631190 ) writes:
  
  As opposed to "The Minnow" , content provided by Microsoft?
Annoying (Score:2)

by rm999 ( 775449 ) writes:

I am getting tired of the big internet companies straight up copying each other. Yes, it means that products slowly get improved over time (eg. yahoo mail -> gmail -> yahoo mail) but it also means that the companies aren't innovating enough. Yahoo is spending time and money on providing a product that is already offered. We would probably be better off if they spent the effort on providing a unique service - like scanned magazines or something.
- Re:Annoying (Score:3, Insightful)
  
  by ScentCone ( 795499 ) writes:
  
  I am getting tired of the big internet companies straight up copying each other.
  
  Should we turn to you to tell us which provider of each major online activity is the one we should all use? Even if the differences are incremental and subtle, I'm glad when I get to choose between Yahoo's and Google's take on a particular app/service. I'm also glad that Audi and Toyota and GM and Honda all have different ideas on cars... even though someone else built one once already. Come on - not every service offered is
  - Re:Annoying (Score:2)
    
    by rm999 ( 775449 ) writes:
    
    I never said I hate the lack of choice. In fact I like it (duh). I just said it annoys me that there isn't more large-scale innovation - very few new features come out. Two large, multi-billion dollar companies should be able to do a little more.
    
    As an example of my point, two image search engines require double the effort of one, but only provide incremental benefit to the user. Instead of copying altavista's image search (which I still think is better), google could have implemented something entirely new.
    - Re:Annoying (Score:4, Informative)
      
      by Moofie ( 22272 ) writes: <(lee) (at) (ringofsaturn.com)> on Monday October 03, 2005 @07:39PM (#13708441) Homepage
      
      "very few new features come out"
      
      Have you seen Google Earth?
      
      How about the disaster wiki that went together in about 20 minutes, where people were posting status reports of New Orleans properties?
      
      I think you're damning with faint praise. Google, at least, consistently builds superb offerings, and the price is right. Not quite sure what you're grousing about...
      
Yikes, How long ... (Score:2)

by pin_gween ( 870994 ) writes:

will it take to download that PDF of War and Peace?
- Re:Yikes, How long ... (Score:2)
  
  by Kevinv ( 21462 ) writes:
  
  couple of minutes (on dsl) from project gutenberg. it's text instead of pdf though.
  
  http://www.gutenberg.org/etext/2600 [gutenberg.org]
University of Calif: Yahoo OK, Gutenberg banned (Score:5, Interesting)

by dananderson ( 1880 ) writes: on Monday October 03, 2005 @05:41PM (#13707754) Homepage

I find it funny (in an ironic way only) that the University of California is allowing its public domain books to be scanned by Yahoo. At the same time, UC libraries prohibit scanning for Project Gutenberg [gutenberg.org] or other true "open" content projects unless they receive $$$$ in royalties.
I hate to see a University pander to commercial interests, while at the same time, welcome commercial interests such as Yahoo. Money talks, and I'm sure UC is being paid a lot, but libraries are supposed to be public resources too, not exclusive profit-centers :-(.

- Re:University of Calif: Yahoo OK, Gutenberg banne (Score:2)
  
  by jp10558 ( 748604 ) writes:
  
  Yeah, but can't anyone just take the online library text and put it in Gutenberg? I mean, it's public domain content, no one can sue for anything there.
  - Erosion of Public Domain--not just Disney and RIAA (Score:3, Informative)
    
    by dananderson ( 1880 ) writes:
    
    The physical owner of a PD book (library) can prohibit scanning or even viewing. For modern books, it's not a problem--just go to another library. For some books it is a problem. Few copies exist, and they are scattered around the world.
    The library can require a legal agreement to view or scan the book, and that is where a lawsuit can occur. Of course, the legal agreement doesn't apply to 3rd parties that haven't signed. It's another example of the erosion of the public domain--it's not just Disney an
    - Re:Erosion of Public Domain--not just Disney and R (Score:2)
      
      by jp10558 ( 748604 ) writes:
      
      Sorry, I wasn't clear. Say they let Yahoo scan the books, because Yahoo decides to pay for the OCA. Can't anyone just copy or OCR the Yahoo PDF or whatever to gutenberg, as the text is public domain?
- Dumb question... (Score:2)
  
  by tkrotchko ( 124118 ) * writes:
  
  How can you prohibit scanning of PD books?
  - Physical owner of PD book controls its use (Score:2)
    
    by dananderson ( 1880 ) writes:
    
    This is possible if the book is rare and the owner has physical custody. For libraries, this is usually through a controlled-access "special collection" area. They can and do prohibit scanning or transcribing of books, even if PD. They can require signing a legal agreement (license) with any terms they like, such as requiring royalties or restricting further distribution.
- Re:University of Calif: Yahoo OK, Gutenberg banne (Score:3, Interesting)
  
  by esme ( 17526 ) writes:
  
  At the same time, UC libraries prohibit scanning for Project Gutenberg or other true "open" content projects unless they receive $$$$ in royalties.
  
  do you have a source for this? do you mean that a UC library tried to stop someone from checking out books and scanning them? or do you mean that they didn't allow the gutenberg folks to setup a scanning shop inside a library? there's a huge difference between those two.
  i work at a UC library, and i've certainly never heard of any policies about project
  - University of California locks away public domain (Score:3, Interesting)
    
    by dananderson ( 1880 ) writes:
    
    The source is my personal experience with the UCSD, UCI, and UCLA libraries. I assume the other UCs have the same or similar policy against digitizing books. Gutenberg is not a corporation, it's private individuals (volunteers). It's usually one guy (or gal) with a scanner, OCR software, and a little bit of time to proofread.
    would not surprise me to learn that a campus counsel or some such wouldn't let a library give away rights to content that UC held the rights to (like a library's special collections
    - Re:University of California locks away public doma (Score:2)
      
      by esme ( 17526 ) writes:
      
      the UCSD policy you cite says:
      
      Permission to quote is normally freely given, as is the permission to reproduce text or images for such noncommercial use as illustrating a thesis or a dissertation. The Mandeville Special Collections Library assesses a fee for the publication of reproductions for commercial purposes.
      
      which sounds to me like a non-commercial project like gutenberg would probably not have to pay the access fees. the other UCSD policy mostly talks about limiting duplication because it stres
- Re:University of Calif: Yahoo OK, Gutenberg banne (Score:2)
  
  by sootman ( 158191 ) writes:
  
  I find it funny (in an ironic way only) that the University of California is allowing its public domain books to be scanned by Yahoo. At the same time, UC libraries prohibit scanning for Project Gutenberg or other true "open" content projects unless they receive $$$$ in royalties... libraries are supposed to be public resources...
  
  University library != public library.
Reading Between the Lines (Score:2, Redundant)

by 99BottlesOfBeerInMyF ( 813746 ) writes:

Reading between the lines for this proposal we seem to have another print.google.com, except it will not index a huge number of works whose copyright holders do not "opt in" to the program. The advantage to this is that it may make some copyright holders feel better about the whole thing and, hopefully submit entire works to be viewed by the public. It is also possible that Yahoo is worried about the legal issues and want to wait and see how google weathers any legal challenges.

From a purely technical pe
PDF?! yuck (Score:2, Insightful)

by BillHop ( 82717 ) writes:

Does anyone else find there is no way to read a PDF with the scroll buttons (mouse wheel, etc.) without the viewer constantly breaking your flow by jumping to the next page?

This goes along with the concept that for an electronic format, I do NOT need a sentence (or even worse, hyphenated word) broken up by two inches of top and bottom margin filled with page numbers, miscellaneous watermarks, repetitive titles, etc.

PS. This being flamebait does not make it false.
- Re:PDF?! yuck (Score:5, Informative)
  
  by Fiver- ( 169605 ) writes: on Monday October 03, 2005 @06:11PM (#13707938)
  
  "Does anyone else find there is no way to read a PDF with the scroll buttons..."
  
  No. I just set it to Continuous. See those four icons in the lower right corner? (assuming you've got a recent version) Play with those. You want the second button from the left.
  
  "This goes along with the concept that for an electronic format, I do NOT need a sentence (or even worse, hyphenated word) broken up by two inches of top and bottom margin filled with page numbers, miscellaneous watermarks, repetitive titles, etc."
  
  Well, the whole purpose of PDF is to "preserve the look and integrity of your original documents ... regardless of the application and platform used to create it." Blame the creators of that particular pdf file if you don't like the headers, footers and margin size. When I make pdf books to read on the train...I just finished Dream Quest of Unknown Kadath by Lovecraft...I open the original ascii text file in Word, make the top & bottom margins tiny, change the font to something tolerable and export it.
  
Bookripper on its way? (Score:5, Interesting)

by serutan ( 259622 ) writes: <snoopdoug&geekazon,com> on Monday October 03, 2005 @05:56PM (#13707847) Homepage

Google maintains its scanning represents "fair use" allowed under the law because it only allows Web surfers to view excerpts from copyrighted books.

Soon after Google Mail was introduced, somebody created a SourceForge project that lets you use Google Mail as a database. How long until somebody releases a "Bookripper" app that assembles a whole book from search extracts? As I understand it Google displays two pages at a time (or wait, that's Amazon, but I bet they're similar). All you would need to know is a quote from a book's first page as a seed, and you should be able to grab the whole book by doing a series of searches using text from the second page returned by each search. The trick would be to knit the pieces together and eliminate the overlapping text. Seems almost trivial. Another possibility would be to search for random words and look for overlaps between the results, assembling them like a linear jigsaw puzzle until there are no gaps.
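The "knit the pieces together and eliminate the overlapping text" step is ordinary suffix/prefix matching. Here is a minimal sketch that works only on strings already in memory, assuming the fragments arrive in reading order and each one begins with text that ends the previous one. The names `overlap_len` and `stitch` and the `min_overlap` parameter are made up for illustration; nothing here talks to any search service.

```python
def overlap_len(left: str, right: str, min_overlap: int = 1) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest candidate first so the largest overlap wins.
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def stitch(fragments: list[str], min_overlap: int = 1) -> str:
    """Join ordered fragments, dropping the text repeated at each seam."""
    if not fragments:
        return ""
    result = fragments[0]
    for fragment in fragments[1:]:
        size = overlap_len(result, fragment, min_overlap)
        # Append only the part of the fragment that is not already present.
        result += fragment[size:]
    return result


fragments = [
    "It was the best of times, it was the",
    "it was the worst of times, it was the age",
    "the age of wisdom",
]
print(stitch(fragments, min_overlap=4))
```

If two fragments share no overlap of at least `min_overlap` characters, this sketch simply concatenates them, so a gap in the source would go unnoticed. The "linear jigsaw puzzle" variant with unordered fragments would need a search over pairs instead of a single left-to-right pass.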

- Re:Bookripper on its way? (Score:3, Informative)
  
  by gasaraki ( 262206 ) writes:
  
  It's already been done. The guy was sent a 'please stop doing this' letter by Google if I recall, which I think he went along with. No formal suit or anything, but they didn't like it. I'll be damned if I can remember the link, I think there was a K5 story or two on it though.
- Re:Bookripper on its way? (Score:2)
  
  by momerath2003 ( 606823 ) * writes:
  
  Google limits the total portion of a book a user sees to 20% (it ties records of the book viewing to your Google ID). No matter how many searches you do, you can't extract more than a fifth of the book.
- Re:Bookripper on its way? (Score:3, Informative)
  
  by Dan East ( 318230 ) writes:
  
  According to Google, there are specific portions of each book that it will never show, making it impossible to harvest an entire book.
  
  I'm already logged in. Why are you telling me the page is unavailable?
  
  As part of our efforts to protect a book's copyright, a set of pages in every in-copyright book will be unavailable to all users.
  
  http://print.google.com/googleprint/help.html#pag e limit [google.com]
  
  Dan East
"Do no Evil" done right (Score:5, Insightful)

by Chunni Babu ( 920014 ) writes: on Monday October 03, 2005 @06:03PM (#13707883) Journal
Now this is a right step towards making book contents searchable online. I would hate to see one company like Google copying and caching all books in its massive cluster of servers. I know that the Google kool-aid that "we are about general good" runs deep in the veins of Slashdot types.

Since when was scanning books from libraries and making them available to the public for a profit considered "fair use"? This kind of stuff is done by pirates. Go to the major cities in China and India and you will see piles of copied books in the streets, all sold for 1/10th the original price without giving anything back to the authors. The pirates can say that they are doing a favor to the authors by driving them out of obscurity.

The message the alliance is sending out to the authors is
- we are not for profit
- we will scan your book only if you want us to do so
- your book will be indexed based on your approval and copyright agreement with you and the publishers
Compare this to what Google is telling the authors
- we will scan your book, fill a form and tell us if you don't want us to do so
- we will take sales commissions from amazon, buy.com, bn.com, etc. without sharing anything with you
- if we show ads, we will share the profits with you
- we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude
- we will cache your book in our servers and only we will reserve the right to profit from your scanned book
So much for do no evil. Kudos to Yahoo for bringing the Open Content Alliance, Gutenberg, and other similar projects into the limelight - these are some really nice collections that were hidden by the noise created by 'google print'.
- Re:"Do no Evil" done right (Score:2, Insightful)
  
  by nursegirl ( 914509 ) writes:
  
  Compare this to what Google is telling the authors
  * we will show excerpts of your book, so if a researcher is researching on a topic he can find what you have written about a topic without ever having to buy your book, too bad, heh heh, write a fiction book dude
  
  Except that Google only shows 2-3 sentences of books that are under copyright. I've never found a researcher that can write on a topic by only reading 2 sentences. It's only posters on /. that can claim expertise on a topic without actually
- Re:"Do no Evil" done right (Score:3, Informative)
  
  by Jeff DeMaagd ( 2015 ) writes:
  
  Since when was scanning books from libraries and making them available to the public for a profit considered "fair use"?
  
  It's not. You are mischaracterizing Google's system. The problem with your claim is that Google's system doesn't make the book available to users to download, it is only a search method that points to the relevant books and provides short excerpts like their search engine does. Google won't provide the book or even whole page without the copyright owner's permission. My impressi
  - - Re:"Do no Evil" done right (Score:2)
      
      by Moofie ( 22272 ) writes:
      
      Way to not let facts get in the way of your opinion.
      
      1) Making money is not inherently evil. Note that Google's scheme will also make money for authors. Google's scheme takes nothing from authors at all.
      
      2) The click on a link also only brings 2-3 sentences (not pages, Sparky...) of text.
      
      3) The virtue of libraries is not that they pay for books, it is that they make as much information as possible available to as many people as possible.
      
      4) See 2.
      
      When the copyright holders start remembering that the purp
    - Re:"Do no Evil" done right (Score:2)
      
      by jp10558 ( 748604 ) writes:
      
      So if Google bought one copy of the book (or heck, 100 so they were ever only showing one copy to one person at a time) it would be ok?
      
      It'd basically be a faster interlibrary loan system.
    - Re:"Do no Evil" done right (Score:2)
      
      by Jeff DeMaagd ( 2015 ) writes:
      
      1. The only thing Google is trying is to make money out of other people's work.
      
      So do book stores.
      
      2. The sale of a book brings author money. The click on a link without sale only brings Google money.
      
      But what you first complained was:
      we will take sales commissions from amazon, buy.com, bn.com, etc. without sharing anything with you
      
      Sales commissions are different from link clicks.
      
      4. 2-3 pages are sometimes enough to get an idea. A researcher looks at an index of a book and then reads the pages based on keyword. G
- Re:"Do no Evil" done right (Score:3, Insightful)
  
  by Anonymous Coward writes:
  
  Since when was scanning books from libraries and making them available to the public for a profit considered "fair use"?
  
  How disingenuous. Google Print shows only a snippet of the text and tells you how to buy the book if it seems like what you need. Not pages, not paragraphs - a couple of sentences. In fact, Google Print instantly returns pretty much what you'd get if you hired a researcher to go find X number of books with such and such text and the researcher prepared a paper with a short quote from eac
- Re:"Do no Evil" done right (Score:3, Informative)
  
  by _Sprocket_ ( 42527 ) writes:
  
  Since when was scanning books from libraries and making them available to the public for a profit considered "fair use"?
  
  Since when is Google doing this? As others have pointed out, Google provides a portion of the work to give the search context - 3 pages. In another post [slashdot.org], you claim that 3 pages is enough information to invalidate the sale of a book. If this is the case, I would have to seriously question the value of your work. Either that - or take a serious look at public libraries, private loaning
- Re:"Do no Evil" done right (Score:2)
  
  by serutan ( 259622 ) writes:
  
  Go to the major cities in China and India and you will see piles of copied books in the streets, all sold for 1/10th the original price without giving anything back to the authors. The pirates can say that they are doing a favor to the authors by driving them out of obscurity.
  
  Interesting. Except for the cut-rate pricing, this is how the recording industry has been operating for a century.
This is huge. IA beat Google and Yahoo to this... (Score:4, Insightful)

by Anonymous Coward writes: on Monday October 03, 2005 @06:04PM (#13707887)

I've read through the first few posts, and people really don't have a clue about what this is all about. "Open Content Alliance"... It means what it says. Open f'ing content. Let there be content available to the masses... Is it more important that I can get a snippet from some copyrighted text, or that millions of children can read Alice in Wonderland with all its wonderful illustrations?

This is beyond PDF or anything like that. Some people want PDF, so Adobe will make them. Some people want decent OCR versions, perhaps to go into Distributed Proofreaders or into someone's text-only PDA. It's ALL possible. This is NOT an exclusive club, it's an INCLUSIVE community that is dedicated to Open f'ing Content.

Why don't you people get it? By allowing people to have full texts of some of humanity's greatest works we are doing more than a few snippets of the latest Ken Follett novel... a lot more.

It's bigger than Yahoo or Google. Yahoo is NOT an also-ran.... The Internet Archive has been scanning books and hosting Million Books project texts as well as Project Gutenberg texts for a long time... long before Yahoo or even Google were in the picture. Ignorant comments made here suggest somehow Yahoo is following.

I say Yahoo is leading by embracing a project that by definition is bigger than themselves. Good for them.

New and Radical (Score:4, Funny)

by Corydon76 ( 46817 ) writes: on Monday October 03, 2005 @06:46PM (#13708160) Homepage

Hey, wow, that is completely original [gutenberg.org]. Nobody else could have possibly thought [promo.net] of this idea before [wikipedia.org].

- More like... (Score:2)
  
  by Grendel Drago ( 41496 ) writes:
  
  It's more like the Million Books [archive.org] project. Project Gutenberg does a lot more than just scan the books; they proofread and post-produce them.
A DRM-free e-Ink e-book reader on the horizon (Score:2)

by Catbeller ( 118204 ) writes:

Noticed on boingboing.net that a Chinese company is marketing a DRM-free version of an ebook reader [boingboing.net] using an eInk screen.

Although I don't think it's on sale, it is the Holy EBook Reader Grail we've been seeking for ten years.

If we're gonna download ebooks, we should have a reader to read them with, no?
- Re:Why PDF? (Score:5, Informative)
  
  by david duncan scott ( 206421 ) writes: on Monday October 03, 2005 @05:25PM (#13707653)
  
  10 years down the road when everything is in PDF format, who's to stop them from charging us to view material in their format?
  
  The fact that it's an open, documented [adobe.com] format?
  Adobe has made their money the old-fashioned way, by making tools that work well, rather than by locking people into a format. GhostScript, among others, will read those PDFs with or without Adobe.
  
  - Re:Why PDF? (Score:2)
    
    by amliebsch ( 724858 ) writes:
    
    Parent's point is still valid. PDF-related technology is patented, and the free licenses they currently grant are not to my knowledge perpetual. Therefore, theoretically, the license could be revoked, and while Ghostscript would still technically be able to read (old) PDF files, it could not do so legally. There are lots of open, documented formats that are still pay-to-play.
- Re:Why PDF? (Score:2)
  
  by TTK Ciar ( 698795 ) writes:
  
  You're right, sorta. The djvu [freshmeat.net] format is better than PDF for scanned books in most respects. Looks better, compresses better (and compresses by default), decompresses + renders faster while using less memory, more easily transformed to/from other formats due to availability of high-quality open source and free tools, etc. The Internet Archive's books collection has several books archived in djvu format.
  The downside is that most users do not have a djvu reader installed on their computers, and even thoug
- - - Re:Dupe (Score:4, Funny)
      
      by Nuttles1 ( 578165 ) writes: on Monday October 03, 2005 @05:47PM (#13707793)
      
      You must not be a true /.er because you know that if you were you would read up on every bit of documentation about anything that we do....Like how we always RTFA...errr....wait, scratch that
      
- If Google rose to the competition (Score:2)
  
  by expro ( 597113 ) writes:
  
  If there was ever anything we need competition in, it is search engines. Whether Project Gutenberg needed any competition is another question.
  
  I don't see a lot of similarity between this project and the one Google is doing. Open versus proprietary. Free (free as in speech) information versus non-free information.
  In the case of other search engines Google has put out of business (Altavista, although the web site still exists, no longer exists as the more-advanced search engine it was using the facilitie
  - Oh, it's not quite the same as PG. (Score:2)
    
    by Grendel Drago ( 41496 ) writes:
    
    Project Gutenberg doesn't just scan books. Actually, they take a lot of their scans from outside sources, like the Internet Archive's Million Book Project. The work that PG does is largely in proofreading and essentially re-typesetting the book. The output of Yahoo!'s work here will be scads of page images, maybe with dicey OCR. The output of PG's work is plaintext (and sometimes HTML) ebooks.
    
    As a fan of Project Gutenberg, I look forward to more page images being made available, since it means more high-qu
- Re:its to see... (Score:4, Insightful)
  
  by twiddlingbits ( 707452 ) writes: on Monday October 03, 2005 @06:05PM (#13707900)
  
  PDFs of "public domain" or donated works will always be available. Amazon has gotten enough sh*t about the excerpts that they publish to entice the reader to buy the book. Google "e-book" and you'll see Yahoo! is nowhere near the only source. There is even an open-source e-book idea at Open eBook - http://www.openebook.org/ [openebook.org] -- Information on the publication specification for electronic books that will allow compatibility between different e-book devices.
  
  I just wonder how Yahoo! will make $$$ off this very small market of public domain works, or, if they DO get repro rights to other books, what the price model is to download them, or will you just see advertisements in your e-books? The authors are not going to give up their $$$, nor is Yahoo, so somebody is going to have to pay for this content.
  
- PDF Isn't Proprietary (Score:2)
  
  by everphilski ( 877346 ) writes:
  
  duuuuuhhhhh....
  
  -everphilski-
  - Re:PDF Isn't Proprietary (Score:3)
    
    by amliebsch ( 724858 ) writes:
    
    Yes, it is. [adobe.com]
- Ever hear of a printer? (Score:2)
  
  by B4RSK ( 626870 ) writes:
  
  There is this great new invention called a printer! You can use it to turn on-screen documents into printed hard copy!
  
  But wait, there's more! There are even ones that will print on both sides of the paper, and will automatically print two pages onto one side! So you can get 4 pages onto one A4 sheet, thus having text about the same size as a paperback! Put a couple of binding clips on one side and you have an instant book.
  
  More seriously though... Besides the fact that it is both cheaper and nearly insta
  - - Re:i have heard of these "printer" inventions, yes (Score:3, Interesting)
      
      by B4RSK ( 626870 ) writes:
      
      I do see your points as well, and definitely there will be demand for commercially produced books for some time to come.
      
      However, what I described does not require any folding and binding takes all of about 10 seconds. I've done this more than a few times and it does work out well.
      
      I have a Brother laser printer that cost about US$300. I bought this printer for other reasons, but it is a great book printer too. (Has a duplexer, supports both PCL6 and PS3, built-in standard 10/100 LAN port. Basically it wi
      - Condescending? (Score:2)
        
        by B4RSK ( 626870 ) writes:
        
        My first reply poked some fun, but I don't think I have been condescending.
        
        I'm not sure where you are getting the $3 books from unless they are used, "stripped", review copies, or unauthorized print runs. In any of those cases the author is not getting a cut. "Lonesome Dove" is $7.99 on Amazon + shipping.
        
        I support authors as well -- I certainly buy (more than) my share of books. I print some too though, mostly because of the need to get the information immediately. If the book is particularly good and I
