Open Library Project Takes Flight
Posted by ScuttleMonkey on Mon Jul 16, 2007 05:34 PM
from the alexandria-green-with-envy dept.
Aaron Swartz today announced the launch of the new Open Library project. The goal of the project is to produce the world's greatest library on the Internet, free for anyone to use. Starting with the Internet Archive's book scanning project, and organizing the insertion of new content via a wiki-type model, the project seems to be off to a great start. The demo, source code, and mailing lists were all opened up today in hopes of drawing interest from the public at large.
Related Stories
News: Open Library Goes Online With Public Domain Books 103 comments
mrcgran writes "A competitor to Google Book Search emerges as the Yahoo-backed Open Content Alliance launches an 'open library' of its own. After several years of scanning and archiving, the Internet Archive and the Open Content Alliance this week unveiled the Open Library, their attempt at bringing public domain books to the masses. The Internet Archive has hosted texts for quite some time, but the Open Library makes fully-searchable, high-quality scans of books available, along with downloadable PDFs. It offers an experience designed to match paper: there's even a page-flipping animation as readers move forward and backward through the book. Ben Vershbow of the Institute for the Future of the Book says that when it comes to presentation, 'they already have Google beat, even with recent upgrades to the [Google Book Search] system including a plain text viewing option.'" We have previously discussed this project, though this is a bit more complete rundown on the initiative.
Awesome (Score:2, Funny)
Re: (Score:3, Insightful)
Go Disney.
Re:Awesome (Score:4, Insightful)
C'mon, I would be fairly disappointed with a library of 21,000 real books even if it contained only fiction from random authors from 1900-2000. Gutenberg doesn't even have that much depth.
That's not to take anything away from them. But to make claims about it being a good selection based on "21,000 - gee that's a big number" is a bit ludicrous.
Re: (Score:2)
Re: (Score:3, Informative)
In response to your question: (Score:4, Interesting)
Re:In response to your question: (Score:5, Insightful)
Re: (Score:3, Interesting)
Anyways, the good news is that libraries do exist, and aren't going away. If the electronic library is to exist, it should be pursued as an extension of existing libraries. In other words, we must ensure that electronic access to text grows out of the familiar library setting, not Napster. There are lots of ways to do this.
For instance, current library filing systems are really jus
Re:In response to your question: (Score:5, Insightful)
No, of course not, because they're protected by copyright law, which in turn grew out of Article 1, Section 8 of the Constitution. Just as there will never be a restriction on keeping and bearing arms... uh, oh, wait. OK then, like there will never be restrictions on speech... no, no, turns out there are plenty of those. Mmmm, ok, just like the feds can only take action on interstate commerce, because you know, that's an enumerated power they can't step outside... aw, no, they do that all the time. Well, it'll be like how they can't do searches or seizures without probable cause, oath or affirmation, and a warrant... oh... I guess that's no longer true. Well, of course they can't make ex post facto laws... except for the ones they've made, that is, you know, thinking of the children and such.
Wait. Why is it again libraries "aren't going away?"
Re: (Score:3, Insightful)
Aside from the already-mentioned fact that not all books are digitized, it may be because Internet access is not universal. The barrier to access is still high (computers aren't free, right?), and one of the few places where you can get free access, and a device to do it on, is, of course, a library.
Re: (Score:2)
The truth of the book publishing business today is that the American public, on the whole, just doesn't read very much. Libraries, on the other hand, stock books -- multiple copies of books, in many cases. And there are thousands of libraries in America.
How do they get all those books? They buy them.
Each year, public libraries buy thousands and thousands of books -- books that individual readers aren't b
Re: (Score:2)
Re: (Score:2)
Libraries don't get sued for infringement (Score:2, Interesting)
If an electronic library can find a way to obtain support as a literacy project, there are plenty of traditional avenues open. Suits against council literacy efforts don't go down well, at least in Europe.
Re: (Score:2)
Of course not, because they've paid for their copies. Makes a difference, doncha know?
Re: (Score:2)
There is no such word as "doncha," and if there is, there shouldn't be. It's "don't you." Two words, not one.
Re: (Score:2)
It's interesting you should note that. I would like to point out that it's actually three words: "do not you". The word "don't" is a contraction of "do" and "not", which has somehow found its way into spelling as well as in verbal usage.
The word "doncha" is common enough that I, a man who does not live in an English-speaking country and does not have English as my first language, have been expose
Re: (Score:2)
Now, it is up to you to document that my post wasn't sarcasm, if you are to claim my post was sarcastic.
Re: (Score:2)
Doncha wish your words had no apostrophe?
Doncha?
Doncha baby, doncha?
Not a problem for Pirate Bay? (Score:2)
Or does it only apply to stealing popular movies and music?
Project Gutenburg (Score:2)
It's been around for years, and I thought it was pretty well-known.
Re:Project Gutenburg (Score:5, Informative)
Re: (Score:3, Interesting)
Also, ma
Re:Project Gutenburg (Score:5, Insightful)
Your issue is more likely that there are a lot of crappily designed webpages out there.
If you're reading "large swaths of ordinary onscreen text", do this:
- Copy-paste it into any word processor
- Choose a nice, big font. (Small is good for UI, not for 400-page novels.)
- Use a dark background. A page reflects light, a screen projects it. You do not want glaring white.
- Use 8-10 words per line.
- Profit! Err... less mental exhaustion, at least.
Pay extra attention to words per line. It's a key reason onscreen text is often hard to read. Too many words per line, and you'll have a mental overhead every few seconds trying to figure out which line you just read and which is next. Basically, books do it right, and you want to display onscreen text at a similar width. Scrolling is easy these days; wide lines are a remnant from when computers required a click-and-drag to scroll.
Wide books and newspapers are divided into columns. There is a reason for doing this, but almost nobody seems to think about that when displaying text on screens.
Heck, even Slashdot defaults to a glaring white background and text stretched all over my 1920 pixels. Go figure.
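The words-per-line advice in this comment is easy to automate. Here is a minimal Python sketch, assuming you just want to re-break one paragraph of plain text into lines of at most N words (`wrap_by_words` is a made-up helper name, not part of any library):

```python
def wrap_by_words(text: str, words_per_line: int = 9) -> list[str]:
    """Break one paragraph of text into lines of at most `words_per_line` words.

    Whitespace is normalized: runs of spaces and newlines collapse to one space.
    """
    if words_per_line < 1:
        raise ValueError("words_per_line must be at least 1")
    words = text.split()
    # Take consecutive slices of `words_per_line` words and rejoin each slice.
    return [
        " ".join(words[start:start + words_per_line])
        for start in range(0, len(words), words_per_line)
    ]


if __name__ == "__main__":
    sample = ("Too many words per line and you spend effort every few "
              "seconds finding the start of the next line.")
    for line in wrap_by_words(sample, words_per_line=8):
        print(line)
```

If you would rather limit line width in characters, as typesetters usually do, the standard library's `textwrap.wrap(text, width=...)` does that instead.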
Re: (Score:2)
Wide books and newspapers are divided into columns. There is a reason for doing this, but almost nobody seems to think about that when displaying text on screens.
For me, quite the coincidence to run across your comment. Just in the past few days I have taken to resizing my browser to half the width of the screen - like folding a newspaper - because I realized that my eyes tire when reading lines of text running the entire 1280 pixel width of my monitor. It seems to work out great - I am even reading S
Re: (Score:2)
In the case of
Of course, this isn't a very good solution for browsing because it seems to remove the style sheet every time the page changes and
Re:Project Gutenburg (Score:4, Interesting)
Older books are often hard to relate to without some context, and that sort of thing is what makes or breaks many editions of the "classics", IMO. If, when shopping for books, I pick up a copy of a book that was written more than 200 years or so ago, and it has no footnotes, most of the time I won't buy it. This is doubly true of translated works.
Wikipedia can usually stand in for an introduction, but there's nothing like footnotes to get you closer to an older text, and nothing that I know of provides that. If someone started a project to provide that kind of information for Project Gutenberg books, I'd get on board to help. Bonus points if they're also putting them in formats that don't suck (making plain text look good on the screen is a pain in the ass).
I'd start it up myself, but alas, I am poor (college). I'd definitely help out if someone else got it going, though.
Until someone does that, PG is practically useless to me.
Will this project do anything like that, or do you know of anyone who's doing this?
It seems to me that 500-1,000 really well-edited, footnoted, and formatted free books are better than 21,000 books worth of plain-text barf.
question... (Score:2)
http://www.gutenberg.org/wiki/Main_Page [gutenberg.org]
They have a great collection of ebooks online already, and you're free to grab and share them. I do wish the Open Library had its base in a country that doesn't have insanely long copyright laws, though; then it could really add value over Gutenberg.
Relevance? (Score:2, Insightful)
Re: (Score:2)
The Open Library is a database of books, which sometimes includes the full scanned text, and sometimes does not.
So if the same work was published a dozen different times, it would have an entry in The Open Library for each edition, and usually just one entry in Project Gutenberg assembled from all of the out-of-copyright printed editions.
wikipedia 2.0 (Score:2, Interesting)
Re: (Score:2, Informative)
Re: (Score:2)
Re: (Score:2)
Not Project Gutenbeg (Score:5, Insightful)
Re: (Score:2, Insightful)
I had a play with it and it is quite limited at the moment. I did manage to add a book, but there was minimal instruction on how to go about this, and uploading covers at the moment is not available (as far as I could determine in 5 minutes anyway).
Re: (Score:3, Informative)
The actual site ... (Score:2)
One thing new about it (Score:2)
<type 'exceptions.TypeError'> at
unbound method remove_node() must be called with LRU instance as first argument (got NoneType instance instead)
Web GET http://demo.openlibrary.org/search
Traceback (innermost first)
/1/pharos/code/production/pharos/infogami/tdb/tdb.py in remove_node
...
node = LRU.remove_node(node)
A Library Card Tip (Score:2, Insightful)
Kinakuta (Score:3, Insightful)
Are there really any working data havens?
Vandalism controls? (Score:3, Interesting)
First thing I did on the site was pull up an entry for a book my university press publishes. It had no "Buy" option. I edited the metadata to add the ISBN-10 number for it, and voila, a Buy option.
It then took a certain amount of self-control for me not to go into various titles dealing with George W. Bush and enter the ISBN-10 of the storybook [amazon.com] containing "My Pet Goat". Purely as a proof of concept, you understand.
This is simply the Wikipedia vandalism problem writ large. What controls will OpenLibrary put in place to guard against it?
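One cheap control for the ISBN field specifically is the ISBN-10 check digit. Be clear about what it does and doesn't buy you: it rejects typos, but a vandal pasting in the real ISBN of a different book, as in the "My Pet Goat" example, passes it, so edit history and review would still be needed. This is a generic sketch of the standard check, not anything taken from Open Library's code; `is_valid_isbn10` is a hypothetical name:

```python
def is_valid_isbn10(isbn: str) -> bool:
    """Check an ISBN-10 check digit.

    Weights run 10, 9, ..., 1 across the ten characters, and the weighted sum
    must be divisible by 11. Only the last character may be 'X' (meaning 10).
    Hyphens and spaces are ignored.
    """
    chars = [c for c in isbn if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in "0123456789":
            value = int(char)
        elif char in "Xx" and position == 9:
            value = 10
        else:
            return False  # non-digit, or 'X' somewhere other than the end
        total += (10 - position) * value
    return total % 11 == 0
```

A check like this would reject a mistyped number at edit time; it says nothing about whether a valid number belongs to the book being edited.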
Some thoughts (Score:5, Insightful)
They should republish the raw data the same way Wikipedia and even IMDb do. I for one am not going to contribute to any data collection project that I can't later use myself.
Their schema [openlibrary.org] doesn't differentiate between editions. If I understand it right, that means that for the 3000 existing editions of "Tom Sawyer" released over the years, by different publishers in different countries and languages, the book's description has to be replicated for each one. That can't be good. I don't have a quick solution to this myself. Sometimes (esp. with tech books), a new edition changes content significantly compared to the previous one, sometimes they're exactly the same.
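A common way around the duplication is to split the record in two: a "work" that holds the shared text, and "editions" that point at it. A minimal sketch, with hypothetical class and field names (not taken from Open Library's schema):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    """The abstract book (e.g. "Tom Sawyer"), shared by all of its editions."""
    title: str
    description: str


@dataclass
class Edition:
    """One concrete printing: publisher, year, language, and so on."""
    work: Work
    publisher: str
    year: int
    language: str
    # Filled in only when this edition's content really differs from the work,
    # e.g. a heavily revised tech book; otherwise the work's text is reused.
    description_override: Optional[str] = None

    def description(self) -> str:
        if self.description_override is not None:
            return self.description_override
        return self.work.description
```

Editing the work's description once updates every edition without an override, while the override covers the case where a new edition changes content significantly.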
Collecting the cover images is a great service. However, doesn't this infringe on the publisher's copyright? Is this still fair use? What about countries like Germany without fair use laws--will German books still be OK because the data is collected in the USA (I guess)?
Add a feature to upload book descriptions as XML. Suggest a DTD. I have a list of my book collection stored as an XML file, so have others (maybe not natively, but book collection management software usually has an export function). It should be possible to automate the process of adding book information already stored in some digital format.
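To make the XML-upload idea concrete, here is a sketch of what a suggested DTD and an importer might look like. The element names are hypothetical, not an existing standard; real collection-management tools each export their own format, so each would need a small mapping step:

```python
import xml.etree.ElementTree as ET

# A possible DTD for the upload format (hypothetical):
#   <!ELEMENT books (book*)>
#   <!ELEMENT book (title, author+, year?)>
#   <!ELEMENT title (#PCDATA)>
#   <!ELEMENT author (#PCDATA)>
#   <!ELEMENT year (#PCDATA)>
# ElementTree does not validate against a DTD; it only parses the XML.

SAMPLE = """<books>
  <book>
    <title>The Adventures of Tom Sawyer</title>
    <author>Mark Twain</author>
    <year>1876</year>
  </book>
</books>"""


def parse_books(xml_text: str) -> list[dict]:
    """Turn a <books> document into a list of plain dicts, one per <book>."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        books.append({
            # findtext returns None when the element is missing.
            "title": (book.findtext("title") or "").strip(),
            "authors": [a.text.strip() for a in book.findall("author") if a.text],
            "year": (book.findtext("year") or "").strip(),
        })
    return books


if __name__ == "__main__":
    for record in parse_books(SAMPLE):
        print(record)
```

For uploads from the public, note that the Python documentation warns the standard XML modules are not secure against maliciously constructed data; a hardened parser such as `defusedxml` is the usual precaution.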
There should be some category system to pick from. Otherwise some may put Tom Sawyer into "Novel, USA antebellum", others into "Novel, USA 19th century".
Somehow connect this to Wikipedia. The more prominent books have article pages. Maybe data could be retrieved from it as well. There are currently Tom Sawyer articles in 16 or so languages.
The edit page should group items better: stuff everyone understands (year published, title) first, then those things only specialists know.
The edit page's descriptors shouldn't be images but text which links to an explanation page for the same reason. BISAC? LCCN? UCC13? I know, I can find out what those are with a search engine, but I shouldn't have to.
Prepare for i18n. I guess LCCN is a Library of Congress code number? Those types of libraries exist in other countries, too. Each book can have a gazillion codes. Make this another tuple in the database: (book_id, code_id, code_value) instead of (book_id, lcc_id, isbn10, isbn13, 10 other codes in the same record).
Also i18n: store language codes with all textual columns. A description is most likely going to be Hungarian for a book published in Hungary in Hungarian.
This complicates the schema a lot. Having very few tables is tempting, but it usually doesn't work well with the real world.
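The two i18n suggestions above (codes stored as (book_id, code_id, code_value) rows, and a language code on every text column) can be sketched in SQLite. Table and column names are hypothetical; `code_type` plays the role of the commenter's `code_id`:

```python
import sqlite3
from typing import Optional

# Hypothetical layout: identifiers and translatable text live in their own
# narrow tables instead of as extra columns on the book row.
SCHEMA = """
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY
);
CREATE TABLE book_codes (              -- one row per identifier
    book_id    INTEGER NOT NULL REFERENCES books(book_id),
    code_type  TEXT    NOT NULL,       -- e.g. 'isbn10', 'isbn13', 'lccn'
    code_value TEXT    NOT NULL,
    PRIMARY KEY (book_id, code_type, code_value)
);
CREATE TABLE book_texts (              -- one row per field per language
    book_id INTEGER NOT NULL REFERENCES books(book_id),
    field   TEXT    NOT NULL,          -- e.g. 'title', 'description'
    lang    TEXT    NOT NULL,          -- language code, e.g. 'en', 'hu'
    value   TEXT    NOT NULL,
    PRIMARY KEY (book_id, field, lang)
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores REFERENCES otherwise
    conn.executescript(SCHEMA)
    return conn


def codes_for(conn: sqlite3.Connection, book_id: int) -> list[tuple[str, str]]:
    """All (code_type, code_value) pairs for one book, sorted."""
    rows = conn.execute(
        "SELECT code_type, code_value FROM book_codes "
        "WHERE book_id = ? ORDER BY code_type, code_value",
        (book_id,),
    )
    return rows.fetchall()


def text_for(conn: sqlite3.Connection, book_id: int, field: str,
             lang: str) -> Optional[str]:
    """One text field in one language, or None if nobody has entered it."""
    row = conn.execute(
        "SELECT value FROM book_texts WHERE book_id = ? AND field = ? AND lang = ?",
        (book_id, field, lang),
    ).fetchone()
    return row[0] if row else None
```

This is the "more tables" trade-off the comment mentions: adding a new national catalogue number needs no schema change, at the cost of an extra query or join per lookup.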
Re: (Score:3, Insightful)
The kinds of skills necessary for doing actual cataloging work.... classifying and organizing knowledge... are so rare as to be a very precious jewel of a person if you ever do find somebody like that. And developing these skills is not something very easy to accomplish either.
Re: (Score:2)
Or just old, almost like James Joyce's work, which arguably nobody reads, but for Joyce at least, a lot of people talk about it.
And as for getting stuff...at least for now, the experience of an ebook is a lot less enjoyable to most people than that of a dead tree book. Dead tree books have portability advantages as well. So if someone likes a book they find on Open Library, they might well buy it on Amazon.
Mod parent up (Score:2, Informative)
The difference between this and other catalogs (Library of Congress, etc.) is that presumably you can customize it more.
Re: (Score:3, Interesting)
Re: (Score:2)
v : run away quickly; "He threw down his gun and fled" [syn: flee, fly]
Taken from GNOME's "Dictionary look up" panel widget. I have no idea which dictionary (or dictionaries) it uses.
Re:IPL? (Score:5, Interesting)
OpenLibrary is a lot more complete, for one: searching on "Ogorkiewicz" in IPL yielded no hits, while OL gave me several. The Archive is well-connected to various institutions like the Library of Congress and Bibliotech, and is able to pull a lot of help from these other organizations into making a more complete service.
OpenLibrary is also a catalog of metadata, providing information for each book like physical format, publisher, ISBN#, number of pages, and so on. This metadata has a lot of holes for now, but hopefully that will change as publishers and/or people who own copies of these books fill in the blanks, much like the Internet Movie Database.
Finally, OpenLibrary has its own staff which is dedicated to working with Internet Archive partners to make this the most complete catalog on the planet. IPL is cool (I like it!) but it does not seem to be very actively maintained.
(disclaimer: I work for The Internet Archive, but I do not speak for it, and the OpenLibrary team is in a completely different department from mine so DO NOT treat this post as necessarily any more authoritative or correct than any other slashdot post.)
-- TTK
Re: (Score:3, Informative)