Google To Digitize, Make Available British Library's Historical Holdings
pbahra writes with part of an excellent story at the WSJ: "The British Library today announced its first partnership with Google, under which Google will digitize 250,000 items from the library's vast collection of work produced between 1700-1870. The Library, the only British institution that automatically receives a copy of every book and periodical to go on sale in the United Kingdom and Ireland, joins around 40 libraries worldwide in allowing Google to digitize part of its collection and make it freely available and searchable online, at books.google.co.uk and the British Library website, www.bl.uk. ... As well as published books, the 1700-1870 collection will also contain pamphlets and periodicals from across Europe. This was a period of political and technological turmoil, covering much of the Industrial Revolution, the French Revolution, the introduction of UK income tax and the invention of the telegraph and railway. All of these topics are covered, as are the quirkier matters of the day, such as the account, from 1775, of a stuffed hippopotamus owned by the Prince of Orange."
Re: (Score:2)
No, no ... in terms of cricket pitches.
Or, in multiples of 'Playing fields of Eton'
But the IMPORTANT question is... (Score:3, Funny)
Re: (Score:3)
Speaking at the official launch, Kristian Jensen, the Library’s head of Arts and Humanities, said: “This process allows books to fulfill their original aim of being useful to as many people as possible.”
I thought that was already understood: copyright should be extended forever, for the profit of the author's great-great-...-great-grandchildren (too bad if the author sold the rights to the publisher... but that's irrelevant to the usefulness of books, isn't it?).
Besides, digitization comes with the risk of exposing these "as many" to words, facts and attitudes that are quite sensitive today. I hope that Goog
Re:But the IMPORTANT question is... (Score:4, Informative)
Here is a talk by librarian Brewster Kahle on book archiving [ted.com]. He created the Internet Archive (archive.org).
With Google, it's important to make a contract so that the content is really open to all.
Don't do drugs (Score:2)
Here is a tip: Don't do drugs before you post rants on Slashdot.
Re: (Score:2)
Again: can we let the Tea Party and Michele Bachmann [nowpublic.com] be hurt if indiscriminately digitized papers of the time showed that the founding fathers did own slaves (and, possibly, more than own [monticello.org])?
It will work more to their benefit. Books during those times were filled with Christian fanaticism and bigotries. In fact, it will be hard to tell Bachmann, and quotes from any t.b/rep, from the books.
He who controls the past (Score:1)
Controls the future.
great! (Score:1)
No doubt there'll be plenty of "ZOMG GOOGLE IS TAKING OVER" comments but this is brilliant. There's so much archived information in Britain that is supposedly public but actually costs a fortune to research as you have to travel to wherever it's stored then pay an archivist to take you into the vault and find the papers etc.
Re: (Score:2)
Absolutely, I'm delighted at this. On a much tinier scale, I've been poring over the reams of old notes I have to create the RPG in the sig there. I wish someone would offer to digitise that lot for me :/ I keep catching myself looking for the search button in the paper notebooks.
Re:I wish someone would offer to digitise that lot (Score:3)
Calling your bluff. What state are you in?
For that to happen for free you need to declare the contents of your game system Creative Commons BY-SA which is Attribution-ShareAlike, and avoids the weird tangles regarding ad revenue vs "non commercial".
Then you have to develop the Literacy Pyramid, which is what every single copyright-clueless entity always falls into, proving that they are about the lawyers instead of the writers. The Literacy Pyramid says that you need a base of some 100 Lurkers to get about
Re: (Score:2)
I genuinely have no idea what you're talking about?
Re: (Score:2)
Oh okay, I think I get it - the game will be free for all to use and share upon publication; that's in the blog, issue 1. That's not much help though, as I've yet to meet the OCR program that can translate my scribbles.
Re: (Score:2, Interesting)
Older documents and books are notoriously difficult to scan - as it gets old, the paper starts to disintegrate and the ink fades away, and because the books are valuable, people have to be much more careful how they open and handle them.
Bottom line is that old books need to be scanned at much higher resolution AND the blotches and broken characters have to c
Re: (Score:2)
Moreover, it's a necessity. If the scans are shit, then they can't be OCR'd, so all you have is pictures anyway.
It *can* be done, lots of libraries around the world have done proof of concept pilot runs going back to the 90s even, and you can find their collections on the web if you look.
Now I am intrigued... (Score:2)
What about the Prince of Orange and a stuffed hippopotamus?
Inquiring minds want to know.
What does one do with a stuffed hippo?
Re: (Score:3)
More to the point, what did the Princes of Green, Red, White and Mauve think? And what about the Marquis of Heliotrope?
Re: (Score:2)
I don't know, but the Fresh Prince was jiggy with it, and Prince didn't have The Time to comment about it.
Re:Now I am intrigued... (Score:5, Informative)
Indeed, and the title is older than the English word "orange" itself. This was introduced to English in the early 1500s (just in time for Shakespeare to complain about its lack of rhyme...), and is termed after the name for the fruit. Prior to this, the colour was "geoluhread" (yellow-red). Note, we don't call it "carrot", as (yellow-red) carrots were developed in the 1700s.
Now, the house of Orange comes from the city, originally "Arausio", in southern France. This was named for the local Celtic water God of the same name.
Being Irish, I admit I find it somewhat ironic that the "Orange-men" are originally termed for a pagan, Celtic god...
Re: (Score:2)
Note, we don't call it "carrot", as (yellow-red) carrots were developed in the 1700s.
and popularized as a symbol of dutch patriotism, iirc
Re: (Score:2)
Now, the house of Orange comes from the city, originally "Arausio", in southern France. This was named for the local Celtic water God of the same name.
Thanks for pointing that out. I looked at it and thought, Arausio was a Gaul camp. Now to figure out why the Celts were in southern Gaul during a period of time when most everyone was trying to get away from the Romans.
Re: (Score:2)
Being Irish, I admit I find it somewhat ironic that the "Orange-men" are originally termed for a pagan, Celtic god...
But that's entirely irrelevant to the current use of the term, which relates purely to the time after William of Orange; it has no connection with the original Celtic god.
You might as well say it is ironic that Christians worship on a Sunday, which is named after the ancient Sun god.
Re: (Score:1)
But that's entirely irrelevant to the current use of the term, which relates purely to the time after William of Orange; it has no connection with the original Celtic god.
You might as well say it is ironic that Christians worship on a Sunday, which is named after the ancient Sun god.
Begging your pardon (and ignoring the conflation of Christ with Sun gods in early Romano-Christian history), I think the comparison might be more apt if a group of Christians worshipped on Thursday, a day named after Thor, and so named themselves Thursians.
Personally, I would find that ironic - perhaps it's that extra step of actually naming yourself after the deity.
However, your mileage may vary.
On a related note, I find it somewhat amusing that many Christians (in my experience) would term saying "Christ" as
Re: (Score:2)
Ride it around lashing it with a switch of course. Ah the joys of inbreeding.
Re: (Score:2)
Stuffed Hippopotamus? Is that 1700 goatse?
Not the only one... (Score:4, Informative)
This is not the only British library that gets all publications. The National Library of Wales (http://www.llgc.org.uk/) also gets all publications that are published in the UK (and there is likely one in Scotland too).
Re: (Score:1)
As usual, it's slightly more complicated: http://www.legaldeposit.org.uk/background.html [legaldeposit.org.uk]
Re:Not the only one... (Score:4, Interesting)
Interesting, as it's covered by law in the UK. I wonder how it would apply to self-published books, such as books sold through the likes of Blurb or Lulu.
Those companies are not UK based, so are not covered by the legislation. However, if I (as a UK resident) published a book, for sale to the public, via Lulu, would I be classed as publisher in terms of this legislation?
Legal deposit (Score:2)
Legal deposit covers printed material; digital publications (newspapers, scholarly journals, software including games) and online material are covered by a voluntary scheme.
Re: (Score:2)
Coming back to this late, but Lulu and Blurb are basically print-on-demand services, so we're not talking digital books. Lulu even lets you get an ISBN for your book.
Re:Not the only one... (Score:5, Informative)
Actually the BL really is the only one to automatically get all publications. Five other libraries are entitled to a free copy upon request.
http://en.wikipedia.org/wiki/Legal_deposit#United_Kingdom [wikipedia.org]
I know Cambridge gets everything with an ISBN, and from your post it sounds like Wales and Scotland do too. Things like PhD theses only go to the BL though.
Re: (Score:2)
Strange, mine just went to the BL. Perhaps it depends upon the examining institution.
Re: (Score:2)
Things like PhD theses only go to the BL though.
No, at the National Library of Wales we get the theses from the universities in Wales:
http://www.llgc.org.uk/index.php?id=4653 [llgc.org.uk]
So they don't get everything from the UK (I'm not sure what Scotland does, they have their own National Library).
We've started harvesting e-theses from university repositories as part of the ETHOS project (see link in the url above), the BL will however harvest them on from us (subject to agreement with the originating uni), so they'll
It's worth pointing out... (Score:2)
From the article:
Google are approaching it correctly this time.
Re: (Score:1)
Will the digitized copies contain a 'copyright Google' watermark?
Finally, us mere mortals may have a glimpse (Score:1)
The BL blows on about adding to "our shared heritage" but the truth is that they are notoriously fickle and arbitrary about issuing Reader's Passes to actually use their collection.
I have had my application for a pass refused as my research justification was deemed "insufficiently scholarly", even after I had spent 10 minutes being interviewed by the secretary. The average man on the street who wanders in to their London campus will be in for a rude shock.
Even if the staff judge you to be worthy enough to
Re: (Score:2)
Considering the items involved that require you to have a readers pass, yes of course it is difficult - they are one of a kind items, often needing to be handled in specific ways and treated with extreme respect, costing millions of pounds to restore, thousands of pounds to store and cannot be replaced. They are exactly the items that need a gate keeper to look after them.
Re: (Score:1)
ALL items in the British Library require a Reader's Pass to view, except for the limited stock that they retain for inter-library loan.
This is regardless of their provenance or rarity.
Re: (Score:2)
The BL blows on about adding to "our shared heritage" but the truth is that they are notoriously fickle and arbitrary about issuing Reader's Passes to actually use their collection.
It's automatic if you are doing a postgraduate degree.
I have had my application for a pass refused as my research justification was deemed "insufficiently scholarly", even after I had spent 10 minutes being interviewed by the secretary. The average man on the street who wanders in to their London campus will be in for a rude shock.
You don't accept the possibility that your research justification might have been insufficiently scholarly?
Even if the staff judge you to be worthy enough to view their precious possessions you have to jump through hoops just to reserve the item.
You ask the person on the information desk to reserve it for you, or you log in to the electronic catalogue (on-site or on-line), look the item up, press the "reserve" button, and select the reading room to which you want it delivered. If you consider that to be jumping through hoops then it says a lot for the academic standard you are likely to achi
Re: (Score:2)
Hi there,
Do you hold a BL Reader Pass? Actually they're also now available to undergraduates, but since I am 20 years out of Uni that's not much help to me either
> You don't accept the possibility that your research justification might have been insufficiently scholarly?
"A history of astro-navigation" may not be Earth-shatteringly exciting, but who are the BL to judge its merit? I had a case for research work, I showed that pamphlets they held were not available elsewhere but my application was denied
Re: (Score:2)
Hi there,
Do you hold a BL Reader Pass?
Yes.
Actually they're also now available to undergraduates, but since I am 20 years out of Uni that's not much help to me either
They're available to anybody who can make the case for one, irrespective of study level. It's just that doing postgrad studies is one of the objective criteria that automatically makes the case.
"A history of astro-navigation" may not be Earth-shatteringly exciting, but who are the BL to judge its merit?
They are the people appointed with the task of making that judgement.
I had a case for research work, I showed that pamphlets they held were not available elsewhere but my application was denied for no reason other than the secretary was grumpy that day. She could provide no objective explanation.
In other words, you failed to make the case and it's somebody else's fault. There is a set of objective criteria to decide whether somebody can get a card. If you fail those tests then you get a second chance with an interview and a subjective j
Re: (Score:2)
That's strange. The Library of Congress gives readers passes out to most anybody who applies.
Re: (Score:2)
I will happily flout the Legal Deposit Libraries Act and refuse to provide BL a copy.
What with that and your user name, you're on two strikes. Just as well you're not in the US, or the next time you crossed the road without looking properly you'd be off to prison for thirty years.
Re: (Score:2)
This has been pointed out and proven wrong a dozen times already in the comments. Only the British Library gets one automatically, the other libraries may request a free copy.
Re: (Score:2)
This has already been answered by several people who either knew or could be bothered (like me) to spend ten seconds on Google.
Re: (Score:3)
We had a mix of temps and perms, mostly temp scanner operators and perm developers.
Professionals - yes, there were clauses in the contract about how much we paid if things were damaged.
Team size? Smaller than you might think - we had about ten at its peak. Around the clock - not quite, but there were definitely early and late shifts.
We used then-flash Bell
Re: (Score:2)
I worked at a company that did the same for the French National Library, about fifteen to eighteen years ago. To go through your questions: ...
Actual process was to guillotine the books and feed them through the scanners, some books would then be restitched. In the case of rare books we'd photograph them instead (and then scan the photo - this predates digital cameras).
I thought that Google had tech that could scan the pages of an original book and automatically compensate for any curvature. IIRC** it did something like flash a test pattern onto the page to determine how to straighten the final image.
**but it was a while ago I read this so could easily be mistaken.
Re: (Score:2)
We did that too - the Kofax card and driver software could take care of deskewing and it did a reasonably good job. Again, this was a while ago so I imagine things have improved but it wasn't too bad.
Cheers,
Ian
Re: (Score:2)
I've been involved with a similar project in the Netherlands. We found that commercial OCR engines had a high error rate on these old documents. We ended up having each document OCR'ed twice: once by software, once by having a sweatshop in India manually type up the document. The Indians had a lower error rate than the OCR software. By combining the two sources we could achieve an error rate low enough to comply with the project spec.
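The comment does not say how the two sources were combined, so the following is only a minimal sketch of one plausible approach: align the software OCR output against the manually typed text word by word, accept words where both agree, and flag every disagreement for review. The function name `merge_transcriptions` and the choice to provisionally keep the typed reading (because the typists had the lower error rate) are assumptions for illustration, not the project's actual method.

```python
from difflib import SequenceMatcher


def merge_transcriptions(ocr_text, typed_text):
    """Align two transcriptions of the same page word by word.

    Returns (merged_words, disputed), where disputed is a list of
    (ocr_words, typed_words) pairs covering every span where the
    two sources disagree.
    """
    ocr_words = ocr_text.split()
    typed_words = typed_text.split()

    # autojunk=False stops difflib from ignoring very frequent words
    # on long pages.
    matcher = SequenceMatcher(a=ocr_words, b=typed_words, autojunk=False)

    merged = []
    disputed = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Both sources agree, so accept the words as they are.
            merged.extend(ocr_words[i1:i2])
        else:
            # The sources disagree: record both readings for a human
            # reviewer and provisionally keep the typed reading
            # (an assumption, see the note above).
            disputed.append((ocr_words[i1:i2], typed_words[j1:j2]))
            merged.extend(typed_words[j1:j2])
    return merged, disputed


if __name__ == "__main__":
    words, to_review = merge_transcriptions(
        "Tbe quick brown fox", "The quick brown fox"
    )
    print(" ".join(words))
    print(to_review)
```

The point of the design is that two independent transcriptions rarely make the same mistake in the same place, so only the disputed spans need a human to look at them.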
The project was unusual in that the documents were an index (of the minute
Re: (Score:2)
Mod this up, interesting discussion.
I'd guess the answer to 1 and 2 is "it depends." There must be rarities for which a full-on expert is required with white gloves and a wand (and in their spare time they supplement their income as street magicians.)
The proofreading is at least partly through reCAPTCHA. "Currently, we are helping to digitize old editions of the New York Times and books from Google Books." http://www.google.com/recaptcha/learnmore [google.com]
Re: (Score:2)
Do they use automated machines, scanning beds, or wands?
No, they're transcribing everything by hand using quill pens and ink, then typesetting it on proper hot metal presses, then finally photographing each page with an 11 x 14 plate camera and emailing the images one page at a time to everyone who has a gmail account.
This is going to be incredibly great (Score:4, Insightful)
The 18th century saw the birth of both the Industrial Age [wikipedia.org] and the Age of Enlightenment [wikipedia.org]. This was a time of profound change on a global scale that easily rivals the impact of our own information age.
You may ask what is the point in studying history -- who cares about the impact of steam power, for example? Here's the thing: although technology improves over time, people basically remain the same. By understanding the dislocation of farmers to factories in 1750, you can gain insight into the dislocation of national workers to global workers today.
To get access to literally every single published work from this period is going to be amazing. Bravo UK and Google!
Re: (Score:2)
people basically remain the same
Well, yeah, but they smell a lot better now.
Re: (Score:2)
You may ask what is the point in studying history
It is entertaining :) http://en.wikipedia.org/wiki/Connections_(TV_series) [wikipedia.org]
Future Libraries? (Score:1)
Re: (Score:2)
I wonder what they will look like... If someone hasn't thought of it before, someone should start drawing up plans for futuristic libraries where instead of checking out paper books you can check out books for your kindle or some other device... on top of that, I think it would be cool for it to look like a traditional library, but with server racks instead of bookshelves... (this probably just seems cool to me because I'm a nerd; I have a lot of friends who are 'conservative' when it comes to paper books. A lot of the English majors I know treat technology like the anti-christ.)
I think your electronic library will look like this [google.com] and the server racks will be located somewhere with cheap power and air conditioning.
Out-of-copyright works (Score:2)
Re: (Score:2)
Possibly. In the US, the Bridgeman v. Corel case decided that copies of public domain works are not copyrightable, but that of course has no bearing in the UK. There is a sense there that the ruling is reasonable, but straight up copies are definitely deemed copyrighted works thanks to (imho, inane) concepts like lighting and photogenicity. In this case, nobody's likely to complain, and surely not Google, but image copyright in the UK lies in the act of taking the photo and not generally in the creativity
let's see what's actually happening (Score:2)
The British Library has just handed the copyright on a load of uncopyrighted work to Google, and Google in return gets exclusive commercial rights to the work. This is awful. And for only £6 million, by their estimate, they could have done it themselves - considering the broad range of interested parties, donations could easily raise that amount. Their effort would be far better, too, if the standards of Google's old archives are anything to go by.
This is just another example of the British "public pr
Re: (Score:2)
No, they didn't hand over any copyrights; the article even states that all digital rights revert back to the Library. Google already has the expertise to scan these books and the infrastructure to distribute them.
Basically they saved taxpayers £6 million plus whatever the hosting and distribution costs would be AND the books are now easily accessible to anyone in the world! Do you really think the British Library could have done a better job than Google in-house?