Penguin Random House Underscores Copyright Protection in AI Rebuff (thebookseller.com)
The world's biggest trade publisher has changed the wording on its copyright pages to help protect authors' intellectual property from being used to train large language models and other artificial intelligence tools, The Bookseller has reported. From the report: Penguin Random House has amended its copyright wording across all imprints globally, confirming it will appear "in imprint pages across our markets." The new wording states: "No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems," and will be included in all new titles and any backlist titles that are reprinted.
The statement also "expressly reserves [the titles] from the text and data mining exception," in accordance with a European Parliament directive. The move specifically to ban the use of its titles by AI firms for the development of chatbots and other digital tools comes amid a slew of copyright infringement cases in the US and reports that large tranches of pirated books have already been used by tech companies to train AI tools. In 2024, several academic publishers including Taylor & Francis, Wiley and Sage have announced partnerships to license content to AI firms.
Doesn't Mean Much (Score:3)
I doubt this will stop companies (or individuals) from doing it, though. They'll just do a better job at hiding it and making it harder to prove ever happened in the first place.
Re: (Score:3)
Add to it that there are loopholes in the copyright law protecting parodies.
Re: (Score:3)
Re: (Score:2)
Re: (Score:2)
and making it harder to prove ever happened in the first place
It's easy to prove when your "AI" model's training data contains other companies' watermarks in the requested test sample.
Re: (Score:3)
Watermarks aren't foolproof or perfect. Just an increase in difficulty.
Re: Doesn't Mean Much (Score:5, Informative)
The EU copyright law has explicit language allowing the use of copyrighted materials for training AI. The only companies this could reach are US companies.
Agree (Score:3)
The corpus of public domain books, freely available for AI training and much better written, would be a good starting point for training AI models on literature.
Even the penny dreadfuls, dime novels, and pulp magazines are better written than many modern books.
Reality: We should push the Copyright Office and Congress to
limit copyright for written works, on paper or electronic, to 50 years from the earliest of:
- date of first publication - revisions, author's cut, etc. do not extend copyright on the original work
- if unpublished, then 50 years from the youngest author's 35th birthday.
- works made for hire for magazines and corporations, 50 years from creation date
And require in all published works, on the copyright page, a statement: "This book or literary work will become public domain in the USA in the year X at the latest. If copyright law or regulations change to extend beyond year X, this book will be donated by the copyright holder at time of publication or its assignee to the public domain in year X."
Re: (Score:1)
- if unpublished, then 50 years from the youngest author's 35th birthday.
So if an 86 year old and an 87 year old write something together, they already lost copyright one year before it was written (since it was certainly unpublished when it was not yet written).
Re: Agree (Score:2)
Only if they never publish. In which case, tough luck.
For late in life publications by authors (Score:2)
Add in a 10 year copyright from the date of publication for authors older than 80 years old.
The main points being:
- Publishing your work requires you to legally state when the work enters the public domain. And state that if any law changes extend that copyright term, the work is freely donated to the public domain without any ability to retract such donation in the future
- Copyright exists for a limited time after publication and the duration of copyright is based on 85 years from the birthdate of the ol
Re: (Score:2)
My take is 5 years after the death of the last surviving creator.
That would be enough to close the books for the creator and ensure they'll get a decent closure.
That would mean that at least some of the works of Prince could be free to use.
Much simpler solution (Score:2)
Re: (Score:2)
Re:Agree (Score:5, Insightful)
Authors don't create works if a payback is going to take more than 28 (or even 14) years, so longer terms do nothing to promote progress, in fact they impede progress. They hold our culture for ransom.
They can say that all they want (Score:4, Insightful)
This is a legal question. Whether or not it's allowed depends on the law, not on the message on the copyright page.
If the law restricts AI training, saying it's not allowed doesn't change that.
If the law doesn't restrict it, saying it's not allowed doesn't somehow change the law.
Re: (Score:2)
Re: (Score:2)
I wouldn't be so dismissive.
A copyright disclaimer holds a hell of a lot more water than something like a TOS or EULA. Copyright is something that is well tested and repeatedly upheld by the courts. And, in the US, it has the DMCA cudgel smashing everything too.
Re: (Score:2)
Argh, good old USA, which spent the best part of 200 years not giving a f*%k about other people's copyrights. The mind boggles as to why they would expect me to respect their copyrights now. When they apologize and pay reparations, I will listen.
Re: They can say that all they want (Score:2)
Copyright disclaimers don't hold particular weight without a pre-existing interpretation to give them weight. There isn't one in this case, and it'll end up irrelevant anyway. Re-encode the material in a jurisdiction that simply doesn't recognize anything complicated and you're done.
They have no right to tell us what to do (Score:5, Insightful)
Re: They have no right to tell us what to do (Score:2)
So far, no case has been brought in the EU, because it exempts AI training from copyright. In the USA all cases I know of have been lost by the plaintiffs or delayed, or the loss has been appealed. My expectation is that this will continue to be the case. Even Republicans understand that the money of the future isn't made with copyright but with AI.
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Training an AI on books is copying. The software industry settled this ages ago; it's why we need a license to run a program. The act of transferring the program from your hard drive or disk to computer memory is legally considered copying it, and thus you need a license to do so. The act of reading the book into an AI system for training is an act of copying. Full stop. Now, there are exceptions to copyright law which make doing that without a license legal, and the question is if training an AI counts
I've already been doing it too (Score:2)
Re: (Score:2)
You should be aware that this makes the license non-free (just as the CC-NC licenses are).
I am also not sure you are allowed to modify the CC license text. The licenses themselves are copyrighted by their creators as well, and you are often not allowed to change them, as it would be misleading to still call the work CC-licensed when your new clause takes away freedoms the CC license would grant.
Re: (Score:2)
Pretty much every organisation I know
Useless (Score:2)
The clause does not change much.
When it comes to copyright, there are two ways this may work out in court:
1) AIs can learn under copyright exceptions. No license can change that, as the exceptions mean no license is needed for AI training, so all clauses in the licenses are irrelevant.
2) AI training is not allowed, e.g., because the copyright exceptions are found to be ineffective. Then current licenses would be enough to disallow AI training without clauses targeting AI in particular.
Prohibiting AI training is absurd. (Score:2)
Telling an AI it cannot be trained on a book's contents is tantamount to telling an aspiring author not to read a certain book to prevent the risk of that author imitating the book's style or content in his or her own future writings. By this logic the Tolkien Estate should sue George R. R. Martin, because it is certain that the latter drew inspiration (for profit!) from the works of the former.
Re: (Score:2)
Re: (Score:2)
I don't see where the prohibition says "but it's OK to train on the book's contents if you've purchased it legally". Have I missed something?
Re: (Score:2)