AI Technology

Penguin Random House Underscores Copyright Protection in AI Rebuff (thebookseller.com) 40

The world's biggest trade publisher has changed the wording on its copyright pages to help protect authors' intellectual property from being used to train large language models and other artificial intelligence tools, The Bookseller has reported. From the report: Penguin Random House has amended its copyright wording across all imprints globally, confirming it will appear "in imprint pages across our markets." The new wording states: "No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems," and will be included in all new titles and any backlist titles that are reprinted.

The statement also "expressly reserves [the titles] from the text and data mining exception," in accordance with a European Parliament directive. The move specifically to ban the use of its titles by AI firms for the development of chatbots and other digital tools comes amid a slew of copyright infringement cases in the US and reports that large tranches of pirated books have already been used by tech companies to train AI tools. In 2024, several academic publishers including Taylor & Francis, Wiley and Sage have announced partnerships to license content to AI firms.

This discussion has been archived. No new comments can be posted.

  • by Bahbus ( 1180627 ) on Saturday October 19, 2024 @12:23AM (#64876515) Homepage

    I doubt this will stop companies (or individuals) from doing it, though. They'll just do a better job at hiding it and making it harder to prove ever happened in the first place.

  • by will4 ( 7250692 ) on Saturday October 19, 2024 @12:25AM (#64876519)

    The corpus of public domain books, freely available for AI training and much better written, would be a good starting point for training AI models on literature.

    Even the penny dreadfuls, dime novels, and pulp magazines are better written than many modern books.

    Reality: We should push the copyright office, Congress to
    limit copyright for written works on paper or electronic to 50 years from the earliest of

    - date of first publication - revisions, author's cut, etc. do not extend copyright on the original work
    - if unpublished, then 50 years from the youngest author's 35th birthday.
    - works made for hire for magazines and corporations, 50 years from creation date

    And require in all published works on the copyright page a statement "This book or literary work will become public domain in the USA in the year X at the latest. If copyright law or regulations change to extend beyond year X, this book will be donated by the copyright holder at time of publication or its assignee to the public domain in year X."

    • by Anonymous Coward

      - if unpublished, then 50 years from the youngest author's 35th birthday.

      So if an 86 year old and an 87 year old write something together, they already lost copyright one year before it was written (since it was certainly unpublished when it was not yet written).

      • Only if they never publish. In which case, tough luck.

        • Add in a 10 year copyright from the date of publication for authors older than 80 years old.

          The main points being:

          - Publishing your work requires you to legally state when the work enters the public domain. And state that if any law changes extend that copyright term, the work is freely donated to the public domain without any ability to retract such donation in the future
          - Copyright exists for a limited time after publication and the duration of copyright is based on 85 years from the birthdate of the ol

    • by Z00L00K ( 682162 )

      My take is 5 years after the death of the last surviving creator.

      That would be enough to close the books for the creator and ensure they'll get a decent closure.

      That would mean that at least some of the works of Prince could be free to use.

      • Make it work like a patent with a very limited term but requiring a whole, complete implementation being deposited along with the key tools needed to reproduce it. For recorded stage plays, movies and shows, this would include not only the original source footage but also things like the script and designs or replicas of key props. For literature, it would include author's notes and early drafts, as well as references where applicable for any research materials used, and yes, for software, that would include
    • Great if you want the resulting models to generate texts that sound just like your grandparents; languages evolve quite significantly from one generation to the next. The LLMs we have now were trained on the "average" of all generations which is why they tend to produce some rather strange & jarring combinations of words & expressions.
    • Re:Agree (Score:5, Insightful)

      by msauve ( 701917 ) on Saturday October 19, 2024 @07:07AM (#64876923)
      14 years, with one 14 year extension, as was originally implemented. The purpose of copyright is "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."

      Authors don't create works if a payback is going to take more than 28 (or even 14) years, so longer terms do nothing to promote progress, in fact they impede progress. They hold our culture for ransom.
  • by Mononymous ( 6156676 ) on Saturday October 19, 2024 @02:03AM (#64876605)

    This is a legal question. Whether or not it's allowed depends on the law, not on the message on the copyright page.
    If the law restricts AI training, saying it's not allowed doesn't change that.
    If the law doesn't restrict it, saying it's not allowed doesn't somehow change the law.

    • They should reword it to be like a shrink-wrap license. If you don't agree, then return the book, etc.
    • by evanh ( 627108 )

      I wouldn't be so dismissive.
      A copyright disclaimer holds a hell of a lot more water than something like a TOS or EULA. Copyright is something that is well tested and repeatedly upheld by the courts. And, in the US, it has the DMCA cudgel smashing everything too.

      • by jabuzz ( 182671 )

        Ah, good old USA, which spent the best part of 200 years not giving a f*%k about other people's copyright. The mind boggles as to why they would expect me to now respect their copyrights. When they apologize and pay reparations I will listen.

      • Copyright disclaimers don't hold particular weight without a pre-existing interpretation to give them weight. There isn't one in this case, and it'll end up irrelevant anyway. Re-encode the material in a jurisdiction that simply doesn't recognize anything complicated and you're done.

  • by Visarga ( 1071662 ) on Saturday October 19, 2024 @05:43AM (#64876813)
    Authors can only control the right to copying, not other rights. Training AI on their books is not something they can control. It's not copying, so they have no rights over that. When you buy a book you can wipe your ass with it, the author can't say anything.
    • The thing everybody gets wrong, as you correctly noted, is the training-the-AI-part is not copyright infringement. However, in order to train the AI, you need to have a digital copy. Unless the trainers made an authorized copy, the fact that they have a copy is copyright infringement. That's the illegal part. What they do with the copy is irrelevant.
    • Training an AI on books is copying. The software industry settled this ages ago; it's why we need a license to run a program. The act of transferring the program from your hard drive or disk to computer memory is legally considered copying it and thus you need a license to do so. The act of reading the book into an AI system for training is an act of copying. Full stop. Now, there are exceptions to copyright law which make doing that without a license legal and the question is if training an AI counts

  • I produce content that is typically CC licensed & I've already been amending the licences to exclude uses for AI. I suspect that this may well become the norm. It'd be nice if CreativeCommons could update this exception to add to their options when choosing a licence.
    • by allo ( 1728082 )

      You should be aware that this makes the license non-free (just as the CC-NC licenses are).

      I am also not sure if you are allowed to modify the CC license text. The licenses themselves are copyrighted by their creators as well, and you are often not allowed to change them, as it would be misleading to still say it's CC licensed when your new clause takes away freedoms the CC license would grant.

      • It's perfectly legal to add clauses to a copyright notice & lots of universities already do this for their own OERs. Additionally, the owner of the works can change the licences as they see fit. This obviously presents some issues for those wishing to re-use OERs, but it's not retroactive either, just a change for new uses. However, adding a clause for the case of a new technology, e.g. AI, isn't problematic at all since it doesn't affect previous users.

        Pretty much every organisation I know
  • The clause does not change much.

    When it comes to copyright there are two options how it may work out in court:
    1) AIs can learn under copyright exceptions. No license can change that, as the exceptions mean no license is needed for AI training, so all clauses in the licenses are irrelevant.
    2) AI training is not allowed, e.g., because the copyright exceptions are found to be ineffective. Then current licenses would be enough to disallow AI training without clauses targeting AI in particular.

  • Telling an AI it cannot be trained on a book's contents is tantamount to telling an aspiring author not to read a certain book to prevent the risk of that author imitating the book's style or content in his or her own future writings. By this logic the Tolkien Estate should sue George R. R. Martin, because it is certain that the latter drew inspiration (for profit!) from the works of the former.

    • It's not the training part that matters. Doing the training (or the reading, for humans) requires that you have an authorized copy of a work. If you're human, you obtain an authorized copy by buying a book. The folks training AI are not purchasing copies of works. That's copyright infringement. That's the illegal part. If they purchased a copy, then what they do with it afterwards is irrelevant (with the exception of making more copies).
      • I don't see where the prohibition says "but it's OK to train on the book's contents if you've purchased it legally", have I missed something?

        • That's because the publisher is trying to stretch copyright infringement (which has the force of law behind it) to cover a licensing agreement (which is arbitrary). IMHO, publishers are doing themselves a disservice by not focusing on the copyright infringement aspect.
