AI Technology

Penguin Random House Underscores Copyright Protection in AI Rebuff (thebookseller.com) 18

The world's biggest trade publisher has changed the wording on its copyright pages to help protect authors' intellectual property from being used to train large language models and other artificial intelligence tools, The Bookseller has reported. From the report: Penguin Random House has amended its copyright wording across all imprints globally, confirming it will appear "in imprint pages across our markets." The new wording states: "No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems," and will be included in all new titles and any backlist titles that are reprinted.

The statement also "expressly reserves [the titles] from the text and data mining exception," in accordance with a European Parliament directive. The move specifically to ban the use of its titles by AI firms for the development of chatbots and other digital tools comes amid a slew of copyright infringement cases in the US and reports that large tranches of pirated books have already been used by tech companies to train AI tools. In 2024, several academic publishers including Taylor & Francis, Wiley and Sage have announced partnerships to license content to AI firms.

  • by Bahbus ( 1180627 ) on Saturday October 19, 2024 @01:23AM (#64876515) Homepage

    I doubt this will stop companies (or individuals) from doing it, though. They'll just do a better job of hiding it and make it harder to prove it ever happened in the first place.

    • by Z00L00K ( 682162 )

      Add to that the loopholes in copyright law protecting parodies.

    • and making it harder to prove it ever happened in the first place

      It's easy to prove when your "AI" model's training data contains all the other companies' watermarks in the requested test sample.

  • The corpus of public domain books, freely available for AI training and much better written, would be a good starting point for training AI models on literature.

    Even the penny dreadfuls, dime novels, and pulp magazines are better written than many modern books.

    Reality: We should push the Copyright Office and Congress to limit copyright for written works, on paper or electronic, to 50 years from the earliest of

    - date of first publication - revisions, author's cut, etc. do not extend copyright on the original work
    -

    • by Anonymous Coward

      - if unpublished, then 50 years from the youngest author's 35th birthday.

      So if an 86 year old and an 87 year old write something together, they already lost copyright one year before it was written (since it was certainly unpublished when it was not yet written).

    • by Z00L00K ( 682162 )

      My take is 5 years after the death of the last surviving creator.

      That would be enough to close the books for the creator and ensure they'll get a decent closure.

      That would mean that at least some of the works of Prince could be free to use.

      • Make it work like a patent, with a very limited term but requiring that a whole, complete implementation be deposited along with the key tools needed to reproduce it. For recorded stage plays, movies and shows, this would include not only the original source footage but also things like the script and designs or replicas of key props. For literature, it would include the author's notes and early drafts, as well as references where applicable for any research materials used, and yes, for software, that would include
    • Great if you want the resulting models to generate texts that sound just like your grandparents; languages evolve quite significantly from one generation to the next. The LLMs we have now were trained on the "average" of all generations which is why they tend to produce some rather strange & jarring combinations of words & expressions.
  • by Mononymous ( 6156676 ) on Saturday October 19, 2024 @03:03AM (#64876605)

    This is a legal question. Whether or not it's allowed depends on the law, not on the message on the copyright page.
    If the law restricts AI training, saying it's not allowed doesn't change that.
    If the law doesn't restrict it, saying it's not allowed doesn't somehow change the law.

    • They should reword it to be like a shrink-wrap license. If you don't agree, then return the book, etc.
    • by evanh ( 627108 )

      I wouldn't be so dismissive.
      A copyright disclaimer holds a hell of a lot more water than something like a TOS or EULA. Copyright is something that is well tested and repeatedly upheld by the courts. And, in the US, it has the DMCA cudgel smashing everything too.

  • Authors can only control the right to copy, not other rights. Training AI on their books is not something they can control. It's not copying, so they have no rights over that. When you buy a book you can wipe your ass with it, and the author can't say anything.
