Authors Sue Anthropic For Copyright Infringement Over AI Training (reuters.com)
AI company Anthropic has been hit with a class-action lawsuit in California federal court by three authors who say it misused their books and hundreds of thousands of others to train its AI-powered chatbot Claude. From a report: The complaint, filed on Monday, by writers and journalists Andrea Bartz, Charles Graeber and Kirk Wallace Johnson, said that Anthropic used pirated versions of their works and others to teach Claude to respond to human prompts.
The lawsuit joins several other high-stakes complaints filed by copyright holders including visual artists, news outlets and record labels over the material used by tech companies to train their generative artificial intelligence systems. Separate groups of authors have sued OpenAI and Meta over the companies' alleged misuse of their work to train the large-language models underlying their chatbots.
Comment removed (Score:4, Interesting)
Re: (Score:1)
Re: (Score:1)
We should know, by now, that copyright holders are a bunch of fuckin whiners with lawyers.
Want to bet the complainants' 'books' are self-published, and didn't sell? But it's all the fault of [deep pockets]! Waaaaaaa....
Re: (Score:1)
See how that works. They want something for free but want you to pay for their shit.
Re: (Score:2)
It's not clear what you're asking. If you're asking if there have been lots of lawsuits, the answer is yes [chatgptise...eworld.com]. None have concluded.
If you're asking is it common for AIs to be trained on copyrighted data, yes. In the same way it's common for Google to download copyrighted data to build its search engine, and a million other things (perhaps the most extreme being the Google Books case). The defendants argue that the same fair use exemption for the automated processing of copyrighted data to create transforma
Re:Wow. It's hard to tell if this is a dupe or not (Score:4, Interesting)
Re: (Score:1)
Here's a fair compromise: Anthropic can use these works for free, as long as Anthropic gives their AI gadget away for free. Problem solved.
Re: (Score:1)
It is easy to be creative without infringing on others copyrights.
Unless you consider creativity to mean: how to make a copy of something that was successful, make lots of money, and get away with it.
My father was a tax consultant ... he used to tell people: if you put your mind into how to make more money instead of how to save taxes, you would have much more money.
Re: (Score:1)
"27. For both the post- and pre-training processes in developing Claude, Anthropic created multiple, unlicensed copies of the training data."
That allegation was not present in many of the prior lawsuits and, I believe, is a very important allegation to make. Even if "reading" the books, "talking about" the books, making the books "searchable" etc is all fair use via application of pr
Re: (Score:3)
This is my primary objection as well. I have zero issues with $company going to a bookstore and buying, or going to a library and checking out, 50,000 books, scanning them, OCRing them, and using them to create a search index for books or to train an AI. I don't have an issue if they buy 50,000 ebooks either. IMHO, those would be reasonable fair uses of legally acquired works (e.g. Google book scanning). Where it crosses the line is when $company builds a billion-dollar product on "50,000 eBooks Cl
Re: Wow. It's hard to tell if this is a dupe or no (Score:2)
Re: (Score:2)
Understood and agreed. Verbatim copies of a copyrighted work are not ok. AIs (and people) should not do that when reading/learning/digesting publicly posted content. That said, learning is a fundamentally different process from copying, and I don't see an issue with non-copying learning from publicly displayed works. I.e. It's not ok for me to go to the MOMA, duplicate (non-public domain) $ART, and sell prints of it. My understanding is that there is no restriction on making "inspired by" copies though,
Re: (Score:1)
Publicly distributed does not imply "no copyright".
Everything I put on the internet is copyrighted by me! Without any special notice.
That is common sense. And: law!
Re: (Score:2)
I understand this, and counter that posting this content here, publicly, gives the expectation that others can read, consider, learn from, and form opinions on this content the same as if you'd shouted it in the public square.
This brings some questions to the fore:
Assume I have an eidetic memory, do I not have license to remember your content?
If I'm not permitted to remember your content, do I have license to consider it in forming my own opinions?
Does a machine, without an eidetic memory, have fair use arg
Re: (Score:1)
No, you do not need a license to memorize it. Or to use your memory.
ONE
The questions arise when you make a "copy" and if something you consider your genuine own work is proclaimed by others a copyright violation.
If you scan a paper book of mine, which I intentionally did not publish as a $1 ebook, and you put it up for free, or to make money on ebook stores, then you violate my copyright. I point out: it does not matter if it is for free or for $1.
Let's look at ONE again.
There is a kind of famous law case betw
Re: (Score:2)
[I]s it common for AIs to be trained on copyrighted data, yes. In the same way it's common for Google to download copyrighted data to build its search engine, and a million other things (perhaps the most extreme being the Google Books case). The defendants argue that the same fair use exemption for the automated processing of copyrighted data to create transformative goods and services applies to them. Plaintiffs variously allege that it doesn't, that the outputs infringe, or various other claims. Most claims haven't been going very well for plaintiffs pretrial, but of the claims that make it to trial, it's too early to say how those will go.
Very well summed up.
It will be interesting to see how all of this shakes out in court. It will take many years, many lawsuits, and many appeals before we have anything resembling a real answer.
Re: (Score:2)
The AI champions have made an argument that all they are doing is what people do normally. I do not need to pay for a book if I go to the library and check it out, but that book may inspire me to write my own book. For my new book do I need to pay a license fee back to the artist who inspired me? No. So by that very nature, information is free when it inspires others to create. All th
Re: (Score:1)
Re: (Score:1)
because they are not mixing their own human ingenuity with inspiration from other humans to create something original
What do you think a prompt is? Do you think AI just generates random photos on its own? No, it's human ingenuity mixed with inspiration from other humans to create something original. Prompt engineering is real. I have friends who are very good at it and can get almost exactly what they want out of the image generator. Prompts can be simple, or extremely complex.
Also, LLMs are modeled after the human brain and how neurons work, so how can you refute that it's any different than how a human brain s
Re: (Score:2)
Re: (Score:1)
I was trained using many books and websites so I guess I can be sued too.
Well yeah but you were monetized by those websites by gleefully handing over your eyeballs + tracking data. How is the website, or YouTube or whatever gonna get paid by the AI Borg?
What I just wrote doesn't apply to free, open-source code and documentation.
Re: (Score:2)
Gaining knowledge outside scope of copyright. (Score:5, Insightful)
Gaining knowledge by reading is outside the scope of copyright.
Now, for people training an AI, if they make local copies for training purposes, those copies may be a violation. If Anthropic bought a Kindle version of a book and downloaded it to their computer like so many of us do, can that copy be used to train an AI? Maybe.
Re: (Score:2)
GPL is about distribution, copyright is about reproduction.
You can argue about temporary copies (though you would probably be wrong when doing so) but these training sets have nice permanent copies stored on their storage. They were reproduced without a license.
Re: (Score:2)
GPL is about distribution, copyright is about reproduction.
Yeah, I misspoke. I was thinking about making copies. "Copy" as in "Copyright".
You can argue about temporary copies (though you would probably be wrong when doing so) ...
If by temporary you mean that reading text off a web page involves some sort of temporary local copy, then yes, that would be acceptable. Much like copyrighted code in ROM may need to be copied to RAM in order to execute. But that sort of thing is not what I am referring to. I am referring to making a local copy of that website for training my AI. The duration being however long it takes to iterate through all my training attempts. It's not
Re: (Score:2, Insightful)
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
No, you don't understand, humans read a little, LLMs read everything, but humans have search engines to remember/find out anything, so... uh... it's the same shit.
What if verdict orders training removed? (Score:4, Interesting)
Judge orders AI to be untrained or wiped of all learned offending material.
Re: (Score:3)
Re: (Score:2)
That would be the least of their worries. Absent new laws there are only two options, fair use or bankruptcy.
If the Supreme Court rules copying content into the training set is not fair use, it will become the biggest legal mess in history. Companies working on LLMs have an additional problem, because they didn't just copy content off the legal internet, they copied it from pirate sites too (books1/2, shadow libraries etc). So they don't just need fair use, they need an exemption on copyright law just for t
This needs a legislated solution (Score:3)
Ideally we need a mechanism for the AI models to pay the authors for the content they learned on but this is likely impossible. Failing that perhaps an agreement that society (i.e. communism) owns say 75% of all AI models.
I don't have a solution but I know that the courts don't have a mandate to find one. This is a problem for our legislators to work out.
Also an open source AI that everyone can own and use does not cause this problem to go away. Some entities are in much better positions to benefit from an AI than others.
Re: (Score:2)
The AI companies didn't even buy the books ... they pirated everything.
Re: (Score:2)
Common knowledge, but when push comes to shove you'll be able to find someone to testify.
Re: (Score:2)
I learn by going out and interacting with my environment but I've also learned by reading and watching what others have created. The authors of my textbooks and even of fiction expect me to use what I read to create new things and what I create is considered fair use. The authors of the textbooks and other creative works did not expect an AI to be able to remember (make exact copies of) their work and use that to create new work. In some cases they didn't consent to a computer even reading their works. The copyright law that the USA has pushed on most of the western world is so draconian that the AI companies will likely lose, with mandatory fines in the trillions. This probably isn't ideal. Also buying a single copy of every creative work isn't enough. For example a textbook can only be physically read by a few people and they can only create so much. An AI can create multiple instances of itself. Ideally we need a mechanism for the AI models to pay the authors for the content they learned on but this is likely impossible. Failing that perhaps an agreement that society (i.e. communism) owns say 75% of all AI models. I don't have a solution but I know that the courts don't have a mandate to find one. This is a problem for our legislators to work out. Also an open source AI that everyone can own and use does not cause this problem to go away. Some entities are in much better positions to benefit from an AI than others.
Other parts of the world will start moving forward while the capitalism-obsessed worry over value lost. Not that I think it needs to be a free-for-all for the LLM companies, but if we expect legislation to make this work, well, I'd ask you to witness the absolute train wreck we have in the US government. Nothing positive is coming out of that hellhole. Certainly not anything positive when it comes to tech. They'll either gridlock their way into some stupidity, or just flat out believe whoever has the most
Re: (Score:1)
Umm, yes they do. Just because you are dazzled by the words "AI" does not mean the law does not apply to businesses that break it. Anthropic did not pay for the material they used to train their AI gadget. That's breaking the law. Had they paid the authors, or had the authors given Anthropic permission to use their works, Anthropic would not have broken the law. But Anthropic wanted free stuff. Ironically, Anthro
Authors earn shit (Score:2)
AI: Stealing 10-Cents from Everyone (Score:3)
This is just the rich grab'n 4 more (Score:2)
It should be interesting proving the presumption (Score:2)
They PRESUME their material was used for training. PROVE it.
And that, because it was presumed to be used, it was pirated (do they keep records of who bought their books?!)
Remember innocent until proven guilty? The burden of proof is on the accuser.
I got my popcorn.
Re: (Score:1)
Re: (Score:2)
The burden of proof is on the accuser. They have to prove their presumption.
Idiot
Suppose you invented a positronic brain (Score:2)
So nerds, let's get back to basics.
Suppose you invented an AI like Commander Data, HAL-9000, Jarvis, Johnny-5, KITT, or whatever. Now it sits on your desk and knows nothing. What would you do to train it? Honestly, I think I would feed it every book, encyclopedia, academic paper, TV show transcript, yomama joke, and line of code I could get my hands on. Who wouldn't?
And that's fine if you control it and it sits on your desk and you own those things above. Now you set up a web server and anyone in the world
Who? (Score:2)
Maybe these authors who most people have never heard of should embrace it because it might elevate their own work out of obscurity.