Facebook Technology

Sarah Silverman Sues Meta, OpenAI for Copyright Infringement (reuters.com) 163

Comedian Sarah Silverman and two authors have filed copyright infringement lawsuits against Meta and OpenAI for allegedly using their content without permission to train artificial intelligence language models. From a report: The proposed class action lawsuits filed by Silverman, Richard Kadrey and Christopher Golden in San Francisco federal court Friday allege Facebook parent company Meta and ChatGPT maker OpenAI used copyrighted material to train chat bots. The lawsuits underscore the legal risks developers of chat bots face when using troves of copyrighted material to create apps that deliver realistic responses to user prompts. Silverman, Kadrey and Golden allege Meta and OpenAI used their books without authorization to develop their so-called large language models, which their makers pitch as powerful tools for automating tasks by replicating human conversation. In their lawsuit against Meta, the plaintiffs allege that leaked information about the company's artificial intelligence business shows their work was used without permission.

  • by m00sh ( 2538182 ) on Monday July 10, 2023 @09:48AM (#63673769)

    Can I also sue?

    I'm sure slashdot also got scraped. Since slashdot says I own my posts, I'm sure my works are in the dataset as well.

    • by Narcocide ( 102829 ) on Monday July 10, 2023 @09:55AM (#63673801) Homepage

      Definitely, but to make it worth the money you're gonna have to prove something of value was stolen.

      • Definitely, but to make it worth the money you're gonna have to prove something of value was stolen.

        I can see you haven't been paying any attention to the various RIAA stories on here. Copyright is very carefully designed to avoid any need to prove value. Ask the guys at pirate bay.

        • by Bahbus ( 1180627 )

          The RIAA loves to weaponize copyright law, but it's not as powerful as people like to think it is. These authors and their lawyers clearly don't even understand the basics of copyright law.

          • Sarah Silverman is not a newbie here. She was around during the RIAA/MPAA copyright debates.

            You know, as foul-mouthed as she historically has been, she's actually Sesame Street trained in performing. There was a time Comedy Central was stand-up based instead of movie based and she was there. She's also a part of Crank Yankers, which was definitely a Sesame Workshop project by its use of puppets.

            Seems like we need to bring back a few of these copyright-aware performers, such as the Analog Hole group because

            • by sjames ( 1099 )

              But what if the AI doesn't steal your joke, it just parses it, makes minute changes to a matrix of numbers (or perhaps not even that) and moves on. It's really no different than a person hearing your joke.

              Unless the AI spits the exact joke out again, there isn't even the vaguest hint of a copyright violation.

          • The RIAA

            These authors and their lawyers clearly don't even understand the basics of copyright law.

            They don't need to. They wrote it and will have it altered if necessary.

            Further, the OpenAI groups and others like it have really made a basic blunder here with their laissez faire scraping. I'd imagine they will have a very difficult time in court, with increasing scrutiny from the bench as more and more groups sue them over copyright infringement. (To say nothing about the online services losing profits due to the scraping as well....)

            • by Bahbus ( 1180627 )

              Further, the OpenAI groups and others like it have really made a basic blunder here with their laissez faire scraping.

              This has nothing to do with copyright law. Whether anyone agrees with HOW they got their data or not, none of it was obtained by violating copyrights. They are suing on the theory that GPT could reproduce copyrighted material because it was trained on it. That isn't how LLMs work, at all. Nor do we file lawsuits on theoreticals that have never happened (unless you are extremely stupid).

              Now I will admit that perhaps the lawyers do know better but they do not care because they get paid regardless of whether t

        • Definitely, but to make it worth the money you're gonna have to prove something of value was stolen.

          I can see you haven't been paying any attention to the various RIAA stories on here. Copyright is very carefully designed to avoid any need to prove value. Ask the guys at pirate bay.

          If people use PirateBay to download free media, pay for an internet connection needed to do that and then pay for a device with which to spend a portion of their lifespan that they will never get back to consuming that media, that media has value. Same for data that OpenAI scrapes from the net, same for the news summaries Google scrapes off of news sites and then pockets advertising bucks while showing that to their users knowing full well that most of them will not bother to click through to the content cr

          • by suutar ( 1860506 )

            While you make a good argument for showing that the work has value, that doesn't change the fact that copyright law is structured to not need to prove that in court. Statutory damages are used more often, because the statutory values are generally higher than the highest reasonable value they could put on a given work.

            • If people use PirateBay to download free media, pay for an internet connection needed to do that and then pay for a device with which to spend a portion of their lifespan that they will never get back to consuming that media, that media has value. Same for data that OpenAI scrapes from the net, same for the news summaries Google scrapes off of news sites and then pockets advertising bucks while showing that to their users knowing full well that most of them will not bother to click through to the content creator. Nobody would bother to do any of this if this content didn't have value.

              While you make a good argument for showing that the work has value, that doesn't change the fact that copyright law is structured to not need to prove that in court. Statutory damages are used more often, because the statutory values are generally higher than the highest reasonable value they could put on a given work.

              The fact that you jumped through all those hoops and spent all that money to get the content and then spent a valuable and non-reclaimable portion of your lifespan consuming it isn't proof enough that you valued it? There are many things wrong with copyright law that need fixing but don't tell me that pirated/scraped content has no value to the people pirating/scraping it and if it has value to them it's not beyond the realms of reason to expect them to compensate original content creator in some way. Nobod

            • by hawk ( 1151 )

              Not my area of law these days, but in the US, statutory damages are only available for post-registration damages.

              So most published stuff, yes, but most forum posts would be limited to actual damages.

              hmm, now that I think of it, my dissertation was registered . . .

              hawk, esq.

          • If people use PirateBay to download free media, pay for an internet connection needed to do that and then pay for a device with which to spend a portion of their lifespan that they will never get back to consuming that media, that media has value.

            The incremental cost of the bit of internet connection used to download a song or video is tiny, or zero if you're on a flat-rate line or under your cap. Even if you take the entire cost of the connection, including installation, the computer used to access it, an

            • But it has value. Look what was spent to download and use it.

              [But their cost per post was tiny. US statutory damages would be at least $750 per post.] Take the statutory damages and run.

              Not to mention that the court wouldn't use your "what they spend to make an unauthorized copy" measure of actual value - and the ones the courts use are almost as hard to prove - which is why statutory minimum damages are part of the law.

    • by Entrope ( 68843 )

      Have you registered your copyrights with the US Copyright Office? If not, you cannot sue over them in federal courts. (You only have to register them before the lawsuit, not before the alleged infringement. https://www.natlawreview.com/a... [natlawreview.com])

      • by m00sh ( 2538182 )

        You don't have to register to copyright.

        It is automatically copyrighted upon creation.

        • You don't have to register to copyright.

          That's true. But it's also true that you have to register your copyrights in order to sue in Federal court.

    • by znrt ( 2424692 )

      Can I also sue?

      please do! the more the merrier.

      this is getting better and better, nothing to date has shown in a more hilarious way what an aberration the clusterfuck of copyright laws really is. and all it took was a chatbot! comedy gold. oh wait, somebody already said that? sue me!

    • by thegarbz ( 1787294 ) on Monday July 10, 2023 @12:51PM (#63674601)

      Of course you can, but like Sarah Silverman you will lose because no element of copyright covers the idea of limiting what a person can do with information they obtain. You can copyright availability and reproduction, but you can't copyright the idea that someone may learn something. If ChatGPT was spitting out Silverman's prose verbatim, you could argue copyright; if, on the other hand, it paraphrased or simply replicated a style, that is covered as transformative use.

      • by znrt ( 2424692 )

        Of course you can, but like Sarah Silverman you will lose because no element of copyright covers the idea of limiting what a person can do with information they obtain.

        i wouldn't be so quick in assuming courts can reasonably resolve this dilemma in a rational way in a context where fundamental concepts have been distorted to the extreme for spurious reasons. now wait for not just this, but the incoming tsunami of complaints and even class action suits ...

        this is going to bite the whole industry in the ass unless they manage to nail openAI for it and shut it down, and good luck with that.

        gorgeous!

  • Without being changed and they're going to have a problem. But if it's encoded in some fashion, then that's the definition of a derivative work. I don't know enough about the actual technology behind LLMs to say one way or another.

    That said there's so much money at stake with this technology and given a judge's tendency to side with the bigger property owner I think the owners of the LLMs are going to win.
    • by godrik ( 1287354 ) on Monday July 10, 2023 @10:41AM (#63673979)

      Well, if you look at the size of the model compared to the size of the input data, then you realize that ChatGPT is much more a fuzzy compressor than anything else. It's about 40TB of raw data to build a model of 1TB. That's essentially the ratio of zip compression.
      Playing with it, I was easily able to generate as-is copies of code that is available online. So yeah, these lawsuits don't seem frivolous. Whether they'll win or not is a different question, but the suit is reasonable.
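
      As a rough sanity check on the "fuzzy compressor" comparison, the arithmetic can be sketched as below. The 40 TB and 1 TB figures are the ones quoted in the comment and are not verified here; the sample text and the helper name compression_ratio are hypothetical, made up for illustration. Python's zlib stands in for "zip" as a lossless baseline.

```python
import zlib


def compression_ratio(raw_size: float, compressed_size: float) -> float:
    """Return how many input bytes correspond to one output byte."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return raw_size / compressed_size


# Figures quoted in the comment (unverified): ~40 TB of text, ~1 TB model.
model_ratio = compression_ratio(40e12, 1e12)

# Lossless baseline on a small, made-up sample of English prose.
sample = (
    "Comedian Sarah Silverman and two authors have filed copyright "
    "infringement lawsuits over the use of their books as training data. "
    "The suits argue that the models could not summarize the books "
    "unless the books themselves were part of the training set."
).encode("utf-8")
packed = zlib.compress(sample, 9)

# A lossless compressor must return the input byte for byte.
assert zlib.decompress(packed) == sample

zlib_ratio = compression_ratio(len(sample), len(packed))
print(f"quoted model ratio: {model_ratio:.0f}:1")
print(f"zlib ratio on the sample: {zlib_ratio:.2f}:1")
```

      The round-trip assertion is the point of the comparison: zlib guarantees the original bytes come back, while a trained model offers no such guarantee, so a size ratio by itself does not show whether any given work can be reproduced.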

  • I don't want to be sued for copyright infringement because I could summarize it.

  • by williamyf ( 227051 ) on Monday July 10, 2023 @10:02AM (#63673837)

    ... and did not want to collate the training material according to licenses.

    Instead of using material in the public domain, suitable creative commons licenses, or under licenses (in the case of SW) like BSD, MIT, MPL, DWTFYWT (libcaca) that are more conducive, they got greedy, and used all material available, regardless of copyright...

    Nor did they want to pay license holders (say, NYT, WaPo, etc.) to get access to their collection of material.

    Well, you harvest what you sow. And you have very deep pockets to pay the army of lawyers that will defend you against the lawsuits.

    Enjoy!

    PS: Of course, they can (and will) destroy all instances of the current AI crops, and from chatGPT 5 (and all the others like Llama) onwards, they can do right by the licenses... On vera.

    • by hjf ( 703092 ) on Monday July 10, 2023 @10:17AM (#63673903) Homepage

      I, too, cannot wait for the future where I'll be sued by a book publisher for using the knowledge learned from a book they own copyright to.

      • That is exactly what the OpenAI lawyers will argue

        Meanwhile the lawyers on the other side will argue that the model constitutes a "derivative work"

        And it will probably end up at the supreme court because this is all new territory with no clear answer.

        • Meanwhile the lawyers on the other side will argue that the model constitutes a "derivative work"[.]

          I suspect that Sarah's lawyers will argue that the OpenAI model could not have been trained on her works without it copying her works, and that is the crux of the copyright violation.

          And it will probably end up at the supreme court because this is all new territory with no clear answer.

          I disagree. This has been covered by copyright law time and time again, and I think the results have been pretty consistent across jurisdictions. There is nothing new or novel in this case. OpenAI copied and used copyrighted works to make and improve its commercial product for commercial gain, which is a prima facie copyright v

          • OpenAI copied and used copyrighted works to make and improve its commercial product for commercial gain, which is a prima facie copyright violation.

            Google also copies and uses copyrighted works when its web browser shows copyrighted material to users. Google copies and uses copyrighted works when its search engine makes local copies of copyrighted works to inform the datasets that power Google's revenue-generating search engine. Google copied and used copyrighted books when it made book excerpts searchabl

      • by ArchieBunker ( 132337 ) on Monday July 10, 2023 @11:16AM (#63674115)

        You could try reading the points of the lawsuit. https://storage.courtlistener.... [courtlistener.com]

        Indeed, when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’
        copyrighted works—something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works.

        • What is your point?

          It's actually far more likely, just from how these models are trained and how they operate, that the books in question were never even used, and random reviews from the internet were scraped. Then the model learns the general content of the reviews and spits out something similar to all of the reviews, but not the same as any of them. Just as if you read 5-10 reviews of the book and were asked to generalize what it was about...

        • Artists are trained on prior copyrighted works. Musicians are trained on prior copyrighted works. Authors are trained on prior copyrighted works.

          ChatGPT is trained on prior copyrighted works. And yet ChatGPT (and not artists nor musicians) is infringing?

          ChatGPT generates summaries of Plaintiffs' copyrighted works -- something only possible if ChatGPT was trained on Plaintiffs' copyrighted works.

          A reviewer can generate a summary of Plaintiff's copyrighted works, only if the reviewer was trained on Plaintiff's copyrighted works.

          From the suit:

          57. Because the OpenAI Language Models cannot function without the expressive information extracted from Pl

      • But that happens all the time already for derivative works. A numerically optimized (aka "trained") semi-arbitrary algorithm (aka "AI") is a derivative work of its training data. You are not. You may or may not produce a derivative work when using information out of a book, it depends on the output just like it always has. The difference here is that an optimized algorithm is a work unto itself because it is a representation of its training data. (You see how most of the problem exists because machine
        • by hjf ( 703092 )

          What's clear to me is that you use the word "training" and "parameters" without knowing what they mean.

    • by Holi ( 250190 ) on Monday July 10, 2023 @10:18AM (#63673907)

      Considering how copyright has been so thoroughly abused by congress and corporations to the point it makes a mockery of its stated goal "To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries", I have little concern about these millionaires and their tears.

      I don't see how you can read that clause and consider the author's life plus 70 years as securing a "limited time" to authors. It seems to fly in the face of the wording of the constitution. Let's remember that when the founders wrote it, copyright had a maximum length of 28 years.

      • Considering how copyright has been so thoroughly abused by congress and corporations to the point it makes a mockery of its stated goal "To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries", I have little concern about these millionaires and their tears.

        I don't see how you can read that clause and consider the author's life plus 70 years as securing a "limited time" to authors. It seems to fly in the face of the wording of the constitution. Let's remember that when the founders wrote it, copyright had a maximum length of 28 years.

        More to the point.

        To promote the progress of science and useful arts kinda suggests that the US constitution kinda demands that it be possible to train LLMs with relatively few copyright restrictions.

        Of course, what the US constitution says and what US courts say can be very different things.

    • ... and did not want to collate the training material according to licenses.

      And why would they? What concept in copyright law restricts your ability to learn from what you see? To be clear the case here isn't about how they acquired the material, or replication of it, the case here is based on the idea of training the algorithm and it producing something that resembles the style of another artist. That is not a copyrightable concept.

      • What concept in copyright law restricts your ability to learn from what you see?

        Nothing in copyright law restricts your ability to learn from what you see. It does, however, restrict your ability to copy copyrighted works without permission. LLMs do not see and learn. They copy, analyze, collate, aggregate, statistically predict, and reproduce. They are, at their core, copying engines. It's their primary function.

  • If I go to a website and read it, which does involve a "copy" to put it on my screen, but I learn it - is that copyright infringement, or just learning?

    I'm pretty sure that reading something and learning it isn't copyright infringement. I'm fairly certain LLMs aren't "copying" the raw material any more than a person memorizing a favorite quote is.

    Interesting times, to see how this gets worked out in our legislative and social systems.

    • by hjf ( 703092 ) on Monday July 10, 2023 @10:20AM (#63673915) Homepage

      This is exactly the point. And slashdot's bipolarity about this issue is amazing.

      You have an overlap in "AI haters" and "Copyright haters". They see AI as the greater threat, so they (think) they will ally with copyright holders to destroy AI, and then go back to hating copyright holders.

      The reality is that copyright holders will only get stronger if they win, AND, they will try to use this same concept to reserve their rights to demand compensation from people using their books to do their jobs.

      Saying that OpenAI shouldn't use publicly available (but copyrighted) information without compensating the authors, is like saying an engineer can't build a bridge without paying royalties to the author of the books they learned from.

      • Yeah it's pretty wild to see a big chunk of the tech worlds like here, on arstechnica etc. suddenly flip to be pro-copyright. Ah yes now copyright holder gets to control all possible future uses of their work, forever. Very cool.

        IMO as long as the models don't reproduce the original work more or less completely and highly accurately (let's say the few overtrained examples in Stable Diffusion, like the Mona Lisa) then there's no real case here. Of course I'm not the one deciding it so who knows what the cour

    • At its core, it's an attempt to ensure authors get paid for the books and articles that they publish. If it's on your screen, either it's there legally or it's there illegally. If legally, then permission will have been given in some way. If illegally, because the content has been uploaded to a server somewhere without the permission of the author, then you're in violation of copyright

      The argument of the authors here is, presumably, that they got access to material that was not intended for general consu

  • by PPH ( 736903 ) on Monday July 10, 2023 @10:12AM (#63673879)

    In the training data set, Silverman's material was tagged as "not funny".

    • Women are fighting an uphill battle in comedy, because of the widespread belief that women just aren't funny. Unfortunately, Netflix has released such a torrent of unfunny women specials that the belief is really reinforced. Seriously, Netflix... good god. The litany of awful female standups you have presented to the world is doing real damage. Stop it. Exercise a little restraint.

      I think Sarah Silverman is funny. I've seen "A Speck of Dust" twice. It's at least as good as Patton Oswalt's most recent offeri

      • I don't know, Ali Wong: Baby Cobra was pretty damned funny, and I rarely find stand-up all that funny regardless of the gender of the person with a mic in their hand.

    • by dohzer ( 867770 )

      I don't know... that Paris Hilton roast was funny as fuck.

  • ...this is basically biting the hand that feeds you and is a good way to get yourself removed from key services that are used to promote your identity and business.
    • Re: (Score:2, Insightful)

      by HBI ( 10338492 )

      I don't necessarily agree about infringement but you do have a good point. This lawsuit is a tacit acknowledgement that their careers are over.

  • The article's headline says they're suing over copyright infringement. But then by the middle of the article..

    In their lawsuit against Meta, the plaintiffs allege that leaked information about the company’s artificial intelligence business shows their work was used without permission.

    .. the question of whether or not it was copied seems to have been abandoned, and they're actually suing over how the data was used.

    Anyone know for how many years an artist of a creative work, is the sole person who use a

    • You can learn from it, you just can't make a digital representation of it for yourself unless you pay for it. The AI learning is still a digital representation.
  • Silverman, Kadrey and Golden allege Meta and OpenAI used their books without authorization to develop their so-called large language models

    Well, if they purchased a copy of the book, then how did they infringe on the copyright? If I buy a book, I can 'use' the book as much as I want without further 'authorization'. Even funnier in TFA:

    “retains knowledge of particular works in the training dataset," the lawsuit says.

    Um, am I not allowed to 'retain knowledge' of the shit I bought and paid for? I don't even understand where the infringement is...

  • These people should be honored that their content was used to train advanced intelligence systems.
