Judge Dismisses Lawsuit Over GitHub Copilot AI Coding Assistant (infoworld.com) 83
A US District Court judge in San Francisco has largely dismissed a class-action lawsuit against GitHub, Microsoft, and OpenAI, which challenged the legality of using code samples to train GitHub Copilot. The judge ruled that the plaintiffs failed to establish a claim for restitution or unjust enrichment but allowed the claim for breach of open-source license violations to proceed. InfoWorld reports: The lawsuit, first filed in Nov. 2022, claimed that GitHub's training of the Copilot AI on public GitHub code repositories violated the rights of the "vast number of creators" who posted code under open-source licenses on GitHub. The complaint (PDF) alleged that "Copilot ignores, violates, and removes the Licenses offered by thousands -- possibly millions -- of software developers, thereby accomplishing software piracy on an unprecedented scale." [...]
In a decision first announced on June 24, but only unsealed and made public on July 5, California Northern District judge Jon S. Tigar wrote that "In sum, plaintiff's claims do not support the remedy they seek. Plaintiffs have failed to establish, as a matter of law, that restitution for any unjust enrichment is available as a measure of plaintiffs' damages for their breach of contract claims." Judge Tigar went on to state that "court dismisses plaintiffs' section 1202(b) claim, this time with prejudice. The Court declines to dismiss plaintiffs' claim for breach of contract of open-source license violations against all defendants. Finally, the court dismisses plaintiffs' request for monetary relief in the form of unjust enrichment, as well as plaintiffs' request for punitive damages."
"You can't prove your project influenced Copilot" (Score:3)
Grab them by the copyright (Score:5, Funny)
Re: (Score:2)
When you're a very large company, they let you do it. You can do anything
FTFY, all you need to do is bribe^H^H^H^H^H lobby congress and you are good to go.
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
In the same way, you cannot prove that your single vote influences the outcome of an election. Is that a reason to abolish voting altogether?
Yes, you can prove your single vote influences the outcome of an election. Several cases in fact [npr.org]. This one [npr.org] as well.
Re: (Score:2)
Re: (Score:2)
Voting isn't an individual activity, it's a group activity.
Re: (Score:2)
The standard for copyright is not "did you homeopathically influence the product?".
Another AI lawsuit bites the dust, despite the relentless cheering of "this will be an open and shut case against AI!" from the Luddites, who seem to think that copyright grants you some sort of a dictatorship to control anything ever created by anyone who so much as glances at your work.
This was one of the strongest cases against AI, as you have a large model, a proportionally small training dataset, lots of dataset duplicat
Re: (Score:3)
Another AI lawsuit bites the dust, despite the relentless cheering of "this will be an open and shut case against AI!" from the Luddites, who seem to think that copyright grants you some sort of a dictatorship to control anything ever created by anyone who so much as glances at your work.
This is pretty much the argument Disney and the other copyright extenders have used over the decades. So it's not like this came up out of nowhere. It's funny how it only applies legally when it's a massive behemoth company claiming copyright infringement. All the little guys pretty much get told their copyright means nothing at all, because someone else has made more profit from it.
Behold the true winner in all this: Greed, our new God. Profit above all!
Re: (Score:3)
it only applies legally when it's a massive behemoth company claiming copyright infringement. All the little guys pretty much get told their copyright means nothing at all
copyright was always about big money, little guys never were anything but useful idiots. some of them get little crumbs for the service now and then, most not even that, they can even get ripped off by big money anytime.
Re: (Score:2)
America - the best justice money can buy!
Re: (Score:2)
Re: (Score:2)
Try visiting HuggingFace at some point. TONS of "little guys" train coding LLMs.
Re: (Score:3)
It's not enough to show that it's theoretically possible for a user to hack the model into replicating something by force-feeding it enough of the original (and hoping that it just happened to be heavily overtrained on that specific original so that it's even capable of doing so - the larger the dataset vs. model size, the less likely that becomes). Or even sneakier exploits, like glitch tokens or whatnot. It's not Adobe's fault if you use Photoshop to draw Donald Duck; it's your fault, and you don't even have to spend hours trying to find ways to sneakily manipulate Photoshop and rely on a lot of luck to be able to do so. If you hack a Google server that's hosting copyrighted Disney data and copy all the data off it, the copyvio isn't Google's fault. They were trying to stop you from doing so. It's not the normal usage of the product. In the normal use case of these models, they don't replicate works, and the copyright system is based on the replication of specific works (There's also character copyright, which is sort of a special case around well-delineated characters, but even that is considered to still stem from works). They're not compositors. They don't collage works together.
That's quite a lot of conclusion based on such a narrow judgement.
Re: (Score:2)
My social media accounts have the hashtag "#CryptoShouldDrownInADitch" in the profile, thank you very much.
I once polled the Stable Diffusion Reddit on a number of things. One was crypto. The ratio was ~1 in 5 liked it, ~1 in 5 were neutral, and ~3 in 5 hated it, half of those vehemently.
20 of the 22 claims have already been dismissed before the trial even started, including the vast majority of potential damages.
The fact that you don't understand the law around fair use and the limitations of copyright i
Re: (Score:2)
Did Google have the right to make and retain that copy of the image in the first place? Did they have a right to profit off its use? If they weren't profiting off of it, what was it doing on a server being used for commercial purposes alongside millions of other copyrighted image
This is correct (Score:3, Interesting)
No human is born with the ability to speak, code, do sports or pretty much anything. We are educated throughout our lives to pick up skills - you start using words, snippets and sentences your parents used. You may even move on to quoting Shakespeare. Using the same words as someone else has does not mean you owe them anything.
If I read a particular piece of code and type it out, line by line in a different document, am I reusing your code? How will you prove that I even read your code?
The same goes for AI - it "reads" code, and establishes correlations and patterns. The new thing here is the correlations and patterns it has established. It then will use those same words/code in a completely different instance. That's not very different from what everyone is doing all the time. There is no liability here.
Re:This is correct (Score:4, Insightful)
If I read a particular piece of code and type it out, line by line in a different document, am I reusing your code?
Yes, you are.
How will you prove that I even read your code?
By comparing the two pieces of code side by side and showing that they're identical.
Re: (Score:3)
Re: This is correct (Score:2)
What about all the cases where there's only one rational way to write a particular line? Plenty of people have written identical lines of code independently because they were trying to accomplish the same thing and they all wanted their code to work.
Re: (Score:2)
Re: (Score:2)
Honestly, that's not true. The similarity between many frequently-implemented algos (such as, say, automated growing of buffers in C, which has probably been implemented tens if not hundreds of millions of times) guarantees that plenty will be exact copies.
Also, copyright is based on a standard of creative endeavour, not "sweat of the brow". Rote or noncreative work cannot be copyrighted (among many other things).
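For illustration, here is a minimal sketch of the kind of buffer-growing routine the comment above has in mind. The comment talks about C; this version is in Java only to match the other code quoted in the thread. The class name, method names, starting capacity and doubling factor are all assumptions made for the example, not taken from any particular project.

```java
import java.util.Arrays;

// Hypothetical growable int buffer: when the backing array is full,
// allocate one twice as large and copy the old contents across.
class GrowableIntBuffer {
    private int[] data = new int[4]; // assumed starting capacity
    private int size = 0;

    void append(int value) {
        if (size == data.length) {
            // Doubling is one common growth policy; others exist.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    int capacity() {
        return data.length;
    }
}
```

There is very little room for variation in a routine like this, which is the comment's point: two people solving the same problem tend to land on near-identical lines.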
Re: (Score:2)
That's why I always change the variable names in "copy pasta" code :)~
Re: (Score:2)
By comparing the two pieces of code side by side and showing that they're identical.
that might be irrelevant.
oracle tried quite hard to sue google for copying this code literally:
private static void rangeCheck(int arrayLen, int fromIndex, int toIndex) {
    if (fromIndex > toIndex)
        throw new IllegalArgumentException("fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
    if (fromIndex < 0)
        throw new ArrayIndexOutOfBoundsException(fromIndex);
    if (toIndex > arrayLen)
        throw new ArrayIndexOutOfBoundsException(toIndex);
}
to no avail. google even admitted that it was a direct copy. still, in this case common sense prevailed, since the judge correctly understood that what google actually copied was the api (which aren't copyrightable as anyone is entitled to provide alternative implementations) and that the code leftover was an oversight besides being the bloody obvious implementation for such a function. "mine mine mine" bullshit d
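To make concrete how little the function quoted above does, here is a small self-contained sketch that wraps the same three checks and reports which one, if any, rejects a given range. The class name `RangeCheckDemo` and the helper `outcome` are hypothetical, added only for this illustration; the body of `rangeCheck` follows the function as it is commonly cited.

```java
// Illustrative only: RangeCheckDemo and outcome() are made-up names.
class RangeCheckDemo {
    // The three checks: from <= to, from >= 0, to <= arrayLen.
    static void rangeCheck(int arrayLen, int fromIndex, int toIndex) {
        if (fromIndex > toIndex)
            throw new IllegalArgumentException("fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
        if (fromIndex < 0)
            throw new ArrayIndexOutOfBoundsException(fromIndex);
        if (toIndex > arrayLen)
            throw new ArrayIndexOutOfBoundsException(toIndex);
    }

    // Hypothetical helper: returns "ok" or the simple name of the exception thrown.
    static String outcome(int arrayLen, int fromIndex, int toIndex) {
        try {
            rangeCheck(arrayLen, fromIndex, toIndex);
            return "ok";
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "ArrayIndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(10, 2, 5));   // valid range
        System.out.println(outcome(10, 5, 2));   // from > to
        System.out.println(outcome(10, -1, 5));  // negative start
        System.out.println(outcome(10, 2, 11));  // end past the array
    }
}
```

Anyone asked to validate a half-open index range against an array length would plausibly write the same comparisons, which is the "bloody obvious implementation" argument in the comment.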
Re: (Score:2)
Re: This is correct (Score:2)
My copyrighted code includes
for(int i=0:ij;++i) {
My attorneys are preparing the lawsuits now
Re: This is correct (Score:2)
Thank auto correct for screwing up my loop statement. :(
Re: (Score:2)
Re: (Score:2)
You're neglecting a very important part of copyright infringement - the copy part. You would still need to show that someone copied your code and didn't independently write it. Copyright is not the same as a patent.
Nope. Not in the USA. You only have to prove that it is identical and subject to copyright. Not that they had access to the original. However, a clean room implementation would have a strong argument that there was a lack of creative element in the copyrighted work and therefore the work isn't copyrightable.
Re: (Score:2)
Nope. Not in the USA. You only have to prove that it is identical and subject to copyright. Not that they had access to the original.
Do you have a citation for this? I've never heard that before, and I don't see anything in the law that would indicate that this is true.
However, a clean room implementation would have a strong argument that there was a lack of creative element in the copyrighted work and therefore the work isn't copyrightable.
No, you do a clean room implementation so that the people writing the new implementation couldn't have possibly copied any part of the original implementation, which would make copyright infringement impossible. And it's completely absurd to argue that because two different (groups of) programmers wrote different implementations that neither implementation has copyright pr
Re: (Score:2)
Memorizing a copyrighted work and regurgitating it is a violation. It doesn't need to be a literal copy-paste copy. It also doesn't matter if you claim you came up with a word for word identical work independently if the other was published first. e.g. If I happen to write a song called "Shake it Off" that is word for word identical to the one by Ms. T. Swift, I wouldn't get very far claiming it's a distinct original work, even if it was true.
But I doubt just a for loop in isolation is considered a creati
Re: (Score:2)
Memorizing a copyrighted work and regurgitating it is a violation. It doesn't need to be a literal copy-paste copy.
Yes, this is true.
It also doesn't matter if you claim you came up with a word for word identical work independently if the other was published first. e.g. If I happen to write a song called "Shake it Off" that is word for word identical to the one by Ms. T. Swift, I wouldn't get very far claiming it's a distinct original work, even if it was true.
You wouldn't get very far with that claim because the probability of two people independently writing identical lyrics is so close to zero that no jury would ever believe you.
Re: (Score:2)
Right, so why isn't the same thing true for code? You said "You would still need to show that someone copied your code and didn't independently write it." I think that for business logic, not rote boilerplate for standing up a framework, "the probability of two people independently writing identical code is so close to zero that no
Re: (Score:2)
Re: (Score:3)
The same goes for AI - it "reads" code, and establishes correlations and patterns. The new thing here is the correlations and patterns it has established. It then will use those same words/code in a completely different instance. That's not very different from what everyone is doing all the time. There is no liability here.
Bullshit. Electronic storage and processing of data is nowhere legally regarded the same as a human looking at it. While you may be too stupid to see it, the law recognizes that humans and machines are different.
Re: (Score:2)
"Electronic storage and processing of data" is widely considered fair use, which is why, among countless other things, developing a search engine isn't illegal.
Re: (Score:2)
Actually, no. There are specific exceptions for web browsers, and search engines are only legal because it is assumed that by publishing you give permission to index. There is no general permission for anybody to store or process anything they got from the web.
Re: (Score:2)
Try suing a search engine for ignoring your robots.txt and let me know how well that goes for you. *eyeroll*. Yes, you can scrape websites. [natlawreview.com]
It's not just search engines, it's "all Big Data". One of the most extreme examples was Authors Guild, Inc v. Google Inc. Here Google was mass-scanning copyrighted books, against explicit demands from the
Re: (Score:2)
You are confused. LLM training is in no way "fair use".
Re: (Score:2)
Re: (Score:2)
Nope. Are you mentally challenged?
Re: (Score:2)
Re: (Score:2)
The same goes for AI - it "reads" code, and establishes correlations and patterns. The new thing here is the correlations and patterns it has established. It then will use those same words/code in a completely different instance. That's not very different from what everyone is doing all the time. There is no liability here.
Bullshit. Electronic storage and processing of data is nowhere legally regarded the same as a human looking at it. While you may be too stupid to see it, the law recognizes that humans and machines are different.
Companies have been scraping the web since it was invented, feeding that into algorithms, and producing analytical products. With GPTs, the scraped data is probably even more diffused and ground up in the model. And this has always been perfectly legal.
I don't like it either.
If there had been licenses on the allowed use of the materials that said specifically they could only be used for humans to directly read, and not allowed for the training of algorithms, there would be a leg to stand on. What has happen
Re: (Score:2)
I think the question is "How large does an identical chunk need to be before it's infringing?". And I don't think there's a reasonable answer.
Re: (Score:2)
I think the question is "How large does an identical chunk need to be before it's infringing?". And I don't think there's a reasonable answer.
That is ALWAYS the question of "Fair Use".
There is no specific amount, large or tiny, that determines whether it's a Fair Use.
There are other factors in the consideration.
That's why every Fair Use has to be adjudicated.
In the case of the GPT, there is usually no exact quotation in the model that can be attributed.
The amount of copied material is legally ZERO, from the standpoint of Copyright.
Sometimes we (society) think this is all well and good, and sometimes we don't like it. Different countries have diff
Re: (Score:2)
Except derivative works are a thing too and can run afoul of copyright. I don't think anyone has litigated whether the model or its output is a derivative work of the training material yet.
Re: (Score:3)
That's not very different from what everyone is doing all the time. There is no liability here.
I think there's a key point missing here. The amount of information a human can process is naturally constrained, whereas a machine is not. So while superficially, what these machines are doing is roughly the same as a human brain, that doesn't mean that our laws ever envisioned use of this nature and scale.
We are just dealing with something unprecedented here, and everyone is learning how to deal with it. I wouldn't dismiss these concerns off hand.
Re: (Score:2)
That's not very different from what everyone is doing all the time. There is no liability here.
I think there's a key point missing here. The amount of information a human can process is naturally constrained, whereas a machine is not. So while superficially, what these machines are doing is roughly the same as a human brain, that doesn't mean that our laws ever envisioned use of this nature and scale. We are just dealing with something unprecedented here, and everyone is learning how to deal with it. I wouldn't dismiss these concerns off hand.
You have hit the nail on the head: nobody quite saw this coming (even though it's not even that different from what's been going on for many years).
As for the models already trained, and maybe even the continuing and future revision of those models, the horse is already out of the burning barn. You lose.
When there are new laws about this, ten years from now, will they apply to GPT 211.13 which was trained between 2017-2034? Maybe if you can create a GPT before the new laws come into effect, you can claim every
Re: (Score:2)
Re: (Score:2)
Used to be so before 1995, but now you got search engines to connect you to an unbounded amount of information. You can't read everything, but you can find anything in a huge amount of web text.
Re: (Score:2)
What if AI is writing songs? (Score:1)
Would the decision be different if the AI was writing songs with complete lines picked from one song but a few other lines from some other song?
Re:What if AI is writing songs? (Score:5, Informative)
Re: (Score:3)
Re:What if AI is writing songs? (Score:5, Informative)
Oh God, the "AI music lawsuit" (UMG Recordings, Inc v. Uncharted Labs, Inc) is terrible. The recording industry is arguing exactly opposite what they were arguing like six years ago with the Blurred Lines lawsuit**. I can't wait for the defense response, because they're surely going to just massively quote the plaintiffs in their defense ;)
Basically, Marvin Gaye's family sued Robin Thicke and Pharrell Williams for copyright infringement over the song "Blurred Lines" for sharing some musical similarities. The recording industry realized that having a lax standard on what musical similarities count as infringement would be disastrous for them, as huge amounts of their catalogs could be considered infringing. So they filed an amicus brief [techdirt.com] arguing:
and:
and:
But then they base UMG Recordings, Inc v. Uncharted Labs, Inc on exactly what they said should not be the standard, taking things that aren't "virtually identical copying", sequences of three notes, motifs, standard rhythmic passages, arpeggios, chromatic scales, and the like, and declaring that to be copyright infringement.
But it's even worse - and I can't wait to see the defense response and the judge's reaction to it - because on top of that they're playing a game of "Million Monkeys On A Million Typewriters" without telling the court. E.g. not only deliberately trying to get the model to create a copyrighted work by "leading it on" (a concept which the judge in this GitHub case just smacked down), but doing so over and over and over again until they can get a few seconds of similarity - but not mentioning that they did this in the filing.
Here's an example - they cite this 4-second clip [youtube.com] claiming that when
Re: (Score:2)
Whoops, sorry, that was the Stairway to Heaven case, not the Blurred Lines case.
Re: (Score:2)
Whoops, sorry, that was the Stairway to Heaven case, not the Blurred Lines case.
Yeah, because Marvin Gaye Estate *PREVAILED* in Blurred Lines. Which I thought was a bullshit outcome.
Re: (Score:2)
So did most legal analysts, from what I've seen.
Re: (Score:3)
Actually, I forgot my favourite part of the case:
Re: (Score:2)
Would the decision be different if the AI was writing songs with complete lines picked from one song but a few other lines from some other song?
We will see, with the image/video GPTs.
The code-based ones, too hard to tell; the usual problem with copied code, so, No.
The text-based ones don't usually have sufficient attributable output, so, No.
"allowed the [license claim] to proceed" (Score:4, Informative)
So in fact didn't dismiss the lawsuit, just some aspects of it.
Re: (Score:2)
Indeed. And the license claims could be disastrous. Usually, you can either force them to respect the license (which they probably cannot do) or to stop using the work (which means retraining the whole model).
Re: (Score:2)
This wasn't a ruling on the merits of the claim - it was a ruling on whether it's even possible to make such a claim. 20 of the 22 claims have been thrown out before the trial even begins.
And "they probably cannot" put in a filter for the specific works of the plaintiffs in this case? They can't stop using the work in training? Really? *eyeroll*. Also, you seem to have confused "stop using" with "simulate going back in time and undo the effects of having used it in the past".
Re: (Score:2)
You really are clueless, aren't you? The only way to remove anything from an LLM is to delete the LLM. Hence they would have to do a full retraining of their models.
Re: (Score:2)
I'll repeat:
You seem to have confused "stop using" with "simulate going back in time and undo the effects of having used it in the past".
Nobody is using their code anymore, except possibly to train new future models. The code does not exist in the models.
Re: (Score:2)
They are continuing to use that data if they use an LLM trained on that data. Try to keep up. Legally, an LLM is just a processing result of its input data. Copyright violations do not go away because you use data derived from the original data you had no business using. Seriously, it is not that hard: You steal the data, you have to stop using _everything_ you made from it when caught.
Re: (Score:3)
(22-claim lawsuit whittled down to just two claims before the trial even starts, including most of the potential damages)
Luddites: "Here's why this is good news for our case!..."
Re: (Score:2)
(Puts down mirror)
Not piracy (Score:2)
Re: (Score:2)
It's not piracy if the output is not copied verbatim from copyright-protected content. Training on "A" and generating "B" is not piracy. And if that was piracy or copyright infringement, then we are all liable for code that is even remotely like copyrighted code.
If I trace over your painting and color it in differently and then sell it, I'm profiting off of a derivative work of your original copyrighted material, and you would probably win if you sued me. It's yet to be seen whether producing B after training on similar thing B constitutes a legally equivalent situation.
Violation of ToS? (Score:2)
The real test (Score:2)
The real test is to start typing the code of a GPL project and see if it suggests the rest of the copylefted code. If it does, they're caught red-handed.
Re: (Score:2)
Tell me you didn't read the ruling without telling me you didn't read the ruling.
Re: (Score:2)
Bro I didn't even read this comment.