Microsoft Slammed for Building Copyright-Infringing Supercomputer for OpenAI in New Court Filing (arstechnica.com) 86
The New York Times alleges Microsoft actively encouraged OpenAI to steal its copyrighted work, reports Ars Technica, citing a new (and heavily redacted) court filing Thursday:
NYT's motion comes after the [U.S.] Supreme Court sided with Cox Communications in a case where Sony tried and failed to claim that Cox was contributing to music piracy as an Internet service provider, which set a new standard for contributory infringement. Moving forward, plaintiffs will have to prove that parties intentionally acted to induce illegal conduct. Recognizing that the legal precedent has changed, the NYT now wants to amend its complaint to align its contributory infringement claim against Microsoft with that new standard... A Microsoft spokesperson told Ars that the company views the amended complaint as "a last-ditch effort by the plaintiff to save its claim from unfavorable precedent set in other recent rulings..."
The updated complaint seeks to specify that [Microsoft's] supercomputer was tailor-made to help OpenAI infringe and allege that it was built for the explicit purpose of training AI on copyrighted works without permission. And as the NYT alleged, its articles were more heavily weighted by this system, as both firms hoped to train models on the highest-quality journalism possible, so that level of writing could be confidently mimicked in outputs. By building this "unusually complex" machine, Microsoft not only helped select the works that were infringed but also provided a means to seize copyrighted works without permission, the NYT alleged. "Microsoft specifically designed it for the purpose of using essentially the whole Internet — curated to disproportionately feature Times Works — to train the most capable LLM in history," the NYT alleged... Similarly as problematic for the NYT are hallucinations where Microsoft and OpenAI models falsely cite the NYT for content that they never published... "Users who ask a search engine what The Times has written on a subject should be provided with neither an unauthorized copy nor an inaccurate forgery of a Times article, but a link to the article itself," the NYT alleged...
In a statement provided to Ars, OpenAI spokesperson Drew Pusateri reiterated the AI firm's often-repeated claims that AI training on copyrighted works is indisputably fair use... OpenAI has argued that "ChatGPT is not a substitute for a Times subscription," the NYT reported, partly because "they transformed the material for a different use."
An OpenAI spokesperson told Ars Technica that OpenAI's models "empower innovation," while a New York Times spokesperson insisted that Microsoft "actively encouraged OpenAI to steal our copyrighted works... [O]ur core claims remain the same from the day we filed this lawsuit — that Microsoft and OpenAI stole millions of The Times's copyrighted works to compete with our products and illegally enrich themselves."
The article speculates that the case's most extreme outcome "could require OpenAI and Microsoft to wipe models and start over. The NYT has also asked for permanent injunctive relief to prevent future infringement, as well as extensive damages..."
Innovation (Score:5, Interesting)
Why does innovation mean companies can steal material and think they can get away with it?
Whereas the rest of us would be bankrupted and see some jail time.
Re: (Score:3)
Why does innovation mean companies can steal material and think they can get away with it
Whereas the rest of us would be bankrupted and see some jail time.
Corporations are using the copyright system as it was meant to be used... for them. For the rest of us it's different, as it was designed to be. They get everything they want, we get to pay for everything.
Re: (Score:2)
If they were stealing (you mean copyright infringement, btw; the difference is whether you still have the original), they would quickly be sued. Currently they seem to operate in a niche of the law that allows this. You may not like it, but so far no court has ruled against it, despite several pending cases.
Re: (Score:2)
Re: (Score:2)
Why does innovation mean companies can steal material and think they can get away with it .
Copyright infringement is not theft.
Re: (Score:2)
moonlight sonata (Score:2)
When are the thousands of commercial recordings of Beethoven's Moonlight Sonata going to be AI-bot reviewed and declared copyright infringements of each other?
Microsoft will just claim they don't review the training data, are not liable for it, and point to the other company.
Offshore data centers without any copyright oversight will be used to train on copyright-protected text, music, sounds, images or video.
The global fix will be to update the Berne treaty to put all works into the public domain after 5
Re: (Score:3)
Yeah, right. Said without evidence. The best you've got is some correlations.
Re:The suit is nonsense (Score:5, Funny)
Training an AI is exactly the same as training a human mind
I dunno about that... for one thing, most humans don't confidently spout nonsense unless alcohol is involved.
Re: (Score:3)
You've never had a conversation with a politician, have you?
Re: (Score:2)
I did use the qualifier "most". :-)
Re: (Score:2)
I dunno about politicians but Trump as a businessman would certainly qualify.
Re: (Score:2)
Training an AI is exactly the same as training a human mind
I dunno about that... for one thing, most humans don't confidently spout nonsense unless alcohol is involved.
A George Carlin quote comes to mind...
Re: (Score:2)
So you believe that virtually everyone posting here is drunk?
Plausible, I guess, but dude, you've led a very sheltered life.
Re: (Score:2)
I have a feeling that Slashdot is overrun by alcoholics these days.
Re: (Score:2)
Hick ... how do you come to that ... hick ... idea? ...
Belch
Re: (Score:2)
I don't think alcohol is required, but the better of the only two Funny on the story that had large potential for humor.
I am beginning to think the entire AI topic is a joke. Side effect of trying to read What is Thought? by Baum? Or aftereffect of A Thousand Brains by Hawkins. We have no idea how intelligence works, but "borrowing" LOTS of "intelligent" artifacts has become the default starting point?
Me? My new ambition is to become an expert in asking questions that drive AIs crazy. Yes, I'm claiming
Re: (Score:2)
Training an AI is exactly the same as training a human mind
No it isn't. There are huge differences. The inputs are different, the process is different, and the outcome is different.
Why would you say something that's so obviously false?
Re: (Score:2)
Why would you say something that's so obviously false?
Liar or idiot. Take your pick. That person is not even smart enough to ask an Artificial Idiot about this.
Re: (Score:2)
There are huge differences, but it *is* a reasonable analogy. It's certainly more like that than it is like "copying", even though copying, in the extended sense, is involved. As it is in all learning.
Re: (Score:3)
No. Try that deranged statement in a courtroom some time.
Re: (Score:2)
Training an AI is exactly the same as training a human mind
Yeah, I remember when I had to read the entire internet to learn how to talk. I was lucky though, I had access to those resources in preschool.
Some kids weren't able to completely read millions of books in their training data until they were in fourth grade, or even fifth grade. Most of these people have Slashdot IDs between 1423380 and 1423382. It's sad, really, how they talk. The inequality. We should take up a collection pot for them.
My general patience and good will is gone (Score:5, Interesting)
I do not have any faith in the companies of Silicon Valley to have the greater good in mind anymore. It's all about the money so this doesn't surprise me anymore.
Move fast and break things has progressively transitioned into fuck with people and don't give them a choice to opt out. This ranges from robot-taxis blocking roads to scooters littering streets to AI glasses bringing surveillance so your data can be sold without your consent. Nope, you can't use money anymore, so that your previous purchases can be used to sell targeted advertising spots with Google Pay and Apple Pay.
Silicon Valley needs some more regulation. I no longer give a shit about what new hype machine they have.
PSA: Stop giving money to homeless subscription panhandling. When you pay for a subscription, you just increase the behavior and with it more panhandling. The prices for hardware have gone up because of the fucktards who keep giving money to ChatGPT, Gemini etc. WE WHO DO NOT BUY THESE STUPID SERVICES have to deal with the increased prices because of idiots unable to show restraint. Good job fucking us over, chumps.
Re: (Score:3)
I do not have any faith in the companies of Silicon Valley to have the greater good in mind anymore. It's all about the money so this doesn't surprise me anymore.
In the fight between the VCs and the people trying to make the world a better place, the bankers won and took control.
Re: (Score:2)
No matter how good the model is, if you have a one-line prompt, the result will not be a good movie. Or it has to be a very long line. Think of any good movie and try to describe it in one line ... do you think someone could produce it at that quality using only your line?
Re: (Score:2)
Easy: Blade Runner. /me hicks.
I think, two words qualify as one line,
Or not?
A 21st century business plan (Score:2)
What about Sprint/AT&T/Verizon/Dish providing the gateways/routers and optic fibre to carry that pirated data? Hell, what about the Chinese factories that manufactured that infrastructure? This is more about who they can squeeze money from than about stopping piracy. Litigating over piracy-induced "losses" is the Intellectual Property owners' new business plan.
Copyright? (Score:5, Funny)
You wouldn't download a supercomputer...
Re: (Score:2)
I presume you mean all the data and software related to a supercomputer, right? Like, sure, you wouldn't "beam down" a supercomputer Star Trek style. But there are certainly many cases of "hackers" downloading datasets and operating software from so-called "supercomputers" in the news on a regular basis. So yeah, someone might, and people often do, "download supercomputers."
Genie is not going back in the bottle (Score:2)
At this point, isn't AI training data something of a fait accompli? The models have been trained and exist. No one seriously thinks the courts are going to make these companies toss all the models in the bit bucket. Lawsuits might wring some $ out of some AI companies. They might not. These lawsuits look less and less like slaying giants and more like tilting at windmills.
Re:Genie is not going back in the bottle (Score:5, Informative)
A court could absolutely order them to throw out a model. Perhaps you don't think it's likely to happen, but the law doesn't depend on what you think is likely. The court could also issue an injunction barring them from training future models on copyrighted material without permission. They also could grant damages.
Consider that Anthropic settled a similar case for $1.5 billion, which shows they thought they might lose a lot more if the case went to trial.
Re: (Score:3)
One thing I noticed is Google's AI Overview is really good at answering things you would expect a search engine to know. If I ask Claude something it will hypothesize and maybe make things up, or do a web search maybe, but Google's AI Overview (Gemini?) seems to have faster access to information and knows about recent events. For example I just typed into the chrome search bar "can you tell me about the earthquake in venezuela? I think they found two boys this morning" and it picks up BBC news which is actu
Re: (Score:2)
That's called RAG. Internally it may look like this:
The user asked about a recent earthquake. My tool list says I can access recent events. ... earthquake in Venezuela ... ...
Tool-Call: list-recent-events
Tool-Result:
I got a result about an earthquake in Venezuela. This is probably what the user asked for
Answer: I have found information about an earthquake in Venezuela ...
The actual tool can be a database lookup or, in this case, a Google search. Or anything else that can retrieve information. In particular t
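The trace above can be sketched in a few lines of Python. This is only a toy illustration of the retrieve-then-answer pattern, not how any real product works: `RECENT_EVENTS`, `list_recent_events` and `answer` are hypothetical names, the event entry is placeholder data, and a real system would call a search engine or database and hand the result to a model instead of formatting a string.

```python
# Hypothetical stand-in for a search index or database; the entry is placeholder data.
RECENT_EVENTS = [
    {"topic": "earthquake", "place": "Venezuela", "summary": "placeholder summary"},
]


def list_recent_events(query: str) -> list[dict]:
    """Toy retrieval tool: return events whose topic or place appears in the query."""
    q = query.lower()
    return [
        e for e in RECENT_EVENTS
        if e["topic"] in q or e["place"].lower() in q
    ]


def answer(query: str) -> str:
    """Retrieve first, then build the answer from what was retrieved."""
    results = list_recent_events(query)  # corresponds to the "Tool-Call" step
    if not results:  # nothing retrieved, so say so rather than guess
        return "I could not find recent information about that."
    hit = results[0]  # corresponds to the "Tool-Result" step
    return (
        f"I have found information on the {hit['topic']} "
        f"in {hit['place']}: {hit['summary']}"
    )
```

Calling `answer("can you tell me about the earthquake in venezuela?")` goes through the retrieval step first, so the reply is built from the stored entry rather than from whatever the model happens to remember.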
Possession is 9/10ths of the law (Score:5, Interesting)
Re: (Score:1)
If the truth is about to be uncovered, a corporate attorney hands over a check, a paper admitting no fault, and the recipient typically signs a non-disclosure agreement.
This all stems from a legal system which does not require punishment of crimes. This is supposed to make it more fair because we have the option to let people off when punishment won't actually help anything, but in practice it means that the wealthy can buy their way out of it and the poor cannot, and the judicial system is hard on the poor to make it look like it's strict when in fact it's only strict with the poor. We must require that all known crimes be punished if we are to change our laws, because th
I mean... (Score:3)
If this argument works, then it would also work against all the gun manufacturers for all gun related crime. I doubt the court is going to want to set that precedent.
Re: (Score:2)
Following this analogy, you'd have to explain what it was about the Microsoft system that makes it particularly suited to copyright infringement.
The same argument has been tried for guns and failed miserably. Those evil "military style assault weapons" are actually designed to be less lethal than civilian ones (Hague Convention of 1899). Ask the German WWI troops which weapon they feared the most.
Re: (Score:2)
It's not about the copyright. Their main argument boils down to "they built and gave us the means and intentions to commit a crime".
Re: (Score:2)
It's not about the copyright.
It sort of is. Microsoft is alleged to have built a "copyright-infringing supercomputer". Issues of fair use aside, that's the only law claimed to be broken. Or potentially broken. There is not yet a law on the books restricting the manufacture, ownership or use of supercomputers. So that part is right out. If it's the potential use, then 3D printers are illegal, because ghost guns. Legal weed is prohibited due to DWIs. VHS recorders banned to prevent recording off the TV.
This last point may be the gist
Re: (Score:2)
It is not the weapon that kills you, it is the bullet.
A single bullet might be less lethal, unless it hits you at a vital point.
A swarm of bullets is a completely different matter.
Re: (Score:2)
It is not the weapon that kills you, it is the bullet.
Without the gun the bullet is useless.
Re: (Score:2)
That was my point ...
Re: (Score:2)
Hard to tell, since you didn't actually make any points.
Re: (Score:2)
I made the point that the parent's idea that modern war weapons are less lethal is idiotic, as they do not shoot a single bullet, but a salvo.
And a single bullet is equally deadly, regardless how big the caliber is: if it hits an artery or the heart or any other vital point.
Learn to read.
Re: (Score:2)
made the point that the parent's idea that modern war weapons are less lethal
Except you didn't really even make that point. A couple of vague sentences is not a point. I will grant you that the parent you were originally replying to is basically retarded though.
Re: (Score:2)
I gave some very good points:
a) the heart does not care, when it is pierced, whether a weapon is technically considered "less lethal"
b) a weapon that fires a salvo instead of a single shot is technically "more lethal" than a handgun
No idea what there is not to comprehend about that.
Re: (Score:2)
a) you've never said this before in this thread
b) technically, wrong
Crap in, crap out (Score:5, Interesting)
Wild off the cuff guess stat. 80% of content being consumed by LLM is untrustworthy, opinion, wrong, brain farts. Just like what I am typing here.
Anyone who believes LLM will lead to Gen AI doesn't get the tech or has an incomplete definition of Gen AI. We really need a new Turing test, we kinda cheated the test with LLM, made a parrot instead of a person.
The principal flaw of LLM AI as a business is that producing content wasn't a problem we needed to solve. We were drowning in the stuff already.
LLMs give a great search and summary feature, but I don't see a way to monetise that with ads like Google does without making the results even more dubious. As for a subscription model, in my enterprise the take-up of Copilot is low, and I don't know why. Going by my own experience, I've made apps I was told there was a business case for that turned out to be very niche: low usage but high value, very specialist tools.
I work with a lot of ambitious go getters, who would I promote? The one who leans on AI to produce some samey looking dross, or the one who can innovate and communicate independently, think on their feet, surprise me and do more with less. If I've got a leader who is dependent on AI, that is a weak, compromised leader.
Then there is a phenomenal trust issue: many just don't trust big tech with sensitive data, and who can blame them? AI companies have no respect for copyright or IP. And there is the hidden cost: after paying the sky-high subs, you pay for your employees' labour validating AI answers and patching up flaky results, which improves the AI product for your competition too. And your employees get dumber the more they use it.
Nope.
I don't doubt there are niche specialist applications to be exploited in legal and tech to get productivity and quality gains, but specialise and grow your own; don't help your competition by improving the general models. Don't end up dependent on a supplier; look how cloud is biting companies in the ass with fees. Once they get you hooked, they jack up the price. Bad strategy, you don't need more parasites.
Re: (Score:3)
If the goal is trustworthy facts and information, backed by solid reasoning and deduction, you do NOT want something that behaves like the human mind.
And there is the central conceit at the heart of AI, believing a fallible human mind can produce an infallible machine. It looks like another religion trying to invent a god. Nope, tried that many times, it failed. It's another attempt at control.
Put an AI in a car and get a safer, sober driver maybe, after a few pedestrian casualties in Beta testing. Put an AI in charge of the economy, your military or your business and you've got the Wizard of Oz, you don't know who or what is pulling the levers. That's
Re: (Score:2)
Wild off the cuff guess stat. 80% of content being consumed by LLM is untrustworthy, opinion, wrong, brain farts.
This is just numerology masquerading as evidence. When the opening move is “wild off the cuff guess” and the next move is a percentage, that is not data analysis. That is a vibe cosplaying as rational thought. Training data quality is a real problem. So are filtering, curation, retrieval, evals, domain tuning, and human review. “The web contains garbage” does not prove that models cannot extract useful structure from it. Compilers consume source written by humans too; somehow LLVM su
Re: (Score:2)
Good, you took the bait. You can help educate me.
This is just numerology masquerading as evidence. When the opening move is “wild off the cuff guess” and the next move is a percentage, that is not data analysis. That is a vibe cosplaying as rational thought. Training data quality is a real problem. So are filtering, curation, retrieval, evals, domain tuning, and human review. “The web contains garbage” does not prove that models cannot extract useful structure from it. Compilers consume source written by humans too; somehow LLVM survived Stack Overflow.
I agree, but maybe. It's the Pareto principle [wikipedia.org] in reverse maybe, go figure. 80/20 rule. 20% of content is probably useful and what you describe might help isolate it. Might. Whilst it is unable to differentiate truth from fiction without human refinement, we are far away from that.
Now you are just playing bait-and-switch with definitions. For AGI, say AGI. For generative AI, the ship has already left port. And “parrot or person” is the false dichotomy at the center of the whole performance. LLMs do not need to be conscious citizens with library cards to be useful systems. The question in this thread is not whether ChatGPT has a soul. The question is whether particular copying, training, output, and platform design cross copyright lines. Your parrot/person routine is theatrical fog, and when it's this thick and obvious, it is usually because somebody is trying to obscure their bias.
Agreed, but you don't disprove my point, you are splitting hairs. Do you have that saying where you are from? Part of the "AI" whatever the f that is, is invest your money, time, what
Training AI on copyrighted works (Score:2)
* All my own words, without putting it through ClippyAI
Straight to the Paywall (Score:2)
"Users who ask a search engine what The Times has written on a subject should be provided with neither an unauthorized copy nor an inaccurate forgery of a Times article, but a link to the article itself," the NYT alleged...
Only if the question they asked is, "What is a paywall?"
At any rate, the NYT publishes sensationalist garbage just to feed said paywall from the pockets of gullible fools. And you thought AI was unique in its motivation to maintain such unhealthy user engagement?
Re: (Score:1)
I am a gullible fool willing to pay for your so called "sensationalist garbage". I have this unreasonable desire to support purveyors of my chosen poison - i.e. I would like to continue to provide revenue to my chosen garbage creators. It is imperative therefore that I protect their source of income. I for one, will gladly support their copyright, and their right to monetize "sensationalist garbage". If they have chosen to NOT make it available for free, and have explicitly requested that others not scrape
Re: (Score:2)
I'd agree that AI and search engines in general, should provide links to the original articles. Rather than quoting the content, often without providing proper references.
But that raises the question of how the article came to be in a search engine database in the first place. Someone must have allowed open access to NYT publications for web crawlers. For free, in most cases. With the idea that search engines would drive readers to their content and through those yummy yummy paywalls. So now, they expect m
And you trust them with your data? (Score:2)
I don't see how this is fair (Score:2)
If an individual or individuals commit copyright fraud, the law will financially bury them and they will probably face prison time. If a corporation commits copyright fraud, they can get away with it.
Just to be pedantic.... (Score:2)
Seeing how copyright cases being fair use or not are IIRC decided on a case by case basis, that's really all that can be said until the litigation runs its course.
NYT just being greedy at this point (Score:2)
While I'm not really a fan of Microsoft and the abomination that their flagship Operating System has evolved (devolved?) into, I can't say I'm on board with the "Let's blame the manufacturer" line of thinking in a lame attempt at lawfare for the purpose of easy money.
My thoughts:
So, Microsoft built a custom piece of hardware to a customer's design specs (we don't just have these things on the shelf at your local SuperComputer Outlet) and then sold that system to the aforementioned customer.
How, exactly, i
What is expected: Get permission, and pay if asked (Score:1)
If I want to use your personal data to train my LLM, I need to get permission from you. I "pay" for this by either giving you a free service (Google, Facebook) or offering you better service in return (Apple, Amazon). For all copyrighted data, I expect to pay for the data.
I can train my AI on open source code that is on GitHub, but if I want to train it on Oracle or SAP software, I need to get their permission, pay them for it, and promise that what I have trained the AI to do cannot be used to re
Models = All copyrighted material (Score:1)