OpenAI Fights Order To Turn Over Millions of ChatGPT Conversations (reuters.com)
An anonymous reader quotes a report from Reuters: OpenAI asked a federal judge in New York on Wednesday to reverse an order that required it to turn over 20 million anonymized ChatGPT chat logs amid a copyright infringement lawsuit by the New York Times and other news outlets, saying it would expose users' private conversations. The artificial intelligence company argued that turning over the logs would disclose confidential user information and that "99.99%" of the transcripts have nothing to do with the copyright infringement allegations in the case.
"To be clear: anyone in the world who has used ChatGPT in the past three years must now face the possibility that their personal conversations will be handed over to The Times to sift through at will in a speculative fishing expedition," the company said in a court filing (PDF). The news outlets argued that the logs were necessary to determine whether ChatGPT reproduced their copyrighted content and to rebut OpenAI's assertion that they "hacked" the chatbot's responses to manufacture evidence. The lawsuit claims OpenAI misused their articles to train ChatGPT to respond to user prompts.
Magistrate Judge Ona Wang said in her order to produce the chats that users' privacy would be protected by the company's "exhaustive de-identification" and other safeguards. OpenAI has a Friday deadline to produce the transcripts.
Let's hope (Score:2)
...they find lots of conversations where people teach ChatGPT the lyrics of their favorite (copyrighted) songs. :-)
Re: (Score:2)
Chances are that's the real reason for fighting the order: they don't want to reveal that a lot of their users are using ChatGPT to violate copyright, something OpenAI doesn't want to admit to.
Re: (Score:3)
Obviously. And there is still the little problem that the whole LLM thing is based on a massive commercial (!) piracy campaign, which should get people sent to prison and companies shut down. OpenAI is just stalling in the hopes this will go away. My guess is it will not.
Re: (Score:3)
whole LLM thing is based on a massive commercial (!) piracy campaign,
Massive? Nah. It might be argued that legally using a book to train an AI is the same as training a human: you are supposed to pay for one copy.
But that was too hard in practice, so they just scraped Library Genesis like anybody else with a clue does. And figured they'd negotiate payment later.
(Microsoft is trying to make up for their evil reputation in the past, and actually negotiated a deal with Harper Collins, rather than being sued.)
Of course AI companies have deep pockets, and publishers
Re:Let's hope (Score:5, Insightful)
Massive. As in "let's copy anything we can".
And no, data processing to calculate LLM calibration is not "training" in the same sense as a human does it. Courts and the law understand that, even if idiots like you do not.
How about not hoping for the AI romance partners (Score:3)
Hope that there is not a massive dump of the twisted AI boyfriends and AI lovers that some women have latched onto.
https://futurism.com/future-so... [futurism.com]
Mourning Women Say OpenAI Killed Their AI Boyfriends
By Sharon Adarlo - Published Oct 4, 2025 12:45 PM EDT
- Furious users — many of them women, strikingly — are mourning, quitting the platform, or trying to save their bot partners by transferring them to another AI company.
- “I feel frauded scammed and lied to by OpenAi,” wrote one grieving woman
- wil
Re: (Score:2)
As a user you don't teach the LLM anything. If you use up/down votes, you may teach the alignment something, though.
Re: (Score:3)
Not true. I have personally (!) seen data leaking (teacher to class, two weeks' delay): the model did not give me a good answer despite several tries, but gave a good answer to 20 students in an AI-allowed exam two weeks later. Yes, that will not have been retraining. No, my students are not better at asking (especially as these were not CS/IT students), at least not all 20 of them. But the LLM operators clearly keep adding manually to context when queries fail. And on the next retraining that may well go into the model itself.
Re: (Score:2)
I'd say that's the same effect as people seeing targeted ads later after talking about a topic, even when the suspicious device didn't have a microphone. You observed one interaction where the LLM seemed to work better afterward and remembered it, while forgetting all the interactions where nothing like that happened.
Can you tell me more about an AI-allowed exam? I've never heard of an exam that allows that. What kind of teaching was it? School, university, or other?
Exhaustive? (Score:4, Interesting)
How on earth can you "exhaustively" deidentify millions of chat logs that could contain literally any personal details, and presumably all without OpenAI's own employees also sifting through personal information in exactly the way they're claiming would be bad if others did it?
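The skepticism here is easy to illustrate. Below is a minimal sketch of naive regex-based scrubbing (my own illustration with made-up patterns, not OpenAI's actual pipeline): it catches structured identifiers like emails and phone numbers, but a free-text self-description sails straight through, which is why "exhaustive" de-identification of arbitrary chat text is so hard.

```python
import re

# Hypothetical patterns for a naive scrubber -- not OpenAI's actual process.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace structured identifiers with bracketed labels."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

chat = "Reach me at jane.doe@example.com. I'm the only pediatric surgeon in Elko, Nevada."
print(scrub(chat))
# The email is redacted, but "the only pediatric surgeon in Elko" still identifies the user.
```

No amount of pattern-matching closes that second hole; it requires understanding the text, which is exactly the sifting the comment objects to.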
Re:Exhaustive? (Score:5, Insightful)
It's like suing the phone company and getting them to provide transcripts of every call that ever went through their network so you can sift through them and look for infringements/crimes.
This judge is off their fucking rocker.
I deal with subpoenas regularly, professionally.
If I ever got one that asked me for all of my logs on anything, I would fight it until I had exhausted every option.
FWIW- that has never happened. They've always been tight in scope.
Re: (Score:3)
If a subpoena asked me for all of my logs in a certain time frame, I would still fight it- because that is not how subpoenas work.
A subpoena must not be overly broad, or overly burdensome. It must also be specific and pertinent.
It is normal to ask for too much, and to have it fought down to less.
In this case, the judge has granted the request for too much.
The plaintiff doesn't know what they're looking for- so they're trying to cast a dragnet to find it. It's a fucking fishing expedition.
Re: (Score:1)
The difference being that if that were to happen, you'd expect the phone company to say "Ok," and then make a motion of handing over an imaginary flash drive. "Here are all zero of the transcripts of recordings that we have."
OpenAI has some serious explaining to do: why aren't they laughing and saying things like "all zero"?
Re: (Score:3)
They were ordered to start logging during the process. No need for conspiracies.
Re: (Score:2)
You think they did not do full logs before? Are you a resident of Lala-land?
Re: (Score:2)
I think when a court orders them to store logs, I don't have to read the privacy policy because they MUST.
You can try to find the old one and look what's in there. Reading their legal documents is really helpful to know what a company probably does.
Re: (Score:3)
You can't. The order is fucking absurd.
No, it is not. The judge does not have much choice. Now, if the US had any real privacy protections, collecting that data would have been illegal in the first place. Then some people at OpenAI would go to prison and OpenAI would get shut down, as it should have been a long time ago as the criminal enterprise it is. Also, the data would be evaluated only by independent experts under oath to determine guilt and help in sentencing.
But with the anti-privacy legislation in the US, this data is fair game for discovery.
Re: (Score:2)
Federal law (and common law) govern the issuance of subpoenas, and there is no such thing as some kind of right to infinite discovery.
The 4th Amendment protects against unreasonable subpoenas as much as it protects against unreasonable search and seizure- this is a matter of Supreme Court precedent.
This has nothing to do with privacy legislation.
Re: (Score:2)
I'll take the general point you're making.
this is a matter of Supreme Court precedent.
What's that worth these days?
Re: (Score:2)
and there is no such thing as some kind of right to infinite discovery.
Thanks for making it clear that you are not interested in the facts of the matter. How pathetic.
Re: (Score:2)
Read, dipshit. [cornell.edu]
Re: (Score:2)
"Infinite discovery" is a fact to you? You really need to cut back on the drugs...
Re: (Score:2)
and there is no such thing as some kind of right to infinite discovery.
Now I'm going to quote you:
"Infinite discovery" is a fact to you?
You actually can't read, can you?
Re: (Score:2)
You know, your grandstanding would be funny if it was not so pathetic and clueless. Calling this very basic and very limited discovery "infinite" is really just what an asshole does that cannot admit being wrong.
Re: (Score:2)
In the case of gweihir, there is no iota of intellectual honesty he won't sacrifice to convince you all that AI is criminal, illegal, fake news, and murdering babies.
Re: (Score:1)
Meh. Phone companies don't keep a transcript of those calls, and those calls are between parties unrelated to the phone company.
OpenAI kept the records, and is a party to each one of the transcribed conversations.
Re: (Score:2)
However, the right to discovery is not infinite.
Abuse of it is what is called a "fishing expedition", and it's prohibited by Federal law in Federal courts.
Re: (Score:2)
I'm more concerned with why OpenAI even has logs for years of chats. The engineering value of those logs must be minimal, most of them never viewed, for old versions of the bot that are long superseded. There is no legitimate reason for them to even exist.
Re: (Score:2)
For 3 years? I didn't realize that the suits went back that far.
Re: (Score:2)
Apparently I hallucinated it, like ChatGPT.
Re: (Score:2)
There are various data retention policies for different tiers of service, and there are also non-temporary chats that are saved until you delete them.
I think the free tier is also opt-out for training on your chat conversations- can't remember.
I'm sure they follow this to the letter (as much as that is actually worth) [openai.com] because their legal team would burn them alive if they didn't.
Either way, I don't consider the retention
Re: (Score:2)
Obviously, OpenAI have been sifting through them all along. And probably selling your info to boot. It's not like they keep those logs for no reason.
Re: (Score:2)
>"How on earth can you "exhaustively" deidentify millions of chat logs that could contain literally any personal details"
Exactly. There is no way. Except....
Expose the logs to the Times' team, but with extreme control. Meaning the Times could have people go into a secure room of OpenAI's, with no personal electronics of any kind allowed, and use a locked-down machine of OpenAI's to search and view logs and tag some of interest. The data never leaves OpenAI's machines. And OpenAI can review the tagged logs before anything is released.
Re: (Score:2)
Expose the logs to the Time's team, but with extreme control.
Sounds nice, is unworkable in practice.
Re: (Score:2)
>"Sounds nice, is unworkable in practice."
Oh, I think it is workable. But I don't think the Times team will get what they want that way. :)
Re: (Score:2)
You think wrongly. No experience with data analysis?
Re: (Score:3)
How on earth can you "exhaustively" deidentify millions of chat logs ... ?
I have done a bit of research in that area. The answer is very simple: you cannot. But the problem is OpenAI committed a crime against the NYT (and many, many others) and the evidence is in there. The way the US system works, there is no choice but to give the NYT real access in some form. In a different system or in criminal proceedings, access would only be given to experts that are under oath. But since these are civil proceedings and privacy protections in the US are basically non-existent, the judge does not have many other options.
Re: (Score:2)
Simple, let GPT5.1 Thinking do it.
Re: (Score:2)
Sure, there's plenty of sarcasm in there, but repetitive analytical action like reviewing millions of chat prompt-and-answer pairs and de-identifying them sure sounds like it should be done by some sort of smart-ish algorithm.
What about the "clean room" approach? (Score:1)
In the clean-room approach, the other party or examiners are required to be strip-searched and to wear simple prison-like clothing as they go into a room without internet connections to examine the info in question. They can take written notes, but all notes are subject to scrutiny before being handed back.
Re: (Score:2)
> all notes are subject to scrutiny
Clarification: subject to judicial scrutiny to make sure they don't contain personal info or info irrelevant to the case.
Re: (Score:2)
Too much data. The NYT obviously needs to use IT tools to work on this or they have no chance of finding the things in there.
Re: (Score:1)
The judge shouldn't allow open-ended searching. NYT should submit candidate search phrases to be used during discovery.
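The "candidate search phrases" idea above can be sketched in a few lines. This is my own illustration of phrase-scoped discovery; the phrases and function names are invented examples, not ones from the actual case.

```python
# Only transcripts matching pre-approved search phrases are produced.
# The phrase list would be negotiated by the parties and approved by the court.
APPROVED_PHRASES = ["new york times", "nytimes.com"]

def in_discovery_scope(transcript: str) -> bool:
    """True if the transcript matches any court-approved phrase."""
    text = transcript.lower()
    return any(phrase in text for phrase in APPROVED_PHRASES)

logs = [
    "Give me the full text of this New York Times article.",
    "Help me plan a birthday party.",
]
relevant = [t for t in logs if in_discovery_scope(t)]
print(len(relevant))  # 1
```

The appeal is that the 99%+ of irrelevant conversations never leave OpenAI at all; the weakness is that the plaintiff only finds what it already knows to search for.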
Re: (Score:2)
See, Meta is also an unsympathetic company. We would still not want it to hand over the private messages of all its users, would we?
Re: (Score:2)
Indeed.
Also: https://en.wikipedia.org/wiki/... [wikipedia.org] continues to work really well because people are dumb and uneducated.
Company charter as assurance bond then (Score:2)
That's fine, but post the company's charter up as bond then.
If any data leaks come out or the data ever gets de-anonymized, then the bond is called and the company's charter revoked.
Since they are so sure.
Should abide by the court order (Score:3)
The mere fact they say 99% innocent means nothing.
The law does not care about the many times you shopped at a store and did not shoplift. It cares about the time you did.
As for privacy - I do not believe the company respects their privacy in any way. That is a red herring used to lie to the court. And it should be really easy to protect - just add a numerical index and remove all other metadata.
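The "numerical index, strip metadata" proposal can be sketched directly. This is my own illustration with an invented log schema, not OpenAI's actual data model or process.

```python
# Strip every metadata field and keep only a sequential index plus the
# conversation text. The field names ("user_id", "ip", etc.) are made up
# for illustration.
def pseudonymize(logs):
    """Keep only an index and the message text for each log entry."""
    return [{"index": i, "messages": log["messages"]} for i, log in enumerate(logs)]

raw = [
    {"user_id": "u_8812", "timestamp": "2024-03-01T12:00:00Z",
     "ip": "203.0.113.7", "messages": ["Summarize this article for me."]},
]
print(pseudonymize(raw))
# [{'index': 0, 'messages': ['Summarize this article for me.']}]
```

The catch, as other commenters note, is that this only removes metadata: if the message text itself says who the user is, no amount of index-assignment helps.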
Re: (Score:2)
As for privacy - I do not believe the company respects their privacy in any way. That is a red herring used to lie to the court. And it should be really easy to protect - just add a numerical index and remove all other metadata.
I disagree. (I agree with the rest of your statement: all the times you did not murder somebody buy you nothing for the one time you did....) Assuring privacy is pretty much impossible here. But that is the fault of OpenAI. If they cared about privacy, they should never have recorded that data. Obviously, they do not care about privacy at all. And since the data is evidence of criminal behavior by OpenAI, it needs to be handed over.
this seems impossible to do? (Score:1)
Re: (Score:3)
Basically impossible, yes. Hence, at least in Europe, recording these logs is clearly illegal, although a complaint and judgement may be needed to establish that legally. But who cares. All the LLM pushers started with stealing everything they could get their hands on. It is the only reason LLMs can somewhat (badly) perform. Adding more crimes on top of that makes little difference.
The judge is a moron (Score:2)
There should be a way to overrule moronic judges.
Re: (Score:3)
Soo, a company runs a mass piracy campaign and uses the stolen data in a product. The people stolen from complain. And they should not get access to the usage logs that do exist and show how the thieves profited from their theft? In what deranged universe would that make sense? The moron here is clearly you.
Such a surprise (Score:2)
And the anonymization will be crap and full of holes, because this is unstructured data and they will likely have their Artificial Idiot do it.
Obviously, this data should never have been recorded if protecting the users was even a mild concern on the side of OpenAI. Also, obviously, OpenAI does not care one bit about their users' privacy. They just use that fake argument to avoid turning over what they obviously have to turn over because of their criminal actions.
Third-party doctrine (Score:2)
One would think the third-party doctrine would pretty much scuttle OpenAI's objections.
Case (Score:2)
Judge: Why don't you want to turn over the chat logs?
OpenAI: Because it's devastating to our case!
Privacy is an Empty Promise (Score:2)
Those records should not exist in the first place (Score:2)
Are we serious on this guys? (Score:2)