OpenAI Loses Fight To Keep ChatGPT Logs Secret In Copyright Case (reuters.com)
A federal judge has ordered OpenAI to hand over 20 million anonymized ChatGPT logs in its copyright battle with the New York Times and other outlets. Reuters reports: U.S. Magistrate Judge Ona Wang in a decision made public on Wednesday said that the 20 million logs were relevant to the outlets' claims and that handing them over would not risk violating users' privacy. The judge rejected OpenAI's privacy-related objections to an earlier order requiring the artificial intelligence startup to submit the records as evidence. "There are multiple layers of protection in this case precisely because of the highly sensitive and private nature of much of the discovery," Wang said.
An OpenAI spokesperson on Wednesday cited an earlier blog post from the company's Chief Information Security Officer Dane Stuckey, which said the Times' demand for the chat logs "disregards long-standing privacy protections" and "breaks with common-sense security practices." OpenAI has separately appealed Wang's order to the case's presiding judge, U.S. District Judge Sidney Stein.
A group of newspapers owned by Alden Global Capital's MediaNews Group is also involved in the lawsuit. MediaNews Group executive editor Frank Pine said in a statement on Wednesday that OpenAI's leadership was "hallucinating when they thought they could get away with withholding evidence about how their business model relies on stealing from hardworking journalists."
I haven't followed this case too much... (Score:1)
But what does OpenAI think is permissible in this case? They don't want the courts to see the outputs of their software, and I imagine they would fight any attempt to see how they built it. What's left? Just trust their word?
Re: (Score:2, Insightful)
Re:I haven't followed this case too much... (Score:5, Insightful)
In order to do it properly you'd need to have a process similar to declassification redactions, where a human can reason about real-world context. And you'd need a lot of bodies to do that to 20M chats in any reasonable amount of time.
"De-identification" automation can sometimes give you a dataset that by itself is anonymized. You really need structured input data for that, though, and the real problem is that there are frequently ways to "enrich" an anonymized dataset by finding other datasets you can join it to.
And here we're talking about freeform chats with multimodal inputs; those tools really can't cope with that sort of thing.
Further, the "enrichment" for this sort of thing could be weird. I could theoretically have described a situation to ChatGPT that didn't have identifying names/numbers in it, but that you could recognize, thus outing me. There's no way to redact that sort of thing.
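The "enrichment" risk described above can be sketched with a toy example (all names and records here are invented): even after direct identifiers are stripped from a dataset, joining it against a second, public dataset on shared quasi-identifiers can re-identify a record.

```python
# Toy linkage attack: the "anonymized" release drops names, but a public
# roster shares quasi-identifiers (ZIP code, birthdate), enabling a join.
# All data here is invented for illustration.

anonymized_chats = [
    {"zip": "10001", "birthdate": "1980-03-14", "topic": "rare illness"},
    {"zip": "94105", "birthdate": "1975-07-02", "topic": "tax question"},
]

public_roster = [
    {"name": "A. Example", "zip": "10001", "birthdate": "1980-03-14"},
    {"name": "B. Sample",  "zip": "60601", "birthdate": "1990-01-01"},
]

def reidentify(anon_rows, roster):
    """Join on quasi-identifiers; a unique match outs the person."""
    hits = []
    for row in anon_rows:
        matches = [p for p in roster
                   if p["zip"] == row["zip"]
                   and p["birthdate"] == row["birthdate"]]
        if len(matches) == 1:  # unique combination -> re-identified
            hits.append((matches[0]["name"], row["topic"]))
    return hits

print(reidentify(anonymized_chats, public_roster))
# -> [('A. Example', 'rare illness')]
```

No pattern-matching on the released data itself would have flagged a ZIP code and birthdate as identifying; the risk only appears once you consider what other datasets exist.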
Re: (Score:2)
There is no practical way to do that. Seriously.
I agree. Well, you can't get everything out. Specific things like, say, SSNs or the more common health problems can be blanked out with patterns. But misspell the name of the condition you have, or describe it instead of naming it, and you are already screwed in most cases. And names, quasi-identifiers of people, etc. are basically impossible to recognize reliably.
Hence what also needs to be done here is that anybody working on the data must be under oath not to leak any personal data, and all process
Re: (Score:2)
Sure there is. You just need 1,000 FBI agents working overtime [house.gov], just like they did with the Epstein files.
Re: (Score:3)
There is no practical way to do that. Seriously.
You could use AI to do it. ;)
Re: (Score:3)
There is no practical way to do that. Seriously.
Yet these AI companies would be the first to tell other people that their AI can do it easily.
Re: (Score:3)
1. Almost everyone.
2. Yes.
Re: (Score:2)
The only decent thing to do is to keep these anonymized. If they become public record, every bit of personal information entered into ChatGPT will be public knowledge. SSNs. ID card scans. Affairs. Mental problems. Health problems. There shouldn't even be a question here.
All of the logs would certainly be subject to a court protective order, and everyone involved takes them very seriously. Anyone that got caught publicizing protected information would be in very deep trouble.
Re: I haven't followed this case too much... (Score:2)
Or maybe people should be aware that providing such personal information to an LLM is not safe? The statement that it is unsafe is true regardless of whether people are aware of it. So we might as well make it clear, instead of pretending that user privacy is somehow respected.
Why are there logs? (Score:2)
I'm not very AI savvy, so this may be a dumb question.
Why do the logs exist to begin with? Do the ChatGPT algorithms use them to "learn?"
Re: (Score:3)
Training data, issue diagnosis, market research, targeting data for ads, probably to sell it to others at some time.
Re: (Score:3)
Because OpenAI doesn't actually care about privacy. They're just using that argument as a smokescreen.
Re: (Score:1)
Everything typed into AI is kept. When you use Copilot to look through things on your Windows PC, it's kept; when you dump your company's sales figures into WhateverAI, it's kept; when you have a transcription bot sit in on your product development meeting, it's kept; and with Apple, if they do add personal context, your entire screen's contents will be transmitted to them and kept.
Even if you delete the thread, it's kept. If you use the thumbs up button or thumbs down button it'
Re: (Score:2)
Criminals fail to hide the evidence? (Score:2)
Such a shame. I think we should be "tough on crime" on these people!
Re: (Score:2)
The "tough on crime" stance doesn't cover white-collar, billionaire crime. That one falls squarely into the "settlement" or "pardon" category, except in the rare cases where other billionaires were the target of said crime. Mostly.
If only there was some kind of "intelligent" tool (Score:2)
that you could feed the logs into and have it detect names, SSNs, phone numbers, and other PII and replace them with asterisks.
You know, a tool that's not a real person but has some ability to do seemingly intelligent things. There's a name for it, it's on the tip of my tongue.
Re: (Score:2)
It is called "a perl one-liner".
We used to cobble them together all the time in the few minutes between important tasks, back in the day before social networking and vibe-coding took over.
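A pattern-based scrubber of the kind described here is easy to sketch, and its limits are just as easy to see. Below is a minimal Python version (standing in for the Perl one-liner; the patterns are illustrative, not a complete PII list): it catches well-formatted SSNs and phone numbers, but a spelled-out or described version of the same information sails straight through.

```python
import re

# Minimal pattern-based redactor. Patterns are illustrative only:
# an SSN-shaped token (123-45-6789) and a US-phone-shaped token.
PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # SSN like 123-45-6789
    re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),  # phone like 555-867-5309
]

def redact(text: str) -> str:
    """Replace every match of each pattern with asterisks."""
    for pat in PATTERNS:
        text = pat.sub("***", text)
    return text

print(redact("My SSN is 123-45-6789, call 555-867-5309."))
# -> My SSN is ***, call ***.

# The described/spelled-out version is untouched -- exactly the failure
# mode raised upthread about misspelled or paraphrased information.
print(redact("My social starts with one two three."))
# -> My social starts with one two three.
```

This is the gap between structured de-identification and freeform chat: the regexes only ever see surface form, never meaning.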
Re: (Score:1)
The text that people typed into a black box on the Internet is not their persons, houses, papers, nor effects.
The Fourth Amendment only makes sense when applied to material people actually try to keep secure. "Diary entries" is a laughable comparison.
The users sent this data to someone else's computer without even wondering what the company would do with it, let alone reading the terms they were agreeing to.
Fortunately, there's unlikely to be much criminal evidence in there that could be used against a spec
Re: (Score:2)
Short Answer: The Fourth Amendment generally does not protect queries you voluntarily enter into public websites that are logged, because courts often hold that once you share information with a third party, you lose a reasonable expectation of privacy. However, if law enforcement seeks access to those logs, constitutional protections may apply depending on the circumstances and evolving case law.
Re: (Score:2)
I want the New York Times to win... (Score:2)
I want the New York Times to win and set a precedent that feeding copyrighted works into an AI without permission is a copyright violation (regardless of what the AI does with it or what output it generates). Pop the generative AI (aka "slurp up a whole bunch of copyrighted content, mix it around and spit bits of it back out") bubble completely.