
OpenAI Loses Fight To Keep ChatGPT Logs Secret In Copyright Case (reuters.com)

A federal judge has ordered OpenAI to hand over 20 million anonymized ChatGPT logs in its copyright battle with the New York Times and other outlets. Reuters reports: U.S. Magistrate Judge Ona Wang in a decision made public on Wednesday said that the 20 million logs were relevant to the outlets' claims and that handing them over would not risk violating users' privacy. The judge rejected OpenAI's privacy-related objections to an earlier order requiring the artificial intelligence startup to submit the records as evidence. "There are multiple layers of protection in this case precisely because of the highly sensitive and private nature of much of the discovery," Wang said.

An OpenAI spokesperson on Wednesday cited an earlier blog post from the company's Chief Information Security Officer Dane Stuckey, which said the Times' demand for the chat logs "disregards long-standing privacy protections" and "breaks with common-sense security practices." OpenAI has separately appealed Wang's order to the case's presiding judge, U.S. District Judge Sidney Stein.

A group of newspapers owned by Alden Global Capital's MediaNews Group is also involved in the lawsuit. MediaNews Group executive editor Frank Pine said in a statement on Wednesday that OpenAI's leadership was "hallucinating when they thought they could get away with withholding evidence about how their business model relies on stealing from hardworking journalists."

  • by Anonymous Coward

    But what does OpenAI think is permissible in this case? They don't want the courts to see the outputs of their software, and I imagine they would fight any attempt to see how they built it. What's left? Just trust their word?

    • Re: (Score:2, Insightful)

      The only decent thing to do is to keep these anonymized. If they become public record, every bit of personal information entered into ChatGPT will be public knowledge: SSNs, ID card scans, affairs, mental health problems, physical health problems. There shouldn't even be a question here.
      • by abulafia ( 7826 ) on Wednesday December 03, 2025 @08:39PM (#65834097)
        There is no practical way to do that. Seriously.

        In order to do it properly you'd need to have a process similar to declassification redactions, where a human can reason about real-world context. And you'd need a lot of bodies to do that to 20M chats in any reasonable amount of time.

        "De-identification" automation can sometimes give you a dataset that by itself is anonymized. You really need structured input data for that, though, and the real problem is that there are frequently ways to "enrich" an anonymized dataset by finding other datasets you can join it to.

        And here we're talking about freeform chats with multimodal inputs; those tools really can't cope with that sort of thing.

        Further, the "enrichment" for this sort of thing could be weird. I could theoretically have described a situation to ChatGPT that didn't have identifying names/numbers in it, but that you could recognize, thus outing me. There's no way to redact that sort of thing.
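        The "enrichment" problem the parent describes is a classic linkage attack, and it can be sketched in a few lines of Python. Everything below is invented for illustration: the records, the names, and the choice of quasi-identifiers are all hypothetical.

        ```python
        # Toy linkage attack: re-identify records in an "anonymized" release
        # by joining it to a second, public dataset on quasi-identifiers.
        # All data here is invented for illustration.

        # An "anonymized" release: names stripped, quasi-identifiers kept.
        anonymized = [
            {"zip": "02139", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
            {"zip": "02139", "birth_year": 1990, "sex": "M", "diagnosis": "flu"},
        ]

        # A separate public dataset (think voter roll) with names attached.
        public = [
            {"name": "Alice Example", "zip": "02139", "birth_year": 1985, "sex": "F"},
            {"name": "Bob Example", "zip": "02139", "birth_year": 1990, "sex": "M"},
        ]

        def reidentify(anon_rows, public_rows, keys=("zip", "birth_year", "sex")):
            """Join the two datasets on quasi-identifiers; a unique match leaks a name."""
            hits = []
            for a in anon_rows:
                matches = [p for p in public_rows
                           if all(p[k] == a[k] for k in keys)]
                if len(matches) == 1:  # a unique match re-identifies the record
                    hits.append((matches[0]["name"], a["diagnosis"]))
            return hits

        print(reidentify(anonymized, public))
        # [('Alice Example', 'asthma'), ('Bob Example', 'flu')]
        ```

        With freeform chat text there isn't even a clean column of quasi-identifiers to scrub; the identifying detail can be anywhere in the prose, which is why redaction tooling struggles here.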

        • by gweihir ( 88907 )

          There is no practical way to do that. Seriously.

          I agree. Well, you cannot get everything out. Specific things like, say, SSNs or the more common health conditions can be blanked out with patterns. But misspell the name of the condition you have, or describe it instead of naming it, and you are already screwed in most cases. And names, quasi-identifiers of people, etc. are basically impossible to recognize reliably.

          Hence what needs to be done here is also that anybody working on the data needs to be under oath to not leak any personal data and all process

        • There is no practical way to do that. Seriously.

          Sure there is. You just need 1,000 FBI agents working overtime [house.gov], just like they did with the Epstein files.

        • There is no practical way to do that. Seriously.

          You could use AI to do it. ;)

        • There is no practical way to do that. Seriously.

          Yet these AI companies would be the first to tell other people that their AI can do it easily.

      • The only decent thing to do is to keep these anonymized. If they become public record every bit of personal information entered into chat GPT will be public knowledge. SSNs. ID card scans. affairs. mental problems. Health problems. There shouldn't even be a question here.

        All of the logs would certainly be subject to a court protective order, and everyone involved takes such orders very seriously. Anyone who got caught publicizing protected information would be in very deep trouble.

      Or maybe people should be aware that providing such personal information to an LLM is not safe? The statement that it is unsafe is true regardless of whether people are aware of it, so we might as well make that clear instead of pretending that user privacy is being respected.

      I think it's well accepted that anything you say online (outside of a secure, encrypted connection) is not private.

  • I'm not very AI savvy, so this may be a dumb question.

    Why do the logs exist to begin with? Do the ChatGPT algorithms use them to "learn?"

    • by gweihir ( 88907 )

      Training data, issue diagnosis, market research, targeting data for ads, probably to sell it to others at some time.

    • by evanh ( 627108 )

      Because OpenAI doesn't actually care about privacy. They're just using that argument as a smokescreen.

    • by Anonymous Coward

      Why do the logs exist to begin with?

      Everything typed into AI is kept. When you use Copilot to look through things on your Windows PC, it's kept; when you dump your company's sales figures into WhateverAI, it's kept; when you have a transcription bot sit in your product development meeting, it's kept. And with Apple, if they do add personal context, your entire screen's contents will be transmitted to them and kept.

      Even if you delete the thread, it's kept. If you use the thumbs up button or thumbs down button it'

    • by EvilSS ( 557649 )
      Because the user didn't delete them (before the court order prevented that). If you go on ChatGPT you will see all your old chat sessions on the left. Those are the "logs" they are talking about. Prior to the court order if a user deleted a conversation it was removed (according to OpenAI), but if you didn't then the session persisted so you could go back to it.
    • Why do the logs exist to begin with?

      The court ordered OpenAI to log everything, all input and output, when the NY Times filed the lawsuit. These logs did not exist prior to the court order requiring OpenAI to begin recording them.

    • by allo ( 1728082 )

      OpenAI was sued into storing them. Before, they only stored the chats you did not delete (IIRC with a clause that they might use them for RL if you are a free user) and guaranteed that deleted ones were gone. Then the NYT came and sued, and the court ordered them to stop really deleting deleted logs.

  • Such a shame. I think we should be "tough on crime" on these people!

    • The "tough on crime" stance doesn't concern white-collar, billionaire crime. That one falls squarely into the "settlement" or "pardon" category, except in the rare cases where other billionaires were the target of said crime. Mostly.

  • that you could feed the logs into and have it detect names, SSNs, phone numbers, and other PII, and replace it all with asterisks.

    You know, a tool that's not a real person but has some ability to do seemingly intelligent things. There's a name for it, it's on the tip of my tongue.

    • It is called "a perl one-liner".

      We used to cobble them together all the time in the few minutes between important tasks, back in the day before social networking and vibe-coding took over.
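      The pattern-matching approach described above can be sketched in Python rather than Perl. The patterns below are illustrative, not exhaustive, and the example sentence is invented; the last comment shows exactly where the approach falls down.

      ```python
      import re

      # Minimal pattern-based redaction (the "one-liner" idea, spelled out).
      # Patterns are illustrative only; real PII takes far more shapes than this.
      PATTERNS = [
          re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                # SSN-shaped
          re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),        # US-phone-shaped
          re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),          # email-shaped
      ]

      def redact(text: str, mask: str = "***") -> str:
          """Replace anything matching a known PII pattern with a mask."""
          for pat in PATTERNS:
              text = pat.sub(mask, text)
          return text

      print(redact("My SSN is 123-45-6789, call 555-123-4567 or mail a@b.com"))
      # My SSN is ***, call *** or mail ***
      # But "my social is one two three four five..." or "the guy who runs
      # the corner bakery on Elm St" sails straight through.
      ```

      This catches well-formed identifiers and nothing else, which is the crux of the thread: anything described, misspelled, or merely implied survives redaction.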

      • by allo ( 1728082 )

        Yeah, sure ... do you know how hard it is to securely anonymize things? You remove all the sensitive data, but then people correlate the remaining data and can still make a good guess at what was redacted. Happens all the time.

        • do you know how hard it is to securely anonymize things?

          Yes, I used to do it successfully for years. It's neither trivial nor impossible; it just doesn't pay to do it anymore, since the populace is not interested in anonymity.

          • by allo ( 1728082 )

            I'd be with you in saying it can work for certain kinds of data, but I think for general unstructured ChatGPT queries and outputs it's impossible. There are too many subtle hints, which neither conventional algorithms nor AI can pick up but which humans can use to uncloak someone, to eliminate them all without human redactors. And I think the past has shown that even humans are not always successful at censoring everything reliably.

            However, the "perl oneliner" is definitely wrong for that ki

  • I want the New York Times to win and set a precedent that feeding copyrighted works into an AI without permission is a copyright violation (regardless of what the AI does with it or what output it generates). Pop the generative AI (aka "slurp up a whole bunch of copyrighted content, mix it around and spit bits of it back out") bubble completely.

  • The idea that a company can get access to all of those chat logs and potentially exploit it for their own personal gain is frightening. I have not asked ChatGPT anything that would embarrass me, but I know people who are asking questions that they would not want out there in the public. I believe that a compromise could be possible, but if The New York Times gets a complete copy of all the chat logs, I would never use their site again. I think that any company that requests this information from the cour
