The Courts / Privacy

OpenAI Loses Fight To Keep ChatGPT Logs Secret In Copyright Case (reuters.com)

A federal judge has ordered OpenAI to hand over 20 million anonymized ChatGPT logs in its copyright battle with the New York Times and other outlets. Reuters reports: U.S. Magistrate Judge Ona Wang in a decision made public on Wednesday said that the 20 million logs were relevant to the outlets' claims and that handing them over would not risk violating users' privacy. The judge rejected OpenAI's privacy-related objections to an earlier order requiring the artificial intelligence startup to submit the records as evidence. "There are multiple layers of protection in this case precisely because of the highly sensitive and private nature of much of the discovery," Wang said.

An OpenAI spokesperson on Wednesday cited an earlier blog post from the company's Chief Information Security Officer Dane Stuckey, which said the Times' demand for the chat logs "disregards long-standing privacy protections" and "breaks with common-sense security practices." OpenAI has separately appealed Wang's order to the case's presiding judge, U.S. District Judge Sidney Stein.

A group of newspapers owned by Alden Global Capital's MediaNews Group is also involved in the lawsuit. MediaNews Group executive editor Frank Pine said in a statement on Wednesday that OpenAI's leadership was "hallucinating when they thought they could get away with withholding evidence about how their business model relies on stealing from hardworking journalists."


Comments Filter:
  • by Anonymous Coward

    But what does OpenAI think is permissible in this case? They don't want the courts to see the outputs of their software, and I imagine they would fight any attempt to see how they built it. What's left? Just trust their word?

    • Re: (Score:2, Insightful)

      The only decent thing to do is to keep these anonymized. If they become public record, every bit of personal information entered into ChatGPT will be public knowledge. SSNs. ID card scans. Affairs. Mental problems. Health problems. There shouldn't even be a question here.
      • by abulafia ( 7826 ) on Wednesday December 03, 2025 @08:39PM (#65834097)
        There is no practical way to do that. Seriously.

        In order to do it properly you'd need to have a process similar to declassification redactions, where a human can reason about real-world context. And you'd need a lot of bodies to do that to 20M chats in any reasonable amount of time.

        "De-identification" automation can sometimes give you a dataset that by itself is anonymized. You really need structured input data for that, though, and the real problem is that there are frequently ways to "enrich" an anonymized dataset by finding other datasets you can join it to.

        And here we're talking about freeform chats with multimodal inputs; those tools really can't cope with that sort of thing.

        Further, the "enrichment" for this sort of thing could be weird. I could theoretically have described a situation to ChatGPT that didn't have identifying names/numbers in it, but that you could recognize, thus outing me. There's no way to redact that sort of thing.

        • by gweihir ( 88907 )

          There is no practical way to do that. Seriously.

          I agree. Well, you cannot get everything out. Specific things like, say, SSNs or the more common health problems can be blanked out with patterns. But misspell the name of the condition you have, or describe it instead of using its name, and you are already screwed in most cases. And names, quasi-identifiers of people, etc. are basically impossible to recognize reliably.

          Hence what needs to be done here is also that anybody working on the data needs to be under oath to not leak any personal data and all process

        • There is no practical way to do that. Seriously.

          Sure there is. You just need 1,000 FBI agents working overtime [house.gov], just like they did with the Epstein files.

        • There is no practical way to do that. Seriously.

          You could use AI to do it. ;)

        • There is no practical way to do that. Seriously.

          Yet these AI companies would be the first to tell other people that their AI can do it easily.

      • The only decent thing to do is to keep these anonymized. If they become public record every bit of personal information entered into chat GPT will be public knowledge. SSNs. ID card scans. affairs. mental problems. Health problems. There shouldn't even be a question here.

        All of the logs would certainly be subject to a court protective order, and everyone involved takes them very seriously. Anyone that got caught publicizing protected information would be in very deep trouble.

      • Or maybe people should be aware that providing such personal information to an LLM is not safe? The statement that it is unsafe is true regardless of whether people are aware of it. So we might as well make it clear, instead of pretending that user privacy is somehow respected.

  • I'm not very AI savvy, so this may be a dumb question.

    Why do the logs exist to begin with? Do the ChatGPT algorithms use them to "learn?"

    • by gweihir ( 88907 )

      Training data, issue diagnosis, market research, targeting data for ads, probably to sell it to others at some time.

    • by evanh ( 627108 )

      Because OpenAI doesn't actually care about privacy. They're just using that argument as a smokescreen.

    • by Anonymous Coward

      Why do the logs exist to begin with?

      Everything typed into AI is kept. When you use Copilot to look through things on your Windows PC, it's kept; when you dump your company's sales figures into WhateverAI, it's kept; when you have a transcription bot sit in your product development meeting, it's kept; and with Apple, if they add personal context, your entire screen's contents will be transmitted to them and kept.

      Even if you delete the thread, it's kept. If you use the thumbs up button or thumbs down button it'

    • by EvilSS ( 557649 )
      Because the user didn't delete them (before the court order prevented that). If you go on ChatGPT you will see all your old chat sessions on the left. Those are the "logs" they are talking about. Prior to the court order if a user deleted a conversation it was removed (according to OpenAI), but if you didn't then the session persisted so you could go back to it.
  • Such a shame. I think we should be "tough on crime" on these people!

    • The "tough on crime" stance doesn't concern white-collar, billionaire crime. That one falls squarely into the "settlement" or "pardon" category, except in the rare cases where other billionaires were the target of said crime. Mostly.

  • that you could feed the logs into and have it detect Names, SSNs, Phone numbers and other PII data and replace it with asterisks.

    You know, a tool that's not a real person but has some ability to do seemingly intelligent things. There's a name for it, it's on the tip of my tongue.

    • It is called "a perl one-liner".

      We used to cobble them together all the time in the few minutes between important tasks, back in the day before social networking and vibe-coding took over.
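The pattern-based scrubber this thread imagines can be sketched in a few lines. Here is a minimal illustrative version in Python (rather than an actual perl one-liner, for readability); the two patterns are assumptions chosen for the examples upthread, and, as the parent comments argue, this approach only catches well-formatted identifiers:

```python
import re

# Hypothetical pattern-based PII scrubber. It masks well-formatted SSNs
# and US phone numbers with asterisks; anything misspelled or described
# in freeform prose sails straight through, which is the thread's point.
PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                           # SSN, e.g. 123-45-6789
    re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-. ])\d{3}[-. ]\d{4}\b"),  # US phone number
]

def redact(text: str) -> str:
    """Replace each pattern match with a same-length run of asterisks."""
    for pat in PATTERNS:
        text = pat.sub(lambda m: "*" * len(m.group()), text)
    return text
```

Feed it `"My SSN is 123-45-6789"` and the number is masked; write "my social is one two three..." instead and nothing happens, which is exactly the limitation the posters above describe.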

  • I want the New York Times to win and set a precedent that feeding copyrighted works into an AI without permission is a copyright violation (regardless of what the AI does with it or what output it generates). Pop the generative AI (aka "slurp up a whole bunch of copyrighted content, mix it around and spit bits of it back out") bubble completely.

"Though a program be but three lines long, someday it will have to be maintained." -- The Tao of Programming
