AI Privacy

Anthropic Will Start Training Its AI Models on Chat Transcripts (theverge.com) 19

Anthropic will start training its AI models on user data, including new chat transcripts and coding sessions, unless users choose to opt out. The Verge: It's also extending its data retention policy to five years -- again, for users that don't choose to opt out. All users will have to make a decision by September 28th. For users that click "Accept" now, Anthropic will immediately begin training its models on their data and keeping said data for up to five years, according to a blog post published by Anthropic on Thursday.

The setting applies to "new or resumed chats and coding sessions." Even if you do agree to Anthropic training its AI models on your data, it won't do so with previous chats or coding sessions that you haven't resumed. But if you do continue an old chat or coding session, all bets are off.


Comments Filter:
  • Okay. (Score:3, Insightful)

    by SmaryJerry ( 2759091 ) on Thursday August 28, 2025 @02:41PM (#65622178)
    I figured this was the case already for all models.
    • by allo ( 1728082 )

      It's rather surprising. Saving the data (e.g. for requests from authorities) is to be expected, but for training, user input will often be low quality. Code tasks may be a different deal, but the typical dumb question with a few typos in it is not worth ending up in a training set. In typical chat logs, the user writes a short, low-quality text and then receives a large chunk of text back from the AI.

      • by nlc ( 10289693 )
        True, but there will also be a lot of more advanced users guiding the AI to better answers over multiple interactions. Ingesting these will help improve the quality of the initial response to a 'typical dumb question'. That said, I will not be consenting to carte blanche data retention; I would recommend people keep it disabled and use the feedback buttons on individual chats if they want to help improve future versions of the model.
  • by aldousd666 ( 640240 ) on Thursday August 28, 2025 @03:03PM (#65622242) Journal
    When users are writing their own code, they include secrets. Training on secrets in coding sessions is probably a terrible idea. Maybe they have some way to filter out secrets so they don't go in, but what if they miss something? This seems like a huge problem, at least, training on developer coding sessions. I don't really see a problem with doing it on the chats on the web.
    • It likely didn't stop them when it came to the original training, or subsequent training, so why should they care now? /s

    • by Himmy32 ( 650060 )

      I don't really see a problem with doing it on the chats on the web.

      Except when it's people chatting about their health issues.

    • I don't see that as a huge problem - it isn't that hard to filter.

      We run a hook that looks for secrets on push. It takes an admin to fix a false positive; that happens less than once a year. (We have a working population of about 800 engineers committing.)

      Presumably OAI would care a lot less about false positives than we do (we don't want to throw away work product; OAI just wants masses of human output), so I expect they could err towards omission, not lose much on false positives and be pretty sure the
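
      For the curious, here's a minimal Python sketch of the kind of secret scan the parent describes. The regex patterns, file selection, and hook wiring are illustrative assumptions, not the parent's actual setup:

      import re
      import subprocess
      import sys

      # Illustrative patterns only; a production hook would use a maintained rule set.
      SECRET_PATTERNS = [
          re.compile(r"AKIA[0-9A-Z]{16}"),                        # AWS access key ID format
          re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),  # PEM private key header
          re.compile(r"(?i)(api[_-]?key|secret)\s*[:=]\s*['\"][^'\"]{16,}"),  # generic key = "value"
      ]

      def staged_files():
          # Files staged for commit; a push hook would diff against the remote branch instead.
          out = subprocess.run(
              ["git", "diff", "--cached", "--name-only"],
              capture_output=True, text=True, check=True,
          )
          return [line for line in out.stdout.splitlines() if line]

      def scan(path):
          try:
              with open(path, encoding="utf-8", errors="ignore") as fh:
                  text = fh.read()
          except OSError:
              return []
          return [p.pattern for p in SECRET_PATTERNS if p.search(text)]

      if __name__ == "__main__":
          findings = {f: hits for f in staged_files() if (hits := scan(f))}
          for path, hits in findings.items():
              print(f"possible secret in {path}: {hits}")
          sys.exit(1 if findings else 0)  # non-zero exit blocks the commit/push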

      • We have a working population of about 800 engineers committing.

        And what is the total population of engineers? :)

        Reminds me of the old joke: “Q. How many people work here? A. About half of them.”

    • Or the other way around: I write GPL2+ code, I use AI. It's trained on my GPL2+ code. Now all of your code assisted by AI is derivative.
      Good luck sorting this legal mess.

      • Easy.

        Your code was turned into a transformative work that you no longer retain copyright over.
          • Well, then that should also apply to company "secrets". Can't have it both ways.

          • It does.
            Who wanted it both ways?
            • Me when I confused "secrets" with "trade secrets" ... I was arguing about the wrong thing.

              • Well, trade secrets are a separate issue.
                A trade secret can be copyrighted (and you will lose that copyright over a transformative work).
                They are additionally protected by a statute that specifically protects trade secrets, so copyright and fair use don't apply there.
                • by allo ( 1728082 )

                  The interesting point is that a trade secret is protected by laws other than copyright. To stay on topic: for current AI models the prevailing opinion (not court-tested yet) is that the model itself is not copyrighted, as it is not a creative human work, but it may be a trade secret. If you leak it from your company, they can sue you. If I download the model you shared, they can't sue me, because the download is only affected by copyright, while you leaking it touches the trade secret rights (and your contract probably said

                    • The Anthropic class action suit has held that the model is a transformative work, i.e., copyright of the training works is lost.
                    However, that did not somehow invalidate the infringing of copyright they engaged in when they downloaded pirated books to train the model.

                    i.e., if you train a model with trade secrets, that model and anything it produces is perfectly legal as far as copyright is concerned. Your acquisition of those trade secrets may have been illegal, though.
                    • by allo ( 1728082 )

                      The question of whether the training is transformative is a different copyright question from the copyright of the model. You have to look at three parts:

                      1) Does the training data need to be licensed?
                      2) Does the model need to be licensed (as in, someone leaks the weights)?
                      3) Does the output need to be licensed?

                      1) is currently answered "no" because transformative work exceptions apply. That means while 1) is copyrighted, there is a usage right for the AI companies.

                      3) is currently answered no, denying the (unchanged) ou

    • If your code includes secrets, you're doing it wrong. Secrets should be stored in encrypted storage such as key vaults or password managers. If they're in the code, you're *asking* for a breach.
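
      A minimal Python sketch of the difference; the variable name, value, and environment-based lookup are made up for illustration:

      import os

      # Bad: a credential hard-coded in source ends up in version control history,
      # and potentially in anything that later trains on that code.
      # DB_PASSWORD = "hunter2"

      # Better: resolve the secret at runtime from the environment, populated by a
      # key vault, secrets manager, or CI, so the source contains no credential.
      DB_PASSWORD = os.environ["DB_PASSWORD"]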

  • It's not often that the benefit of adding a middleman outweighs the extra costs. But having some pretty straightforward settings that prevent using models that don't align with your privacy preferences is pretty nice.
