

Anthropic Will Start Training Its AI Models on Chat Transcripts (theverge.com)
Anthropic will start training its AI models on user data, including new chat transcripts and coding sessions, unless users choose to opt out. The Verge: It's also extending its data retention policy to five years -- again, for users that don't choose to opt out. All users will have to make a decision by September 28th. For users that click "Accept" now, Anthropic will immediately begin training its models on their data and keeping said data for up to five years, according to a blog post published by Anthropic on Thursday.
The setting applies to "new or resumed chats and coding sessions." Even if you do agree to Anthropic training its AI models on your data, it won't do so with previous chats or coding sessions that you haven't resumed. But if you do continue an old chat or coding session, all bets are off.
Okay. (Score:3, Insightful)
Re: (Score:2)
It's rather surprising. Saving (e.g., for requests from authorities) is to be expected, but for training, the user input will often be low quality. Code tasks may be a different deal, but the typical dumb question with a few typos in it isn't worth ending up in a training set. In the typical chat logs, the user writes a short, low-quality text and then receives a large chunk from the AI.
Re: (Score:1)
That's fine but... (Score:3)
Re: That's fine but... (Score:2)
It likely didn't stop them when it came to the original training, or subsequent training, so why should they care now? /s ?
Re: (Score:2)
I don't really see a problem with doing it on the chats on the web.
Except when it's people chatting about their health issues.
Secrets (Score:2)
We run a hook that looks for secrets on push. It takes an admin to fix a false positive; that happens less than once a year. (We have a working population of about 800 engineers committing.)
Presumably OAI would care a lot less about false positives than we do (we don't want to throw away work product; OAI just wants masses of human output), so I expect they could err towards omission, not lose much on false positives and be pretty sure the
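A hook like the one described might be sketched roughly as follows. This is a minimal illustration, not any real scanner: the regexes and the command-line interface are hypothetical stand-ins for the much larger, tuned pattern sets real secret scanners use.

```python
import re
import sys

# Hypothetical regexes for common credential shapes; a production
# pre-push hook would use a far larger, tuned pattern set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                  # AWS-style access key ID
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)(?:api[_-]?key|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
]

def scan(path: str) -> list[tuple[int, str]]:
    """Return (line number, matched pattern) pairs for suspected secrets."""
    hits = []
    with open(path, encoding="utf-8", errors="ignore") as f:
        for lineno, line in enumerate(f, start=1):
            for pat in SECRET_PATTERNS:
                if pat.search(line):
                    hits.append((lineno, pat.pattern))
    return hits

if __name__ == "__main__":
    failed = False
    for path in sys.argv[1:]:
        for lineno, pattern in scan(path):
            print(f"{path}:{lineno}: possible secret ({pattern})")
            failed = True
    sys.exit(1 if failed else 0)  # nonzero exit blocks the push
```

A false positive here means a nonzero exit that blocks the push until an admin allowlists the match, which matches the "takes an admin to fix" workflow above.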
Re: (Score:2)
We have a working population of about 800 engineers committing.
And what is the total population of engineers? :)
Reminds me of the old joke: “Q. How many people work here? A. About half of them.”
Re: (Score:3)
Or the other way around: I write GPL2+ code, I use AI. It's trained on my GPL2+ code. Now all of your code assisted by AI is derivative.
Good luck sorting this legal mess.
Re: (Score:2)
Your code was turned into a transformative work that you no longer retain copyright over.
Re: (Score:2)
well, then that should also apply to company "secrets" . can't have it both ways.
Re: (Score:2)
Who wanted it both ways?
Re: (Score:2)
Me when I confused "secrets" with "trade secrets" ... I was arguing about the wrong thing.
Re: (Score:2)
A trade secret can be copyrighted (and you will lose that copyright over a transformative work).
They are additionally protected by statute that specifically protects trade secrets, so copyright and fair use doesn't apply there.
Re: (Score:2)
The interesting point is that a trade secret is protected by laws other than copyright. To stay on topic: for current AI models, the prevailing opinion (not court tested yet) is that the model is not copyrighted, as it is not a creative human work, but it may be a trade secret. If you leak it from your company, they can sue you. If I download the model you shared, they can't sue me, because the download is only affected by copyright, while you leaking it touches the trade secret rights (and your contract probably said
Re: (Score:2)
However, that did not somehow invalidate the infringing of copyright they engaged in when they downloaded pirated books to train the model.
i.e., if you train a model with trade secrets, that model and anything it produces is perfectly legal as far as copyright is concerned. Your acquisition of those trade secrets may have been illegal, though.
Re: (Score:2)
The question of whether the training is transformative is a different copyright question from the copyright of the model. You have to look at three parts:
1) Does the training data need to be licensed?
2) Does the model need to be licensed (as in someone leaks the weights)
3) Does the output need to be licensed?
1) is currently answered "no" because transformative-work exceptions apply. That means while the training data itself is copyrighted, the AI companies have a usage right.
3) is currently answered no, denying the (unchanged) ou
Re: (Score:3)
If your code includes secrets, you're doing it wrong. Any secrets should be stored in encrypted storage such as key vaults or password managers. If it's in the code, you're *asking* for a breach.
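The "keep secrets out of code" advice above amounts to loading credentials at runtime rather than embedding them. A minimal sketch, assuming a hypothetical `DB_PASSWORD` environment variable (a real deployment would pull from a key vault or secrets manager instead):

```python
import os

def get_db_password() -> str:
    """Fetch the database password from the environment at runtime.

    Nothing secret ever appears in the source, so nothing secret can
    leak via the repository, a chat transcript, or a training set.
    """
    password = os.environ.get("DB_PASSWORD")  # hypothetical variable name
    if password is None:
        # Fail loudly rather than fall back to a hardcoded default.
        raise RuntimeError("DB_PASSWORD is not set")
    return password
```

Failing hard when the variable is missing is deliberate: a hardcoded fallback would reintroduce exactly the breach risk the comment warns about.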
OpenRouter (Score:2)