Reddit Has Reportedly Signed Over Its Content to Train AI Models (mashable.com)
An anonymous reader shared this report from Reuters:
Reddit has signed a contract allowing an AI company to train its models on the social media platform's content, Bloomberg News reported, citing people familiar with the matter... The agreement, signed with an "unnamed large AI company", could be a model for future contracts of a similar nature, Bloomberg reported.
Mashable writes that the move "means that Reddit posts, from the most popular subreddits to the comments of lurkers and small accounts, could build up already-existing LLMs or provide a framework for the next generative AI play." It's a dicey decision from Reddit, as users are already at odds with the business decisions of the nearly 20-year-old platform. Last year, following Reddit's announcement that it would begin charging for access to its APIs, thousands of Reddit forums shut down in protest... This new AI deal could generate even more user ire, as debate rages on about the ethics of using public data, art, and other human-created content to train AI.
Some context from the Verge: The deal, "worth about $60 million on an annualized basis," Bloomberg writes, could still change as the company's plans to go public are still in the works.
Until recently, most AI companies trained their data on the open web without seeking permission. But that's proven to be legally questionable, leading companies to try to get data on firmer footing. It's not known what company Reddit made the deal with, but it's quite a bit more than the $5 million annual deal OpenAI has reportedly been offering news publishers for their data. Apple has also been seeking multi-year deals with major news companies that could be worth "at least $50 million," according to The New York Times.
The news also follows an October story that Reddit had threatened to cut off Google and Bing's search crawlers if it couldn't make a training data deal with AI companies.
Re: (Score:1)
So AI is going to transform into a whiney woke libtard like ArchieBunker?
If it uses Reddit data to train, it certainly will.
Second wave (Score:2)
of people mass-deleting their post history. Foot - Gun.
Re:Second wave (Score:5, Insightful)
Do you really think "deleting" your posts would remove it from their storage?
Re: (Score:2)
Re: (Score:2)
Bulk restore n freeze followed in 3... 2... 1....
Re: (Score:1)
Re: Second wave (Score:2)
Re: (Score:2)
Reddit allows you to edit your posts. Or at least it used to, I haven't been there in a few years.
Anyway, the mass delete tools usually have an option to overwrite your posts THEN delete them. I doubt Reddit's keeping post histories across multiple edits, so generally I would expect people who go to the trouble of deleting all their posts are fairly safe from having their content used for AI training.
Re: (Score:2)
Re: (Score:2)
I was going to say... the tools are just macros distributed as browser extensions.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
That ought to reduce the post history by 1-2% or so, you think?
Re: (Score:2)
You misplaced a decimal point. Very few people give a shit. If 0.1% of posts get deleted it'll be mind-blowing.
Re: (Score:2)
I daresay there is a large group of those that care that actually have worthwhile posts, rather than caturd, or something.
So while the count may be low the quality of posts gone may be more significant.
Re: (Score:2)
No they don't. People post things online with the expectation that they will be openly readable. Just because someone makes a worthwhile post doesn't mean they'll suddenly throw a tantrum because a computer read it rather than a person. The reality is the people making worthwhile posts are no more nor less privacy-conscious than any other group of people. Being good at one thing doesn't make you an expert in all things, and doesn't make you like-minded to anyone else.
The reality is Reddit's ToS has said tha
Re: Second wave (Score:2)
Re: (Score:2)
Ahahahah I forgot about timecube. Thanks!
Re: Second wave (Score:2)
'Some AI companies' (Score:3)
Re: (Score:2)
Isn't the idea of training your AI on Reddit content kind of a silly futile one anyway?
Re:'Some AI companies' (Score:5, Interesting)
Isn't the idea of training your AI on Reddit content kind of a silly futile one anyway?
Actually I was thinking it's a rather scary proposition - an AI trained on reddit posts. Well, at least it's not 4chan or 8chan/8kun I guess...
Re: (Score:2)
I'd be more concerned about it wiping out humans to save us from ourselves.
Re: (Score:2)
Re: (Score:3, Informative)
Dude, PLEASE proofread your posts before hitting Submit. There's a preview button for a reason.
Re: (Score:3)
Dude, PLEASE proofread your posts before hitting Submit. There's a preview button for a reason.
Obviously he's poisoning his posts with misspelling for when Slashdot sells out to AI overlords.
Re: (Score:2)
The AI might start to think lying in all capitals is normal.
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
You can pursue a license. The other side is not required to grant one or set a price you'd like.
That was the plan all along. (Score:2)
This was, of course, why they decided to monetize their API.
Re: (Score:2)
Is /. licensing our answers for AI training? (Score:5, Interesting)
Just curious. If not now, then soon, perhaps.
Re: (Score:2)
It'd be just as valuable as Reddit IMHO, although surely there's less data on Slashdot. Note that I am not saying that training your AI on either Reddit or Slashdot content is a great idea.
Re: (Score:1)
Amazing how a site like 4chan where everyone is anonymous ends up being the most truthful because the words typed and posted have to stand on merit instead of standing on reputation. That's the problem with twitter, facebook, reddit, etc.; they all have ways to assert "expertise" through other means than the posts within the thread.
https://www.youtube.com/watch?... [youtube.com]
IMO that's a good point. Slashdot is kind of like 4chan in that you can be basically anonymous and judged by the content of posts instead of it first passing through cultural pre-conceptions about the author. One could argue the quality of posts might be higher on a pseudo anonymous board rather than one tied to an identity. Though someone could make the opposite case that there's a lot of noise from people who don't know what
Re: (Score:2)
If they did, would it increase, or decrease, the quality of the AI models? Just curious!
Re: (Score:2)
Would it, or would it not lead to more posts about hot grits down my pants on other web sites?
Re: (Score:1)
News Flash! The whole internet is being used for AI training by many, many actors.
Re: (Score:2)
Just curious. If not now, then soon, perhaps.
If they want to train their AI for how to act/make responses while in their Mom's basement, then yes. ;)
Re: (Score:2)
My mom lives on the second floor of her apartment building you inconsiderate bastard!
Re: (Score:3)
Do we really want the content of Reddit to define (Score:5, Insightful)
Re: (Score:2)
It could be worse. They could have used quora.
Or... Twitter.
Re: (Score:2)
They'll all be used, in the end. Anything that has human input, to avoid model collapse.
Re: (Score:1)
Wait long enough and AI will be subliminally training people in what they write. Eventually we reach the Singularity the moron class talks about... a grand Moronic Convergence. Shortly thereafter we will get political ads using AI telling us how by electing Dolt #1 we will get the AI Protection Bill and a gonzo-whopper wall to stop AI from invading America. And the governor of Texas will promise to put razor wire on top of it.
Excellent news (Score:3)
Given the level of ignorance, stupidity and bigotry on that site, the AI that is "trained" on this particular dataset will be a good-for-nothing, illogical, reality-denying, whiny PoS, swallowing putinist propaganda wholesale.
Re: (Score:1)
And yet, it is the perfect example of the madness that resides in all of us. The web has let that inner voice, which is supposed to be kept quiet, loose to roam free.
Re: (Score:2)
Only through our alts and fake accounts.
Re: (Score:3)
It's not that bad there. I'll bet the 4chan post db was cheaper.
Re: (Score:1)
Re: (Score:2)
Yes, I was actually relieved. Big-mouthed and stupid is what I want my degenerative AI to be.
Buncha whores. (Score:2)
Time to Practice "Legalese"? (Score:2)
Might need to attach boilerplate to all my Reddit posts in the future:
"This post is provided 'as is' and any express or implied warranties, including, but not limited to, the implied warranties of fitness for use in 'AI Training,' are disclaimed."
Time to start posting false information & nons (Score:2)
To poison the AI training dataset.
Hint, Reddit: since it's legally your content to own and sell as you wish, don't let the stupid schmucks who created your content for free edit it and create more, especially when they have a beef against AI.
Re: Time to start posting false information & (Score:1)
Trump and the neo-cons have been doing that for years
It's fine (Score:1)
Nice move (Score:2)
Will bring the age of the AI from 0 to 12 at least.
Grabs popcorn. (Score:2)
reddit ai vs 4chan ai (Score:2)
llms trained with just a dictionary and those message boards, and then let them talk it out... would be hilarious
Users should demand compensation (Score:2)
If we did (the bulk of) the work for this $60M, how much do I get for having > 250k post and comment karma?
Maybe a class-action lawsuit could give us what we deserve, if OUR content is being monetized. Doesn't that seem fair?
Such accurate content (Score:1)
next from AI.... (Score:1)