
 



AI Technology

Reddit Wants To Get Paid for Helping To Teach Big AI Systems (nytimes.com) 46

Reddit has long been a forum for discussion on a huge variety of topics, and companies like Google and OpenAI have been using it in their A.I. projects. From a report: Reddit has long been a hot spot for conversation on the internet. About 57 million people visit the site every day to chat about topics as varied as makeup, video games and pointers for power washing driveways. In recent years, Reddit's array of chats also have been a free teaching aid for companies like Google, OpenAI and Microsoft. Those companies are using Reddit's conversations in the development of giant artificial intelligence systems that many in Silicon Valley think are on their way to becoming the tech industry's next big thing. Now Reddit wants to be paid for it.

The company said on Tuesday that it planned to begin charging companies for access to its application programming interface, or A.P.I., the method through which outside entities can download and process the social network's vast selection of person-to-person conversations. "The Reddit corpus of data is really valuable," Steve Huffman, founder and chief executive of Reddit, said in an interview. "But we don't need to give all of that value to some of the largest companies in the world for free." The move marks one of the first significant examples of a social network's charging for access to the conversations it hosts for the purpose of developing A.I. systems like ChatGPT, OpenAI's popular program. Those new A.I. systems could one day lead to big businesses, but they aren't likely to help companies like Reddit very much. In fact, they could be used to create competitors -- automated duplicates to Reddit's conversations.

This discussion has been archived. No new comments can be posted.


  • Reddit Problems (Score:5, Insightful)

    by sizzlinkitty ( 1199479 ) on Tuesday April 18, 2023 @01:14PM (#63459530)

    Reddit needs to fix their moderator problems before worrying about how their data is being used in openai. Many of their biggest communities are moderated by children or adults with a childish attitude. Reddit needs to pay their moderators and vet them accordingly with a transparency system that allows average users to see actions taken and by which moderator.

    • Re:Reddit Problems (Score:5, Informative)

      by greytree ( 7124971 ) on Tuesday April 18, 2023 @01:23PM (#63459558)
      They also need an appeal system so mods cannot have you banned from the whole site without answering to someone who will ensure that they enforce the rules correctly, fairly and impartially.
    • by KlomDark ( 6370 )

      I can't stand to go there anymore. I got banned from my hometown subreddit for who knows what, something "bad". Fuck that shit. Total lack of transparency.

    • what you complain about is a different issue.
      May be substantial, but having nothing to do with whether they should allow free access to the content.

    • That would require Reddit to pay for a professional moderator team. If you expect the kind of terminally online people who actually have the time and desire to do that work to actually do a good job, you may want to check to see if they have any bridges to sell you. They may as well hang up a sign that says "Petty, power hungry idiots wanted" because that's what a position like that will attract.
    • by ljw1004 ( 764174 )

      Reddit needs to fix their moderator problems before worrying about how their data is being used in openai.

      Why "before"?

      I mean, I get that they need to fix their moderator problem. But your claim is that this should happen BEFORE they worry about how their data is used for AI-training, and I don't see why one should come before the other?

    • by Kisai ( 213879 )

      Sorry what? A free-to-post-on forum wants to be paid, while not owning the content on it? Sounds like Reddit wants to be paid for the same reason OpenAI wants to be paid.

      Let's just put all the cards on the table. If Reddit isn't paying its contributors and moderators, then it doesn't have a leg to stand on to ask GPT developers.

      It's somewhat more credible to have AI developers pay Wikipedia, because there is a legal standing there that all content on Wikipedia is public domain, and paying Wikipedia keeps W

      • by pacinpm ( 631330 )

        Sorry what? A free-to-post-on forum wants to be paid, while not owning the content on it?

        I am pretty sure they OWN content on it. I didn't read their regulations but expect to find there a clause giving them all rights to posted content.

  • by greytree ( 7124971 ) on Tuesday April 18, 2023 @01:18PM (#63459548)
    Person > Hello. Why was my comment removed?
    AI > Hello. Your comment has been removed and you have been banned from Reddit for insulting someone in my group.
    Person > I replied to someone insulting me. They have not been banned and their insult is still visible.
    AI > No insults, you're banned.
    Person > Can I appeal?
    AI > You can fill in a form that goes to /dev/null. Goodbye.

    Well done AI Reddit Mod, you passed the Turing Test. You are indistinguishable from human Reddit Mods.
    • by KlomDark ( 6370 )

      Exactly... What a shithole.

    • Fake news. I was banned from the entire site and then I got: "Thanks for submitting an appeal to the Reddit admin team. We have reviewed your request and have lifted your suspension." with my 1-karma, 0-post account.
  • Oh hell no (Score:5, Insightful)

    by alvinrod ( 889928 ) on Tuesday April 18, 2023 @01:29PM (#63459588)
    I can't think of a surer way to create an AI with a psychotic hatred of humanity than to use Reddit to train it. Whatever comes out of that project is going to make AM look like Gandhi by comparison.
    • by WDot ( 1286728 )
      Part of the reason Reddit is used to train AI is that lots of people post links, and dataset scripts will scrape the content of the links and not just the comments themselves. Even better, you can set a threshold like “comment must have +3 karma for us to follow the link.” This is literally how GPT-2 was trained: https://d4mucfpksywv.cloudfron... [cloudfront.net]
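The karma-threshold filter described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Post` record, the `select_links` helper, and the sample data are hypothetical names made up here, not the actual GPT-2 dataset scripts, and a real pipeline would also fetch and clean the linked pages.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Post:
    """Hypothetical record: an outbound link plus the karma of the post sharing it."""
    url: str
    karma: int


def select_links(posts, min_karma=3):
    """Return unique http(s) links from posts with at least min_karma, in first-seen order."""
    seen = set()
    selected = []
    for post in posts:
        # Skip links that did not earn enough karma.
        if post.karma < min_karma:
            continue
        # Skip anything that is not a well-formed web URL.
        parsed = urlparse(post.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        # Skip links already collected from an earlier post.
        if post.url in seen:
            continue
        seen.add(post.url)
        selected.append(post.url)
    return selected


# Made-up sample data for illustration only.
sample_posts = [
    Post("https://example.com/a", 5),
    Post("https://example.com/b", 1),
    Post("https://example.com/a", 7),
    Post("not a url", 10),
    Post("https://example.org/c", 3),
]
links_to_scrape = select_links(sample_posts)
```

The threshold acts as a cheap human-quality signal: only links that some readers bothered to upvote are followed, and the pages behind those links, rather than the comments, would become the training text.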
  • Along with Reddit they could dig up old emo Myspace pages. Pretty much the same depressed about everything mentality.
  • ChatGPT is no more AI than the chatbot pretending to offer customer service on Amazon. The only difference is that ChatGPT shamelessly steals content without attribution or payment to creators. I for one am looking forward to the class action lawsuits.

    • The only difference is that ChatGPT shamelessly steals content without attribution or payment to creators

      Should fit right in on a pro-piracy site.

    • So, Reddit should be free if you're reading it to learn more words. Free if you're learning how people write. Free if you want to learn how to make money and talk about making money. Free if you want to learn how to be better at your job. Free if you want to see if your employees are doing their job instead of bumming around on Reddit. Free if you're stalking an ex. Free if you're trying to write better software...

      Unless it's a specific kind of software, then you have to pay.

      Really?

  • If Reddit start using their content as a source of income do they not have to ensure it is of merchantable quality?

    I'm thinking of not just accuracy and factual correctness, but being free of bias or libellous statements. It seems to me that AIs from rich and large organisations will become soft targets for the more predatory kind of lawyer. And if someone could demonstrate that the source of a contentious statement was sold to an AI trainer, that trainer would be more than willing to at least try to pass

  • I'm waiting for artists or some big monied IP holder to start a class action suit against a lot of the generative AI platforms.

  • If Reddit can get in on this action, what about Slashdot? I think the conversations here would be at least equally useful in training AI. For that matter, there are probably lots of other Web forums that LLMs would benefit from. Why single out Reddit?

  • It is worse than slashdot for circlejerk bullshit. You can't say anything independent there, therefore its content/data is useless. (Unless you're writing a bot to imitate the current reddit circlejerk, whatever that is.)
  • I'm pretty sure we don't need ChatGPT and other LLMs to spew lame repetitive references and autism.
  • by Walt Dismal ( 534799 ) on Tuesday April 18, 2023 @06:10PM (#63460280)

    This is like a toilet charging you for its clogging and overflowing. Years ago my long-term account was banned for unsupportable reasons, and I never went back. We know Reddit, like Twitter, was linked to government political activism. Choosing to train AIs on Reddit material introduces avoidable biases in the bots.

    We need to shift to properly curated source material, maybe books that have been around and accepted for a long time.

    Although that could lead to problems too:

    "ChatGPT, recommend a healthy dinner menu for me."

    Bot: "Green eggs and ham!"

    Me: No.

    Bot: "Diet of Worms!"

    Me: lays head down on desk.

  • If they based ChatGPT on SO and the official documentation only, I'd be far more willing to use it. /r/count already caused weird glitch tokens. If it's dependent on Reddit so much, then no wonder ChatGPT is such a self-confident bullshitter.

    Reddit has some occasional diamond posts. But the upvote/downvote metric makes the site into a puerile popularity contest that swamps and overlooks anything that would add actual value to a machine learning data set. Extracting and validating the rare example of usef

  • If you want the world to be able to read your content, even without creating an account, great, but don't expect to be able to get some people to pay.
