Social Networks Technology

Reddit CEO Says Microsoft and Others Need To Pay To Search the Site (theverge.com) 78

After striking deals with Google and OpenAI, Reddit CEO Steve Huffman is calling on Microsoft and others to pay if they want to continue scraping the site's data. From a report: "Without these agreements, we don't have any say or knowledge of how our data is displayed and what it's used for, which has put us in a position now of blocking folks who haven't been willing to come to terms with how we'd like our data to be used or not used," Huffman said in an interview this week. He specifically named Microsoft, Anthropic, and Perplexity for refusing to negotiate, saying it has been "a real pain in the ass to block these companies."

Reddit has been escalating its fight against crawlers in recent months. At the beginning of July, its robots.txt file was updated to block web crawlers it doesn't have agreements with. Then people began noticing that Reddit results were only visible in Google results -- where Reddit is paid for its data to be shown -- and not other search engines like Bing. Huffman said that Microsoft has been using Reddit's data to train its AI and summarizing its content in Bing results "without telling us" and that Reddit's data has also been sold through the Bing API to other search engines.
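
As a rough illustration of the robots.txt mechanism mentioned above, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt text, the bot names (`ExamplePartnerBot`, `SomeOtherBot`) and the `example.com` URL are all made up for the example; this is not Reddit's actual file. Note too that robots.txt is advisory: it only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is allowed, everyone else is not.
# This is an assumption for illustration, NOT Reddit's real file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExamplePartnerBot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler with an "agreement" versus one without.
partner_allowed = can_crawl(
    EXAMPLE_ROBOTS_TXT, "ExamplePartnerBot", "https://example.com/r/some_forum/"
)
other_allowed = can_crawl(
    EXAMPLE_ROBOTS_TXT, "SomeOtherBot", "https://example.com/r/some_forum/"
)
print(partner_allowed, other_allowed)
```

A well-behaved crawler would run a check like `can_crawl` before each request; a site that wants to enforce the policy still has to block requests on the server side.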

This discussion has been archived. No new comments can be posted.

  • Whose data? (Score:5, Insightful)

    by AnotherBlackHat ( 265897 ) on Thursday August 01, 2024 @10:04AM (#64672446) Homepage

    Isn't Reddit user generated content?

    • Re:Whose data? (Score:5, Interesting)

      by mcfatboy93 ( 1363705 ) on Thursday August 01, 2024 @10:09AM (#64672460) Homepage

      but it belongs to Reddit, not the users. Didn't you read the EULA? /s

      After their API change last year I abandoned the site for the ActivityPub side of social media and ended up on Lemmy. It's nice, smaller, and feels more like the forums of old, once you filter out all the problematic instances and develop a block list.

      • by Holi ( 250190 )

        I did read their EULA. Apparently you didn't.

        5. By submitting Your Content to the Services, you represent and warrant that you have all rights, power, and authority necessary to grant the rights to Your Content contained within these Terms. Because you alone are responsible for Your Content, you may expose yourself to liability if you post or share Content without all necessary rights.

        You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

        Whe

        • And how is that logically distinct from my snide comment?

          Ownership of the content implies that the owner can do whatever they want with it... so do you own it or do they? The data, the physical representation of your idea in code and bytes, is stored on their server; they own it. This is just a long-winded way of Reddit saying "let's share it, but it's mine".

          • by Holi ( 250190 )

            I can give anyone I want the same license. So if I tell Microsoft they have the right to use my content, they have no reason to pay Reddit for it.

            • Except that is useless unless you also give them the content. They can't get it from Reddit, so what are you going to do, re-post everything to Microsoft?
    • Thinking the data you add to commercial sites is the same as thinking you own the recent games you bought or you own your Windows (or Chrome) PC.

      Internet: welcome to the network of the free and the brave*.

      *limits and mandatory content may apply, no guarantees are provided

      • Re:Whose data? (Score:4, Insightful)

        by SomePoorSchmuck ( 183775 ) on Thursday August 01, 2024 @10:41AM (#64672548) Homepage

        Thinking the data you add to commercial sites is the same as thinking you own the recent games you bought or you own your Windows (or Chrome) PC.

        Internet: welcome to the network of the free and the brave*.

        *limits and mandatory content may apply, no guarantees are provided

        I do think this goes in the bucket with all the swirling contradictions in the discussion about "Are these platforms merely publishers of user content, and therefore have no oversight liability? Or do they control the content and therefore are liable for that content?"

        If my comments become commercial property of the platform, and the platform controls the how/when/if of my content being shown or not shown, stored or not stored, resold or not resold, aggregated into algorithm training sets or not, etc. etc.... if my comments don't remain mine, but become a commercial asset of a business, then that business should be legally responsible for its property.

        • I do think this goes in the bucket with all the swirling contradictions in the discussion about "Are these platforms merely publishers of user content, and therefore have no oversight liability? Or do they control the content and therefore are liable for that content?"

          If my comments become commercial property of the platform, and the platform controls the how/when/if of my content being shown or not shown, stored or not stored, resold or not resold, aggregated into algorithm training sets or not, etc. etc.... if my comments don't remain mine, but become a commercial asset of a business, then that business should be legally responsible for its property.

          willful misunderstanding to push your agenda.

          You approached them and asked to use the service (aka you created an account). The agreement they presented was that you could use their service in return for a non-exclusive, perpetual, worldwide, irrevocable, royalty-free license to use what you post in whatever way they choose. You accepted the agreement.

          You are responsible for what you write. You are the author. You own the copyright. You agreed to give them a license to use it. This is how they are usi

          • That's fine, but a perpetual license does not transfer copyright, thus Reddit has no claim to bar other uses.

            • Correct.

              According to the agreement, it is a "non-exclusive" license. This means that the copyright holder (the person who wrote the post) can license the content to others as they wish, but Reddit controls the copy on the Reddit servers and can grant or restrict access to that copy as they choose.

          • willful misunderstanding to push your agenda.

            You approached them and asked to use the service (aka you created an account). The agreement they presented was that you could use their service in return for a non-exclusive, perpetual, worldwide, irrevocable, royalty-free license to use what you post in whatever way they choose. You accepted the agreement.

            You are responsible for what you write. You are the author. You own the copyright. You agreed to give them a license to use it. This is how they are using it.

            Thank you for the clarification. That makes sense to me. I appreciate the insight. Your explanation improves my understanding.
            No thank you for the "willful misunderstanding to push your agenda" comment. You have zero knowledge of my internal state, nor my "will", nor my "agenda". It's a discussion forum; I made a comment discussing what I thought should happen. It's unfortunate when people with insight have an aggressively adversarial mentality. It dulls the effective communication of your insights.

      • The only guarantee is your money will be provided.

    • Re:Whose data? (Score:5, Interesting)

      by Rosco P. Coltrane ( 209368 ) on Thursday August 01, 2024 @10:38AM (#64672542)

      Reddit users agreed to forfeit the right to their own content and build Reddit's value for free in exchange for not paying a few dollars a month for the service. It's a choice they made. They can't complain now: it's Reddit's content to monetize as they please.

      But here's a reminder: it may be their content, but the original authors still have edit rights to it, even years down the line. If you want to get back at Reddit, edit your content and stuff it full of nonsense and untruths. You'll lower the value of Reddit and you'll pollute whichever AI trains on your content at the same time. Hint hint...

      Me, I deleted 80% of all my posts and what I left is utter but subtly wrong nonsense.

      • Re: (Score:1, Interesting)

        by conorjh ( 6311812 )

        Me, I deleted 80% of all my posts and what I left is utter but subtly wrong nonsense.

        a true loss for humanity. anyone who thinks posting on a forum and moderating messageboards is "work" and requires rewarding is a delusional first world baby

        • Re:Whose data? (Score:5, Informative)

          by Rosco P. Coltrane ( 209368 ) on Thursday August 01, 2024 @11:13AM (#64672652)

          Small raindrops make big rivers my friend.

          I may be Mister Nobody and whatever I wrote on Reddit over the years may not amount to much, as you're so keen to remind me. But if everybody does what I did, AI will become ever-so-slightly less capable of displacing a human's job, and that job might be yours.

      • by mr_jrt ( 676485 )

        Me, I deleted 80% of all my posts and what I left is utter but subtly wrong nonsense.

        Might want to check they're still deleted. I seem to recall reading that they were restoring deleted posts?

        • Get a script that edits your posts with lorem ipsum or something, and another that deletes them. Execute them a week apart.

          Reddit can't afford to manually review every restored post.

          • Reddit can't afford to manually review every restored post.

            But they can detect "suspicious activity", block your API usage, and revert your content to the last "clean" version from before the "hacking" began. And then change your password for your own protection, and refuse to restore it due to you failing to provide proof you're the owner, since all signals show you're the hacker who hacked the account, not the legitimate account owner. So now your only recourse is a small claims court, at which point it's discovered you are indeed the owner. And then their TOS ap

            • That's a really long fantasy.

              Reality: it works until so many people do it they notice (unlikely), and then they attempt a scripted detect-and-recovery but they've cut too much staff for that. Then they half ass it and maybe ban some accounts (which hurts you how?).

              Nobody's going to court or arbitration over it, you'd have to be crazy to think that worth your time.

              • Of course it's fantasy. I'm highlighting the fact there's no winning move. Most anything a prole may think of doing, a lawyer thought of before and prevented via TOS. At most, people may cause a tiny bit of bad PR, as happened a few months ago, but that lasts a few weeks and then no one cares anymore.

          • There's no reason why they wouldn't store every version of every post you've ever written. If it starts to become a problem it will float up towards the top of a report and be noticed. They could store diffs to save space, or compressed copies, and then basically anything less than random bytes will be a non-problem. But even if all they do is rotate the oldest copies out to a warehouse they could probably keep anything you'll upload forever.
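
To make the "store diffs to save space" idea above concrete, here is a minimal Python sketch of delta-encoded post history built on the standard library's `difflib.SequenceMatcher`. The names `make_delta`, `apply_delta` and `PostHistory` are hypothetical, invented for this example; nothing here is known about how Reddit actually stores edits.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list:
    """Encode `new` as copy/insert operations relative to `old`."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged span: store only the offsets into the old text.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New text that cannot be taken from the old version.
            delta.append(("insert", new[j1:j2]))
        # "delete" needs no entry: the old span is simply never copied.
    return delta


def apply_delta(old: str, delta: list) -> str:
    """Rebuild the newer text from the older text plus a delta."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


class PostHistory:
    """Keeps the original post in full and every later edit as a delta."""

    def __init__(self, original: str):
        self.original = original
        self.deltas = []
        self._latest = original

    def edit(self, new_text: str) -> None:
        self.deltas.append(make_delta(self._latest, new_text))
        self._latest = new_text

    def version(self, n: int) -> str:
        """Return the text after the first n edits (0 = original)."""
        text = self.original
        for delta in self.deltas[:n]:
            text = apply_delta(text, delta)
        return text


history = PostHistory("The quick brown fox jumps over the lazy dog.")
history.edit("The quick red fox jumps over the lazy dog.")
history.edit("lorem ipsum dolor sit amet")
print(history.version(0))
```

Under this scheme a small edit costs a few offsets plus the changed characters, while overwriting a post with filler costs roughly the size of the filler, and the original remains recoverable via `history.version(0)` either way.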

      • Re: (Score:2, Interesting)

        by Anonymous Coward

        But here's a reminder: it may be their content, but the original authors still have edit rights to it, even years down the line. If you want to get back at Reddit, edit your content and stuff it full of nonsense and untruths. You'll lower the value of Reddit and you'll pollute whichever AI trains on your content at the same time. Hint hint...

        Have you double checked that from a different IP while logged out?

        After retroactively changing their user agreement, tens of thousands of us that deleted our post history were suspended/ghost banned, and our content restored.
        Some that just happened to also be a moderator were banned over this.

        I deleted my history, got suspended, and two days later my posts were restored for others, and yet at home none of my posts showed up when scrolling through my subs.
        I could see them from work however.

        Reddit users agreed to forfeit the right to their own content and build Reddit's value for free in exchange for not paying a few dollars a month for the service. It's a choice they made.

        Sorry for quoting

        • Have you double checked that from a different IP while logged out?

          Yeah and the content appears to be gone. Of course, it probably isn't truly gone. I'm not deluding myself.

          However, the stuff I kept and polluted on purpose, I do believe it truly was changed. The alternative would be Reddit doubting its own users and archiving every version of every post ever posted, and at some point somehow determining that a user has gone rogue and quietly starting to flag changes as unreliable, and selling only older versions of flagged posts to AI companies.

          I doubt Reddit has the resou

      • This is easily solved by legislating that users cannot in fact give up certain rights over their own work. Boom, no more problem with businesses that are providing a forum but not the actual content on that forum claiming excessive rights to everything users put there. The EU already took some big steps in this direction and those laws were aimed squarely at the big social media sites.

      • by Holi ( 250190 )

        Says who? Not Reddit that is for sure.
        This is direct from their EULA

        You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

      • by Holi ( 250190 )

        "Reddit users agreed to forfeit the right to their own content and build Reddit's value for free in exchange for not paying a few dollars a month for the service."

        Can we stop making bullshit claims like that.

        You do not "forfeit the right to their own content"; you give Reddit a perpetual license for your content, but you retain full ownership and copyright of all your posts. The EULA is very clear on that.

    • Re:Whose data? (Score:5, Insightful)

      by dfghjk ( 711126 ) on Thursday August 01, 2024 @10:57AM (#64672598)

      it's reddit content when the topic is profit, it's user generated content when the topic is liability

    • Bandwidth costs money, I bet, when you let big data scrapers use a lot of other people's data. If I owned a server, I would only want visitors doing what I intended my website to be used for, not some internet Hoover vacuum coming along and slurping up gobs of bandwidth that I paid for.
    • User 'submitted' content. What would happen if someone leaked, say, the source to MS Office on reddit, then bing slurps it up, and you can then find it through a bing 'cached' link.

      Obviously that'd be a large cache, but in the old days people used to split files so people could download them in floppy-disk sized chunks, before the ISP cut you off at the 2-hour limit.

      If Reddit gets to sell access to the content that users submit, what's in it for the users?

  • by FictionPimp ( 712802 ) on Thursday August 01, 2024 @10:12AM (#64672468) Homepage

    The sooner we scrape reddit off the face of the earth the better society will be. Let it die on the worst search engine.

    • Re: (Score:3, Insightful)

      by Magic5Ball ( 188725 )

      This is fine. Google hasn't been my main search engine for most of a decade. Reddit continues to digg its way into irrelevance by making itself more appealing to investors at the expense of users and the rest of the Internet.

    • by CAIMLAS ( 41445 )

      Yep. I've stopped using Google because there are better search engines available (read: which don't index Reddit).

      If I want amateur porn, factually incorrect political invective, and technically incorrect conjecture on other topics, I'll just go to an extended family reunion.

  • Hey! (Score:3, Insightful)

    by doubledown00 ( 2767069 ) on Thursday August 01, 2024 @10:25AM (#64672498)

    "Only we can resell the content we grifted from our users!"

    "Free information". "Royalty Free Innovation." "Profits". It's a beautiful day when Silicon Valley's favorite things all collide into an unsustainable mess.
    Fuck everyone in this story.

  • by Rosco P. Coltrane ( 209368 ) on Thursday August 01, 2024 @10:29AM (#64672510)

    One of the core principles of the web is that linking is free.

    When your business is so desperate that you ask money for linking to your content, it's undeniably proof that your business model is utterly broken.

    Also, if Google agrees to pay up and set a precedent, other platforms will inevitably try to get in on the action. And Google may very well want to do that in fact, because then they'll be the only ones with enough money to index other sites, pushing other, smaller search engines out of the market and entrenching their monopoly forever.

    TL;DR: fuck Spez and fuck Google. In fact, fuck all tech bros who are destroying the hopeful future I was promised as a kid in the 70s. This dystopian shit is beyond depressing.

    • by boulat ( 216724 ) on Thursday August 01, 2024 @10:56AM (#64672590)

      Top Glassdoor reviews for Reddit:

      - "The leadership has no strategic vision and hopes to sit tight and cash out." (in 9 reviews)
      - "Bad management" (in 8 reviews)

    • Ah, but content servers are also free to block unwanted requests. They are in their rights to block the Google crawler.

      If Joe Guy decides to post a link to a reddit page there is STILL nothing reddit can do about it - sort of - reddit can still block requests generated from that referrer URL.
    • by dfghjk ( 711126 )

      Yes, and that core principle means that how data is displayed and what it's used for is up to the consumer. Consider the quote:

      "Without these agreements, we don't have any say or knowledge of how our data is displayed and what it's used for..."

      Yes, just as the web intends. And there is an "agreement", when a user requests data and you provide it, you agree that it be used as the web intends.

      "...which has put us in a position now of blocking folks who haven't been willing to come to terms with how we'd lik

    • > fuck all tech bros who are destroying the hopeful future I was promised as a kid in the 70s. This dystopian shit is beyond depressing.

      Wake up and smell the ashes.

    • One of the core principles of the web is that linking is free.

      Sure, but that has nothing to do with the case at hand. Reddit is asking to be paid for the privilege of crawling the site. If you instead manually link to the site, that's presumably still fine.

  • It's still open to the public, and you can host your own instance as well.
  • Thanks, Google, for setting a precedent that will further damage the internet. Now that Reddit has successfully made Google pay, how long until others follow?

    We will end up with a situation where you have to use multiple search engines to get a different selection of indexed sites.

  • I only stumbled upon Reddit content via search, so if they want to give up that tiny trickle of revenue I doubt I or they will care. Adios reddit, I barely knew you and I'm OK with that.

  • I only go to Reddit because of the Google search I conduct that suggests Reddit. I would otherwise not think to go to Reddit, nor would I want to on my own volition.
    • This is just saber rattling hoping for some money. If reddit actually blocked google referred visitors their traffic would nosedive into even less profitability.
      • Everyone running Reddit knows their days are numbered. They're just stalling on the inevitable so the right shareholders have time to cash out.
        The majority of popular posts are writing exercises, trolling, account farming, post farming, etc., and it's been that way for a while.
        With that track record they know they're not going to survive in the era of chatgpt and the only remaining thing of value after that is a large corpus of pre-LLM text that they can sell off.

        Or they could sell it off except it's on the pu

      • This is not about blocking Google referred visitors. The only way any site can get a Google referred visitor is for Google to have crawled and indexed the site so the links show up in search results.

        This is about forbidding Google from indexing the site in the first place. Not indexed? No search results to be referred from.

  • by rsilvergun ( 571051 ) on Thursday August 01, 2024 @10:52AM (#64672580)
    And search in general. The reason they can get away with this is because the internet is so full of crap AI-generated content that searching reddit has become the only way to find something that looks like it was made by a human being.
    • I don't follow. What exactly shows how "bad" Google has gotten? I'm having trouble connecting the Reddit asking for money, to Google's search quality.

      • I don't follow. What exactly shows how "bad" Google has gotten? I'm having trouble connecting the Reddit asking for money, to Google's search quality.

        I think they're referencing several articles I've seen recently that say things like "Google searches have gotten so bad, that users are adding 'Reddit' to the end of their search queries in order to get useful results". So, Google searches get so bad that people start using their search engine to search Reddit, now Google has the monopoly on using a search engine that way (to fix the problem they created).

        • If you're right, I still don't follow. If you can get better Reddit search results by using Google to search Reddit, how is that an indicator of Google being "so bad"? That seems to indicate that Google search is at least better than Reddit's own search!

          My personal experience with Google is that I get the result I want as #1 or #2 in the search results, and usually #1. Maybe I just know how to write a search query, I don't know.

  • where will i get my ill-informed drivel from now!?

  • With unmitigated AI content, corporate paywalls, and now this, I fail to see how the internet will survive. RIP internet, 1991-2024.
  • I'm sure that Google and OpenAI can afford to pay what reddit content is worth.... gotta be, what, 10 cents per day?

  • I mean, this is only a forum system boosted with steroids and bullshit, so really, why has nobody already created a clone using P2P or blockchain to decentralize control, prevent discrimination like this, and most importantly, render the current Reddit useless and worthless?

    Like, they need to be taken down. What they are doing is toxic, and in my opinion this sounds like an incentive for Microsoft and others to finance and fund any non-profit that would do exactly that, and take Reddit off the

    • It is trivial to start a new Internet forum. Even a scalable one.

      Now try and make it popular, when there is an established player in the market and the value to users of such sites is the number of existing active users and the daily post count. That is far from trivial and I honestly believe it needs more luck than you can plan for.

      But 'adding blockchain' is just nonsense.

      • That's the typical speech from people profiting from Player A, who do not want other players to exist. Not that I'm saying that is your case, but that is defeatism.

        Most of today's platforms were born from an event, like a breaking point in the needs of the internet, and I think we have reached that point. Let Reddit be the very cause of its own slow and agonizing downfall over the next decade. Who cares if it takes 10 years from now? Let's just get to it, so that one day we can be done.

        We need proper freedom of speech,

  • Reddit CEO Says Microsoft and Others Need To Pay To Search the Site

    All your data are belong to the AI bros.

  • Why yes, I'd rather never see Reddit, Pinterest, LinkedIn, and Quora on my search results.

  • So people who don't use Google search don't find Reddit content? They just de-indexed themselves from everyone else?

    Most people want to be indexed to generate traffic, and the higher the better. Hope the AI deal makes up for all the traffic they won't be getting. Somehow I don't see how that is going to work out like they think.
  • Deserves a repost /s
    --

    Rosco P. Coltrane [slashdot.org]: “One of the core principles of the web is that linking is free.”

    “When your business is so desperate that you ask money for linking to your content, it's undeniably proof that your business model is utterly broken.”

    “Also, if Google agrees to pay up and set a precedent, other platforms will inevitably try to get in on the action. And Google may very well want to do that in fact, because then they'll be the only ones with enough m
  • by Holi ( 250190 ) on Thursday August 01, 2024 @03:22PM (#64673370)

    Looking at Reddit's EULA, they do not retain any ownership of the content; they merely have a license. They cannot bar someone else from using your creations; only the copyright owner has that standing.

    5. By submitting Your Content to the Services, you represent and warrant that you have all rights, power, and authority necessary to grant the rights to Your Content contained within these Terms. Because you alone are responsible for Your Content, you may expose yourself to liability if you post or share Content without all necessary rights.

    You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

    When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

  • This made me think, which level AIs browse Slashdot at, and how far back? Is it going to be traumatized by the less wholesome comments?

  • If they allow Bing to crawl Reddit for its search engine, there is nothing to stop Microsoft also feeding that data into their AI and making AI profits off it without giving Reddit any of that money.
