Social Networks The Internet

Reddit Will Block the Internet Archive (theverge.com) 111

Reddit says that it has caught AI companies scraping its data from the Internet Archive's Wayback Machine, so it's going to start blocking the Internet Archive from indexing the vast majority of Reddit. From a report: The Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles; instead, it will only be able to index the Reddit.com homepage, which effectively means Internet Archive will only be able to archive insights into which news headlines and posts were most popular on a given day.

"Internet Archive provides a service to the open web, but we've been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine," spokesperson Tim Rathschmidt tells The Verge.

  • It's because (Score:5, Informative)

    by OverlordQ ( 264228 ) on Monday August 11, 2025 @01:33PM (#65582034) Journal

    They already sold it to google [gizmodo.com]

    • Re: (Score:3, Insightful)

      by brunes69 ( 86786 )

      Which is how it should work.

      If AI companies want to train on data, they should have to pay for it.

      Right now this entire industry is built on IP theft. It's sickening, frankly.

      • Yes, but if the AI companies are scraping IA and not reddit, it has zero impact on Reddit besides Reddit being pissy.

        • by dskoll ( 99328 )

          If AI companies are scraping IA, it has no traffic impact on Reddit, but does have a missed-revenue-opportunity impact.

        • Re:It's because (Score:4, Insightful)

          by brunes69 ( 86786 ) <slashdot@@@keirstead...org> on Monday August 11, 2025 @01:46PM (#65582072)

          It has a huge impact because it devalues these kinds of deals and just supports the idea that these companies can run roughshod over IP rights, steal, and pillage to their hearts' content without consequence.

          • by allo ( 1728082 )

            It would be good news, if they can't sell user data anymore. If I post on Reddit, then I do it for people to read it, not for Reddit to sell it. And I decide myself if I am offended by AI reading my posts or not.

            • Defenders of reddit rights don't care about your rights, they just want the AI companies to pay up
          • These kinds of deals will only last until the AI hype dies down and the market matures.

            Of course, spammers and Russians already know about said deals and throw their own AI slop on reddit, so I question what value it has right now.

            Either way, if this is how reddit intends to become profitable long-term, they're in for a rude awakening.

          • Demonstrated public demand for shorter access delays than current IP law allows suggests the public would be better off with different laws.

            There are many ways to make money from free software etc. None are harmed by downloading it and many benefit.

          • Screw IP rights. Information wants to be free.
            • Hear, hear! LLMs are 100x smaller than their training sets, they can't possibly memorize them, so what they can do is learn generic compressed patterns. That is precisely what copyright should not protect - abstractions - and instead focus just on expression. LLMs are not even intended to reproduce expression, that is hard and inefficient. It costs money, comes out wrong, and is slow compared to ... copying, which is fast, perfect and free. LLMs are precisely fit to be free of copyright issues but somehow peo
          • So am I devaluing by browsing the site?

          • Why are you being a sycophant for these VC backed AI companies?

            • This fight impacts a lot more than VC backed AI companies.
              Why are you being intellectually dishonest?
              • Er...

                Sorry for believing that AI companies shouldn't be able to steal everyone else's work without being compensated.

                Anyone who thinks that's OK is the one being intellectually dishonest

                • by Zagnar ( 722415 )

                  Reading a public website isn't stealing, why is it suddenly stealing when it's done for AI?

            • Why are you being a sycophant for these copyright folks?

          • It is pretty entertaining watching copyright folks fight with AI folks though. It's almost hard to say who I want to win. In a more fair and sustainable world, I would push for AI but since we don't spread out the gains of society, I'm inclined to back the copyright folks since not accessing their content isn't that big a deal, but AI could very well eventually put us out of a job.

          • It has a huge impact because it devalues these kinds of deals and just supports the idea that these companies can run roughshod over IP rights, steal, and pillage to their hearts' content without consequence.

            Whoo Hoo! You go, Piratebay!

          • It has a huge impact because it devalues these kinds of deals and just supports the idea that these companies can run roughshod over IP rights, steal, and pillage to their hearts' content without consequence.

            Reddit wants to sell your IP rather than let AI have it for free. Reddit did not create it, though. It is two corporate entities trying to cash in on content created by others. I am not sure that there is a clear moral high-ground here.

        • AI companies are scraping IA

          Doesn't that cause some sort of infinite or recursive loop?

      • Re: (Score:3, Interesting)

        by drinkypoo ( 153816 )

        Right now this entire industry is built on IP theft. It's sickening, frankly.

        What's more sickening, an industry built on "IP theft" or the term of copyright after it's been extended due to lobbying from media megaconglomerates?

        • Re:It's because (Score:5, Insightful)

          by Mr. Dollar Ton ( 5495648 ) on Monday August 11, 2025 @02:26PM (#65582222)

          Both are a manifestation of the same problem - the power of money to subvert the law. Sometimes big money may be in conflict, but like the tagline from that movie, whoever wins, we lose.

          • In theory, AI could liberate us all. Copyright can never do that for us.

            In practice, your post is 100% spot on.

        • You guys are always wanting us to be more like Europe, because you're a rebel. Disney wanted to import European copyright laws into the US, and that's exactly what happened. How else do you intend to rebel?

        • This guy gets it: IP is an imaginary concept! It should exist only as long as it benefits society ... but our IP laws have been corrupted to only serve the needs of corporations.

          Pretending that you should keep following made-up rules, that don't benefit anyone except the ultra-rich, as if it was some kind of moral concern, is completely idiotic.

      • Move fast and break things ... laws ... people
      • I remember only five years ago, slashdot had a very counterculture/adversarial view towards intellectual property. Now it seems to be very Jack Valenti.

        You wouldn't download a reddit.

        (But neither would I, the last thing I need is several terabytes worth of the internet's anus.)

        • by davidwr ( 791652 )

          the last thing I need is several terabytes worth of the internet's anus.

          I think you are confusing Reddit with the many n-chans out there. It's an understandable mistake.

        • Fucking seriously.
        • by msauve ( 701917 )
          >I remember only five years ago, slashdot had a very counterculture/adversarial view towards intellectual property. Now it seems to be very Jack Valenti.

          It's the difference between personal use, and corporate profit.
        • We are not any less anti-copyright so much as anti-AI. Copyright isn't going to leave us all in the poor house. You don't need to consume copyright materials. AI on the other hand could very well leave us all worse off, especially given how we operate society. All the gains from AI will surely be used against the people and not for the people.

          • by tepples ( 727027 )

            You don't need to consume copyright materials.

            Until you get to college and you're given a list of required textbooks and required reading in a humanities class. Or until the government, a public utility, or some other monopoly on an essential service requires a particular brand of proprietary operating system to run an application through which to access said service. This could be Windows or macOS on desktop, or Android with Google Play* or iOS on mobile.

            * Though Android Open Source Project is free software, many popular applications require Google Pl

          • You don't need to consume copyright materials.

            *sigh*

            That assumes "the work being copyrighted" is the problem - and not the permission or lack thereof, and whether or not permission is needed, which IMO is dangerous thinking.

            And ignores that anything eligible created in a country where copyright is automatic is copyrighted.

            We really need to shift away from the copyright STATUS being the key focus, otherwise we are gonna risk creating more problems - problems for us, and creators, that benefit the corporations we're concerned about in the first place.

      • by djinn6 ( 1868030 )

        Reminder that the "creativity" being defended by IP laws originates from Reddit users, not Reddit the company. Most Reddit users are not aware that their posts can be sold for money.

      • It's not how it should work, you lunatic. Reddit owns the servers; to the degree anyone owns posts made in public for public reading, it's the people who made them.

        I get that you hate the big AI companies, but think about the consequences of the crazy policies you support to get at them, please. Giving companies like Reddit an ownership stake in anything made after reading posts on it, will not end well.

    • I'm surprised they only got 60 million out of it.

      The one thing I have learned about AI is that whoever controls scrapable data controls the AIs. Because they are useless without massive training sets.

      This means you can open source the models all you want, but they are basically worthless without the training data sets, and those are going to be getting locked up behind paywalls owned and operated by major platform holders very soon.

      This means that the capital that ai represents, and it is capital just
      • I really wonder how much added value there is in recent data. For a search engine, obviously it needs to be recent - but for any other use... old data is possibly as good as or better than recent data (especially as new data is going to be polluted with AI-generated content). Maybe search is where all the money lies (Google isn't exactly poor...) and hence the need to scrape endlessly.

  • I sometimes use it to archive especially insightful conversation on reddit. ...yeah it is a rare event, but it does sometimes happen. *casts furtive glance around Slashdot*

  • It took AI to get the Internet to forget things, interesting.

    • There was enormous value to the old internet because the marginal price for access was zero.

      LLMs provide a mechanism to access the same information in a radically more energy-intensive way, which was the missing mechanism to put a price on that value.

      A price tag means the data has to be made into a rivalrous good or you can't sell it. Then the old data has to be made unavailable.

  • Reddit literally sells their data to Google and others for scraping. What Reddit is saying here is that they're blocking The Internet Archive because they aren't paying to scrape that data. Google pays $60 million a year to scrape Reddit for AI data.

    https://www.thedailybeast.com/... [thedailybeast.com]

    • by brunes69 ( 86786 )

      As it should be.

      AI house-of-card companies should not be allowed to engage in rampant IP theft.

      • Bwahaha, that's literally what Reddit does. It steals content from every other source on the internet and profits from it.

        • by allo ( 1728082 )

          But but but ... they can't know what their users post, can they?!
          I mean if they knew that most users do not own the content they post, they would surely delete it ...

        • by _merlin ( 160982 )

          It's just like slashdot, though. It's essentially a site for discussion on linked content submitted by users. Does slashdot also profit from stolen content by your standards?

  • by Sebby ( 238625 ) on Monday August 11, 2025 @01:57PM (#65582110) Journal

    AI is why we can't have nice things.

    • AI is why we can't have nice things.

      Right. What nice things do you think you'd have without AI? (for the record, I'm not a fan of AI, but even less of a fan of moronic statements)

      • The OP is objecting to the loss of the Internet Archive and the ability to review history because of the AI scanning.

    • AI is why we can't have nice things.

      IMO ... that takes away too much agency from people who use AI as a reason to do things - when their doing things (and especially jackassed things, generally speaking at least) is something solely they do / nobody points a gun to their head and makes them do.

      Take the witch hunting of artists over perceived use of AI on Twitter. People actually try to blame AI for that. Horseshit, that's a choice PEOPLE who levy the accusations make. They CHOOSE to jump to conclusions, only they can choose to do that /

  • Who cares? Reddit is 90% bots and marketing agencies. It's useless. It's AI slop that's been digested and shit back out multiple times...AI trained on the output of AI trained on the output of AI trained on the last vestiges of actual human communication from a forgotten era of Reddit, and all of it designed to push a certain narrative, get you to think a certain way, or make it hard for you to see content they don't want you to see.

  • If a company complains that AI crawlers are causing too much traffic, one might believe it or not (it's not as if Reddit isn't using a CDN for example). But why are they complaining when a mirror they are not hosting themselves is crawled? It's not as if the Internet Archive would crawl them more often when AI bots access the archive.

    • by davidwr ( 791652 )

      I think the ostensible reason has to do with things like deletions. If the Internet Archive gets it, and it's deleted later, it's still at archive.org.

      I suspect the actual reason has to do with money, they want to be the only place that companies willing to pay for content can go to train their AIs, and they want to shut out companies unwilling to pay. Whether this is a good thing, a bad thing, both, or neither is probably a discussion for another time.

      • by allo ( 1728082 )

        Getting back content that is no longer available is the main point of an archive.

  • IP is a government gift, not some natural right though Disney will disagree. IP was intended to facilitate greater good, not rent seeking.

    IP is not a requirement for successful capitalism as nations which disregard it demonstrate. It is not necessary to be competitive but a deliberately conferred, supposedly temporary, market advantage intended to aid progress in the useful arts.

  • I see AI hate regularly on other sites, but it's especially funny seeing the AI Haters come out on Slashdot. You hate technology now? Ridiculous! Don't you still want a household robot to take out the trash, or is that unfashionable now, because it's AI? What a joke AI haters are, especially here. They should pay to scrape data! Don't be ridiculous.

    • hating AI in its current form does not make one a Luddite. This is not about "hating technology", it's about recognizing that what is available now is deeply flawed.

      • hating AI in its current form does not make one a Luddite.

        In a hypothetical universe that doesn't exist, sure, you're not wrong.
        But the AI hatred often overlaps with flat-out Luddite tendencies: the sacrifice of every drop of intellectual honesty one can find to change the narrative around a technology, with no regard for the facts on the ground.

        When I start running into people here who don't like AI, but aren't also engaging in flat out lying in order to prop up their reality bubble, I'll be more inclined to agree with you

        • Well, you ran into me. I dislike AI because of my own experiences with AI producing incorrect and downright misleading information, and its use in places it's clearly not ready for.

    • The fatigue seems to be mostly about other humans who believe LLMs are omniscient and infallible.

      Or even good conversationalists.

      As an information retrieval tool or a radiology diagnosis assistant, sure, hardly anybody is complaining.

      OK, maybe some lesser radiologists.

  • This is collective punishment, virtue signaling, power-tripping.
  • Maybe they should block the whole internet. They can take Pinterest with them. I get no value out of the fuckwits on reddit.
  • Reddit is garbage and has been for a long time. The IA can probably just find a new way to archive the site and its posts.

  • ... this is a way of making sure we can't use the internet archive to look back at what was said at reddit.

    Using the Internet Archive, I was able to trace the 2014 Ukrainian Coup forces to be Azov Militants being trained in Ukraine. It's only because of the Internet Archive that the public still had access to a journalist's report while they were being trained - before the coup itself occurred.

    It seems that we're being forced more and more into not being able to check what we were told on day x...

  • Just yesterday there was a news story about Predditors organizing to mass copy a YouTuber's content to try to wreck his revenue.

    A court forced Reddit to hand over their identification and he is suing them. Ethan somebody.

    There's quite an ethos over there about organizing crime and apparently if it's leftwing they just leave it alone. e.g. https://www.reddit.com/r/lgbt/... [reddit.com] nobody pushing back against crime.

    Most neutrally the company may not want to deal with subpoenas and they don't think the crimes are jus

  • by ahoffer0 ( 1372847 ) on Monday August 11, 2025 @11:02PM (#65583600)

    I just had a flashback to 2013. Back then, we were criticizing Reddit for monetizing community contributions without compensating the contributors.

    In 2025 we are criticizing AI for monetizing Reddit content without compensating Reddit. Reddit is now the victim, not the victimizer.

  • The joke I was looking for was something about why anyone would want to archive Reddit. I have looked at it (usually based on the google) many times over the years and cannot recall a case of finding any valuable or useful information there. But some of the stuff that was modded up a lot was quite stupid. (Remember the old Laugh-In joke "Interesting, but stupid.")

  • Fuck Reddit. Fuck the horse it rode in on, too.

    They must have needed some publicity, otherwise this would have been an unannounced system change.

