
Wikipedia Signs AI Licensing Deals On Its 25th Birthday (apnews.com) 51

Wikipedia turns 25 today, and the online encyclopedia is celebrating that with an announcement that it has signed new licensing deals with a slate of major AI companies -- Amazon, Microsoft, Meta Platforms, Perplexity and Mistral AI. The deals allow these companies to access Wikipedia content "at a volume and speed designed specifically for their needs." The Wikimedia Foundation did not disclose financial terms.

Google had already signed on as one of the first enterprise customers back in 2022. The agreements follow the Wikimedia Foundation's push last year for AI developers to pay for access through its enterprise platform. The foundation said human traffic had fallen 8% while bot visits -- sometimes disguised to evade detection -- were heavily taxing its servers.

Wikipedia founder Jimmy Wales said he welcomes AI training on the site's human-curated content but that companies "should probably chip in and pay for your fair share of the cost that you're putting on us." The site remains the ninth most visited on the internet, hosting more than 65 million articles in 300 languages maintained by some 250,000 volunteer editors.


  • The people who donated time and money to this half-assery are chumps. And that's before talking about Wikia.
    • Re:Sellouts (Score:5, Insightful)

      by thoriumbr ( 1152281 ) on Thursday January 15, 2026 @11:58AM (#65926380) Homepage
      Servers are expensive, bandwidth is expensive, and Wikipedia draws massive amounts of human and bot traffic. They are a non-profit organization and provide their services for free. AI companies and their bots are hammering their servers all the time, increasing costs. Why should they not charge?

      I donated and keep donating here and there, and I don't mind their new policy. Unless they allow AI bots to pollute Wikipedia with slop, I am fine.
      • by Sloppy ( 14984 )

        My understanding is that the entirety of Wikipedia is only about 60 GB and is conveniently downloadable. Anyone ought to be able to download a local mirror to use, instead of hammering Wikipedia's servers, and doing so might be faster for the consumer, anyway.

        And in a world where hundreds of millions of mainstream users stream video, I'm not sure bandwidth really is expensive anymore. To us old-timers, the numbers today are just astonishing. I almost can't believe I used to worry so much about efficiency ..

        • by narcc ( 412956 )

          I almost can't believe I used to worry so much about efficiency .. of .. anything.

          That attitude is one of the primary reasons why software is so bad these days. It's not even about optimization; you can get significantly better performance just by not doing stupid things.

        • My understanding is that the entirety of Wikipedia is only about 60 GB and is conveniently downloadable. Anyone ought to be able to download a local mirror to use, instead of hammering wikipedia's servers, and doing so might be faster for the consumer, anyway.

          The size of Wikipedia **text** is 156 GB [wikipedia.org] but increases to 26 TB when all revision history is included. When Wikimedia Commons is also considered, the size balloons to 585 TB [wikipedia.org]. So the complete repository is far too large for a home user to store.
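
          That said, the text dumps really are practical to process locally. A minimal sketch of streaming page titles out of a MediaWiki XML dump without loading it into memory (the dump file name is illustrative, and the parser deliberately ignores whatever XML namespace the export declares):

```python
# Sketch: stream page titles from a MediaWiki XML export without
# loading the whole dump into memory at once.
import bz2
import io
import xml.etree.ElementTree as ET

def iter_page_titles(fileobj):
    """Yield the text of each <title> element, ignoring XML namespaces."""
    for _, elem in ET.iterparse(fileobj, events=("end",)):
        # Tags arrive as "{namespace}title"; strip the namespace prefix.
        if elem.tag.rsplit("}", 1)[-1] == "title":
            yield elem.text
            elem.clear()  # free memory as we go

# Against a real dump it would look something like (file name illustrative):
# with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as f:
#     for title in iter_page_titles(f):
#         ...
```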

          AI bot traffic is estimated to be around 65% of all traffic now. However, that traffic is not just incremental, i.e., it doesn't just triple the hardware costs. The vast majority of the repository is in cold storage because it is normally never viewed. However, the AI bot

    • Is Wikipedia not of benefit for the public good? If you donated time (or money) to stroke your own ego that's fine too, but anyone that decides to take their ball and go home after this announcement obviously wasn't in it for the public interest to begin with - to those I say good riddance.
      • by wed128 ( 722152 )
        This seems predicated on the assertion that AI is a public good, and not another means of centralizing power and reducing the workforce. These companies do not have your best interest at heart.
      • I'm not sure why you think your comment has anything to do with AI using Wikipedia's content to help in its mission to essentially destroy the web as an information medium.

        • My comment doesn't apply to AI destruction at all. I was addressing human contributors to Wikipedia who will be offended that their 'work' will now be used by AI and will want to take their ball (undo their contributions) and go home.
    • Wikipedia is charging AI companies a fee for access, as opposed to the slop companies abusing the servers for free. I really don't see the problem.

  • Wonder how well that will go down with the editors.
    • by allo ( 1728082 )

      The editors know the license they need to use when writing for Wikipedia.

      This is also most likely not about articles. You can download a Wikipedia dump if you want to train on it, and datasets prepared from these dumps are already on the usual sites. This is about Wikimedia Commons, which is A LOT more data. Better to have a company pay for a direct download than have an inefficient bot crawling the same content. The license allows both, but the crawler causes more load on the server than allowing a direct download would.

  • by sabbede ( 2678435 ) on Thursday January 15, 2026 @11:34AM (#65926282)
    If the AI companies need to keep checking wikipedia, why not just use some of their massive storage to cache the damn thing and stop hammering the servers? What's with this attitude of "we'd rather check the server a thousand times a second than remember what we just read"?
    • Because that would make their blatant theft of human knowledge be even more obvious copyright infringement, I guess... At least now they're paying for it.

      • Well, it's not theft if it's already free to everyone. But maybe copyrights are the reason they don't cache instead. Since the hosts are annoyed about being hammered by the AI companies, wouldn't it make more sense for the hosts to tell them, "you can cache, but you cannot constantly scrape"?
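
      The "you can cache, but you cannot constantly scrape" arrangement is essentially what HTTP conditional requests already provide. A minimal sketch of the revalidation idea (the fetch callable here is a stand-in for a real HTTP client, not any crawler's actual API):

```python
# Sketch: a revalidating cache. Instead of re-downloading a page on every
# check, resend the ETag seen last time; a 304 reply means "unchanged,
# serve your cached copy", which costs the host almost nothing.
class RevalidatingCache:
    def __init__(self, fetch):
        # fetch(url, etag) -> (status, etag, body or None); HTTP stand-in.
        self._fetch = fetch
        self._store = {}  # url -> (etag, body)

    def get(self, url):
        etag, body = self._store.get(url, (None, None))
        status, new_etag, new_body = self._fetch(url, etag)
        if status == 304:  # not modified: reuse the cached copy
            return body
        self._store[url] = (new_etag, new_body)
        return new_body
```

      A well-behaved crawler doing this would serve the vast majority of checks from its own cache, since most articles change rarely.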
    • by gweihir ( 88907 )

      Bandwidth is cheaper than storage. Apparently.

      • Well, for the AI companies maybe. They don't bear the costs it places on the hosts. I would think the hosts would have the leverage to force the AI companies to stop if they allowed them to cache instead.
        • by gweihir ( 88907 )

          Indeed. These people are completely without any concern for what damage they do to others. Extreme greed at work or, worse, some "messiah complex".

    • by allo ( 1728082 )

      This is likely about Wikimedia commons (images, videos, etc.) which is a lot more data than the text dumps.

  • Reasonable (Score:5, Insightful)

    by gurps_npc ( 621217 ) on Thursday January 15, 2026 @11:36AM (#65926288) Homepage

    Wikipedia has terms of service that mean you give up limited rights when you contribute to it. As such, they decide whether AI gets access to it.

    Being a free service to the general public, it is totally reasonable to charge special users for access.

    This is a great way to fund the general public's use, especially considering how the AI community has generally disregarded authors' rights. Better to charge them up front.

    • by gweihir ( 88907 )

      Agreed. And without that deal, the slop-makers would just steal everything anyways.

      • by evanh ( 627108 )

        Which the slop-makers already had done, of course, with the resulting crawlers creating a significant financial burden along the way.

        My guess is the Wiki is now being piped direct as changes happen. Like a rolling live simulcast. Then those crawlers go away and the burden of bandwidth and server costs lift.

        • by gweihir ( 88907 )

          Yes. Because the crawlers completely ignore all standards of good Internet behavior, and the legal system is incapable of stopping their crap, this is basically self-preservation: at least it gets rid of the network load.

      Being a free service to the general public, it is totally reasonable to charge special users for access.

      More importantly, AI companies are going to scrape the site whether it's allowed or not. What this does is give them firm legal standing to show that companies doing so are causing them financial losses.

      • Thank you.
        I'm sure they scraped most of Wikipedia already; anybody has been able to get an offline copy of Wikipedia for many years. It would be the first dataset I'd use to train something that large. I've already used it simply to get a list of the most commonly used English words. This is about new content, being hammered by bots, money, and legal issues.

    • Generally speaking, no, it's not a great idea to make a deal with organizations that intend to ensure nobody interacts with your website ever again.

      Wikipedia works because its readers are also its editors, and because its editors expect to have an impact. When you put an AI wall between your content and the readers, nobody visits, nobody has any reason to update a page, and the website dies a death.

      Again, I still don't understand why Slashdotters, of all people, have such difficulty understanding that peopl

  • It's finally happening, people! Surely they'll fork Wikipedia THIS time!
  • by gweihir ( 88907 ) on Thursday January 15, 2026 @11:41AM (#65926302)

    Otherwise the AI pushers would just steal everything and cause damage to availability on top of that.

  • Wikipedia is completely unbiased! It’s crowd-sourced! And the crowd only uses approved sources:

    https://en.wikipedia.org/wiki/... [wikipedia.org]

    With Wikipedia, our AIs are guaranteed to be benign overlords!

    Sleep well.

    • A fan of "doing your own research" I take it.

      • A fan of "doing your own research" I take it.

        Is that sarcasm? Not a fan of “doing your own research”. We can’t trust the plebes!

        I spent thirty years in high tech - much with direct Bell Labs lineage - and the “do your own research” crowd? Morons.

        Be better!

  • by Meneth ( 872868 ) on Thursday January 15, 2026 @11:53AM (#65926366)
    This seems to be a violation of the CC-BY-SA license granted to Wikipedia by its editors, since the AI companies do not give proper attribution in the output of their models. (Given the vast volume of sources each model has ingested, it would be impractical to do so.)
  • Wikipedia is going to look like Reddit soon if they aren't diligent. Wikipedia has a lot of power-tripping gatekeepers.
  • Aren't we all just sitting around waiting for a handful of people to own every fucking thing on this planet.

  • So, your brain has rotted to the point where you cannot use a browser? Then use AI, so it can rot further. : P
  • by xack ( 5304745 ) on Thursday January 15, 2026 @01:15PM (#65926586)
    I came across Wikipedia very early on, when it had barely 40,000 articles. I was a regular contributor for about two years, from 2004 to 2006, before I gave up because of disputes and became a vandal. Wikipedia is still growing, even though notability rules limit true growth, and that non-notable content is routinely monetized by Fandom and Knowyourmeme. Even though I am banned from Wikipedia and am not sorry for vandalising it, I'm still impressed by what they have built. Wikipedia has made freely available what academic journals and newspapers lock behind paywalls, and I respect that.

    Wikipedia's content is traceable through its history, while AI just spits out whatever the LLM computes. Elon Musk's disaster of Grokipedia shows what can go wrong when AI tries to build an encyclopedia from a Nazi point of view; Wikipedia remains inherently more trustworthy.
