
EFF Tells Publishers: Blocking the Internet Archive Won't Stop AI, But It Will Erase The Historical Record (eff.org) 27

"Imagine a newspaper publisher announcing it will no longer allow libraries to keep copies of its paper," writes EFF senior policy analyst Joe Mullin.

"That's effectively what's begun happening online in the last few months." The Internet Archive — the world's largest digital library — has preserved newspapers since it went online in the mid-1990s... But in recent months The New York Times began blocking the Archive from crawling its website, using technical measures that go beyond the web's traditional robots.txt rules. That risks cutting off a record that historians and journalists have relied on for decades. Other newspapers, including The Guardian, seem to be following suit...

The Times says the move is driven by concerns about AI companies scraping news content. Publishers seek control over how their work is used, and several — including the Times — are now suing AI companies over whether training models on copyrighted material violates the law. There's a strong case that such training is fair use. Whatever the outcome of those lawsuits, blocking nonprofit archivists is the wrong response.

Organizations like the Internet Archive are not building commercial AI systems. They are preserving a record of our history. Turning off that preservation in an effort to control AI access could essentially torch decades of historical documentation over a fight that libraries like the Archive didn't start, and didn't ask for. If publishers shut the Archive out, they aren't just limiting bots. They're erasing the historical record...

Even if courts place limits on AI training, the law protecting search and web archiving is already well established... There are real disputes over AI training that must be resolved in courts. But sacrificing the public record to fight those battles would be a profound, and possibly irreversible, mistake.

  • If the IA agreed to restrict access for some "reasonable" time e.g. 48 hours after publication?
    I would expect to find "back issues" of newspapers in a public library, but I would not expect to find today's paper.

    • That would not satisfy the publishers who don't want their text used EVER.

      Other greedy bastards have given us 95-year copyright terms, remember.
      • And rightfully so, IMHO. The IA has legitimate fair-use goals, but I don't think that fair use extends to bulk copying of an archive for profit.

        It's also my belief that the Internet makes it really convenient for outright theft of other intellectual property.

        You can't have it both ways; kleptocracy is evil.

    • by Local ID10T ( 790134 ) <ID10T.L.USER@gmail.com> on Saturday March 21, 2026 @07:48PM (#66053904) Homepage

      I would expect to find "back issues" of newspapers in a public library, but I would not expect to find today's paper.

      Surprise! The public libraries used to get the paper delivered daily. You could, in fact, go to the library and read that day's paper. Libraries would have the local paper, as well as major regional and national papers.

      • Re: (Score:2, Funny)

        by PPH ( 736903 )

        What do you mean "used to"?

        They still have the current New York Times. But some homeless bum keeps snatching the comics section before I can get to it.

      • I would imagine they still do have them so long as there is still a paper issue being delivered. Not just public libraries, but I even remember grade school libraries having a copy of the current day's newspaper. They loaded it up in this wooden thing daily to keep the pages bound in the proper order since kids would end up fucking that up if they weren't bound.
    • Do you normally have and publicly voice opinions about things that you don't know anything about? There needs to be more shaming of this behavior.
  • by umopapisdn69 ( 6522384 ) on Saturday March 21, 2026 @07:59PM (#66053916)

    that even in the best situation the publishers can't trust that IA can effectively stop the AIs from just scraping the content from there? The newspapers perhaps can block AIs from their own sites. But once the data is past their hands they have nothing but license statements for control.

    Mind you I do think there is a fair use case for the AIs. But it's abundantly clear they are perfectly happy to play the "forgiveness is easier than permission" game. As well as "Hey the milk is already spilt, so whatcha gonna do about it?"

    • Extreme measures to prevent scraping are a waste of time if you also produce a print version. Remember Google scanned 40 million books? At the most, it would make it necessary to pay the subscription price for the paper and scan it in.
    • by allo ( 1728082 )

      They can't stop AI crawlers anyway. Unlike the IA, the bot makers have a lot of time and money to make them work against the latest protections. You won't stop spam, you won't stop piracy, and you won't stop annoying crawlers. But you can make sending e-mail hard, annoy people trying to use your video service on Linux, and kill off archives, because these applications do not have as much money or motivation behind them as spammers, video "pirates" and companies selling data.

  • Scraping by the Internet Archive happens in the same way as scraping by AI. The publishers' automated software probably just blocks any scraping.
  • Step 1: Calculate how much revenue will be lost from people not visiting the Times due to AI telling them the answer instead of their reading the article itself.
    Step 2: Have the AI companies pay the cost of that missing revenue stream.
    Now the Times has a guaranteed revenue stream and can open up web access for the AI companies and Historical archivists to do their job.
    • I'm going to go out on a limb here and say that it's likely against the rules for IA to just buy a subscription from the NYT (what is that, $5 a month or something) and then scrape it. Though I don't know why. I'm guessing copyright.

      Going all digital is definitely going to destroy the historical record in the long run. Then "something" will happen, and a lot of stuff will be lost. Akin to the burning of the Library of Alexandria. It's very sad.
