EFF Tells Publishers: Blocking the Internet Archive Won't Stop AI, But It Will Erase The Historical Record (eff.org) 27

Posted by EditorDavid on Saturday March 21, 2026 @06:38PM from the slowing-to-a-crawler dept.

"Imagine a newspaper publisher announcing it will no longer allow libraries to keep copies of its paper," writes EFF senior policy analyst Joe Mullin.

"That's effectively what's begun happening online in the last few months." The Internet Archive — the world's largest digital library — has preserved newspapers since it went online in the mid-1990s... But in recent months The New York Times began blocking the Archive from crawling its website, using technical measures that go beyond the web's traditional robots.txt rules. That risks cutting off a record that historians and journalists have relied on for decades. Other newspapers, including The Guardian, seem to be following suit...

The Times says the move is driven by concerns about AI companies scraping news content. Publishers seek control over how their work is used, and several — including the Times — are now suing AI companies over whether training models on copyrighted material violates the law. There's a strong case that such training is fair use. Whatever the outcome of those lawsuits, blocking nonprofit archivists is the wrong response.

Organizations like the Internet Archive are not building commercial AI systems. They are preserving a record of our history. Turning off that preservation in an effort to control AI access could essentially torch decades of historical documentation over a fight that libraries like the Archive didn't start, and didn't ask for. If publishers shut the Archive out, they aren't just limiting bots. They're erasing the historical record...

Even if courts place limits on AI training, the law protecting search and web archiving is already well established... There are real disputes over AI training that must be resolved in courts. But sacrificing the public record to fight those battles would be a profound, and possibly irreversible, mistake.

Could this all be solved (Score:1)

by JimBowen ( 885772 ) writes:

If the IA agreed to restrict access for some "reasonable" time e.g. 48 hours after publication?
I would expect to find "back issues" of newspapers in a public library, but I would not expect to find today's paper.
- Re: (Score:2)
  
  by greytree ( 7124971 ) writes:
  
  That would not satisfy the publishers who don't want their text used EVER.
  
  Other greedy bastards have given us 95-year copyright terms, remember.
  - Re: (Score:2)
    
    by postbigbang ( 761081 ) writes:
    
    And rightfully so, IMHO. The IA has legitimate goals for fair-use, but I don't think that fair use extends to bulk copying of an archive and for profit.
    It's also my belief that the Internet makes it really convenient for outright theft of other intellectual property.
    You can't have it both ways; kleptocracy is evil.
- Re:Could this all be solved (Score:5, Informative)
  
  by Local ID10T ( 790134 ) writes: <ID10T.L.USER@gmail.com> on Saturday March 21, 2026 @07:48PM (#66053904) Homepage
  
  I would expect to find "back issues" of newspapers in a public library, but I would not expect to find today's paper.
  Surprise! The public libraries used to get the paper delivered daily. You could, in fact, go to the library and read that day's paper. Libraries would have the local paper, as well as major regional and national papers.
  
  - Re: (Score:2, Funny)
    
    by PPH ( 736903 ) writes:
    
    What do you mean "used to"?
    They still have the current New York Times. But some homeless bum keeps snatching the comics section before I can get to it.
    - - Re: (Score:2)
        
        by PPH ( 736903 ) writes:
        
        That list you gave is a subset of the papers available at my local library, satellite branch. Although I'd take issue with putting the LAT and NYT in a national category, while a paper like the Seattle Times might be considered local.
        The ST carries the same sort of stories that the NYT does. And in some categories (aviation news) has a better-informed reporting staff. I read the NYT for the crossword puzzles, the arts section, and the comics (their front page). Just last week, they covered some technology is
  - Re: (Score:2)
    
    by anoncoward69 ( 6496862 ) writes:
    
    I would imagine they still do have them so long as there is still a paper issue being delivered. Not just public libraries, but I even remember grade school libraries having a copy of the current day's newspaper. They loaded it up in this wooden thing daily to keep the pages bound in the proper order since kids would end up fucking that up if they weren't bound.
- Re: (Score:1)
  
  by lucifuge31337 ( 529072 ) writes:
  
  Do you normally have and publicly voice opinions about things that you don't know anything about? There needs to be more shaming of this behavior.
Isn't the problem (Score:5, Insightful)

by umopapisdn69 ( 6522384 ) writes: on Saturday March 21, 2026 @07:59PM (#66053916)

that even in the best situation the publishers can't trust that IA can effectively stop the AIs from just scraping the content from there? The newspapers perhaps can block AIs from their own sites. But once the data is past their hands they have nothing but license statements for control.
Mind you I do think there is a fair use case for the AIs. But it's abundantly clear they are perfectly happy to play the "forgiveness is easier than permission" game. As well as "Hey the milk is already spilt, so whatcha gonna do about it?"

- Re: (Score:2)
  
  by MDMurphy ( 208495 ) writes:
  
  Extreme measures to prevent scraping are a waste of time if you also produce a print version. Remember Google scanned 40 million books? At the most, it would make it necessary to pay the subscription price for the paper and scan it in.
- Re: (Score:2)
  
  by allo ( 1728082 ) writes:
  
  They can't stop AI crawlers anyway. Unlike the IA, the bot makers have a lot of time and money to make them work against the latest protections. You won't stop spam, you won't stop piracy, and you won't stop annoying crawlers. But you can make sending e-mail hard, annoy people trying to use your video service on Linux, and kill off archives, because these applications do not have as much money or motivation behind them as spammers, video "pirates" and companies selling data.
- Re:It Will Erase The Historical Record (Score:5, Insightful)
  
  by MDMurphy ( 208495 ) writes: on Saturday March 21, 2026 @10:27PM (#66054006)
  
  How many times have we seen where someone has captured an original website or news story that shows how "history" was later changed? Erasing the historical record is real. So is changing the historical record and trying to claim it hadn't been changed.
  
  - Re: (Score:3)
    
    by geekmux ( 1040042 ) writes:
    
    How many times have we seen where someone has captured an original website or news story that shows how "history" was later changed? Erasing the historical record is real. So is changing the historical record and trying to claim it hadn't been changed.
    You've just wholly justified printed news media. Because anyone and everyone can change digital history.
    AI is training so when it changes the historical record, humans will never notice. By the time most believe this to be true, it will already be too late to stop.
    Printed news media will be illegal at some point. Digital only for that type of information. For every reason most deny.
    - Re: (Score:2)
      
      by allo ( 1728082 ) writes:
      
      AI is an interesting thing in that regard. Each model is a time capsule. Go and download a 2025 model, then download a 2027 one next year and compare the sentiment. Do the same in the 2030s, 2040s, ... and see how the 2025 spirit is preserved in your old file. Not just as one document, but as an overall LLM capturing the nerve of the internet right now and not just a few archived documents.
    - Re: (Score:2)
      
      by drinkypoo ( 153816 ) writes:
      
      You've just wholly justified printed news media. Because anyone and everyone can change digital history.
      You've just wholly justified open blockchains. Because paper isn't eternal and can be searched for and confiscated easier than uSD cards.
      - Re: (Score:2)
        
        by geekmux ( 1040042 ) writes:
        
        You've just wholly justified printed news media. Because anyone and everyone can change digital history.
        You've just wholly justified open blockchains. Because paper isn't eternal and can be searched for and confiscated easier than uSD cards.
        If we thought a blockchain would actually prove helpful, a lot of former shitcoin owners would have their coins recovered from failed exchanges that often corruptly failed by design.
        Only thing more predictable than news manipulation is human corruption. Blockchain requires a global network powering its stability and resilience. A piece of paper requires storage, as papers hundreds of years old can attest. Paper has proven a lot more eternal than shitcoin.
  - Re: (Score:1)
    
    by Shadow of Eternity ( 795165 ) writes:
    
    Exactly. This has absolutely nothing to do with AI and everything to do with IA. The New York Times, Guardian, BBC, and their ilk are the most prolific retconners and memory holers out there. They're tired of people using archives to show patterns of disinformation so consistent that it can't be anything but malice.
    - Re: (Score:2)
      
      by sarren1901 ( 5415506 ) writes:
      
      The same type of people are doing this in digital books as well. This is recent news.
      "Pretty Little Liars" was retconned to mention TikTok, despite being originally published in 2006, before TikTok was a thing at all.
      https://www.yahoo.com/entertai... [yahoo.com]
      This is different than just publishing a new revision that mentions TikTok. This is literally going back and changing original works. It results in a loss of that era's sensibilities. It's exactly the kind of shit you would expect out of someone that wants to era
You mean they just block any scraping? (Score:2)

by Zorpheus ( 857617 ) writes:

Scraping by the Internet Archive just happens in the same way as scraping by AI. Their automatic software probably just blocks any scraping.
fix the problem... (Score:2)

by hAckz0r ( 989977 ) writes:

Step 1: Calculate how much revenue will be lost from people not visiting the Times due to AI telling them the answer instead of their reading the article itself.
Step 2: Have the AI companies pay the cost of that missing revenue stream.
Now the Times has a guaranteed revenue stream and can open up web access for the AI companies and Historical archivists to do their job.
- Re: (Score:2)
  
  by sarren1901 ( 5415506 ) writes:
  
  I'm going to go out on a limb here and say that it's likely against the rules for IA to just buy a subscription from the NYT (what is that, $5 a month or something) and then scrape it. Though I don't know why. I'm guessing copyright.
  Going all digital is definitely going to destroy the historical record in the long run. Then "something" will happen, and a lot of stuff will be lost. Akin to the burning of the Library of Alexandria. It's very sad.
