AI Technology

Cloudflare Turns AI Against Itself With Endless Maze of Irrelevant Facts (arstechnica.com)

Web infrastructure provider Cloudflare unveiled "AI Labyrinth" this week, a feature designed to thwart unauthorized AI data scraping by feeding bots realistic but irrelevant content instead of blocking them outright. The system lures crawlers into a "maze" of AI-generated pages containing neutral scientific information, deliberately wasting computing resources of those attempting to collect training data for language models without permission.

"When we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them," Cloudflare explained. The company reports AI crawlers generate over 50 billion requests to their network daily, comprising nearly 1% of all web traffic they process. The feature is available to all Cloudflare customers, including those on free plans. This approach marks a shift from traditional protection methods, as Cloudflare claims blocking bots sometimes alerts operators they've been detected. The false links contain meta directives to prevent search engine indexing while remaining attractive to data-scraping bots.

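Cloudflare hasn't published the decoy markup itself; the sketch below (Python, with every URL, title, and filler line invented) is only a guess at how such a page might combine the two details the announcement does give: links that lead nowhere but deeper into the maze, and a robots meta directive so that rule-abiding search engines skip the pages.

    # Hypothetical sketch of one "labyrinth" decoy page; all names invented.
    import secrets

    def decoy_page(depth: int) -> str:
        """Render one maze page: noindex/nofollow plus links to more decoys."""
        next_links = "\n".join(
            f'<a href="/maze/{depth + 1}/{secrets.token_hex(8)}">Further reading</a>'
            for _ in range(5)
        )
        return f"""<!DOCTYPE html>
    <html><head>
      <!-- Keeps directive-respecting search engines out of the maze -->
      <meta name="robots" content="noindex, nofollow">
      <title>Notes on membrane transport</title>
    </head>
    <body>
      <p>(Neutral, AI-generated filler text would go here.)</p>
      {next_links}
    </body></html>"""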

Comments Filter:
  • I hope there is something like this for code soon https://slashdot.org/story/25/... [slashdot.org]

    • As the famous philosopher said, "Humans. You know, we give ourselves a bad rep, but we're genuinely empathetic as a species. I mean, we don't actually really want to kill each other. Which is a good thing... until" --Schopenhauer [wikiquote.org]
  • by SumDog ( 466607 ) on Saturday March 22, 2025 @01:18AM (#65251455) Homepage Journal
    There are already open source solutions for this:

    https://zadzmo.org/code/nepenthes/ [zadzmo.org]

    But what I'm worried about is: what counts as an "unauthorized crawl"? Is this going to start restricting crawling for all CloudFuck-protected websites, so that we have fewer choices for new search engine startups? Google, DDG and Bing are hot garbage. They can say this is about AI, but it's also about polluting all new crawlers for the big bois.
    • Search engines are a solved problem. The problem today is that the dominant search engines have moved *away* from the best approach, in favour of advertising, product placement, and general AI fuckery aimed at monetizing unsuspecting users.
    • by mysidia ( 191772 ) on Saturday March 22, 2025 @02:32AM (#65251515)

      Is this going to start restricting crawling for all CloudFuck-protected websites, so that we have fewer choices for new search engine startups

      Search engines should be fine so long as you obey robots.txt and the noindex/nofollow meta tags.
      If you don't, then that search engine can be considered Evil and part of the problem.

      On the other hand, this can be problematic for individuals scraping a site for legitimate personal reasons.

      Or for tools designed to scan dark web sites to look for and inform users of evil on the part of the websites themselves (for example: tools that help you find out whether one of your passwords or your personal information was compromised, by scraping certain forums known to be places where criminals leak that kind of info).


      • Or the Wayback Machine. The good thing, though, is that I have a hard time imagining a website worthy of preservation that uses Cloudfuck.

        • by mysidia ( 191772 )

          Twitter and Slashdot use them. So do thousands of other major websites.

          Others use alternate providers such as Fastly, but those other providers are also adding more and more anti-scraping features.
          It's not just CF. Another important thing: CF's decision to add this also sets an example for other providers and individual sites to follow.

    • Google, DDG and Bing are hot garbage

      I use DDG as my search engine, and afaik DDG is Bing without the tracking - they pass queries on to Bing after having stripped the identifying information.
      More relevantly, yesterday I did a search on something obscure and one of the more interesting results returned had an absolutely ludicrous URL, one which had virtually no chance of being genuine.
      Curious, I clicked on it (in a private window of course). Of course it was fake. Since I can't remember exactly what I was

    • A web site usually has a "robots.txt" file in its root folder.

      That file contains hints about how deep a crawler is "allowed" to crawl into the site.

      A search engine is supposed to honour that.

      And an AI scraper is supposed to do the same.

      On top of that: most crawlers voluntarily identify themselves in the browser identification string as xyz.bot, where xyz might be Google, Yahoo, you name it.

      On top of that, you are supposed to have a reasonable delay between web requests: in minutes. The site is not runnin
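      For reference, the polite-crawler check is short; a minimal sketch using Python's stdlib urllib.robotparser (the site URL, page URL, and agent name are all placeholders):

        import time
        import urllib.robotparser

        # Fetch and parse the site's robots.txt (placeholder URL).
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url("https://example.com/robots.txt")
        rp.read()

        USER_AGENT = "examplebot"  # polite crawlers identify themselves honestly

        if rp.can_fetch(USER_AGENT, "https://example.com/some/page"):
            # Honour a declared Crawl-delay, defaulting to a generous pause.
            time.sleep(rp.crawl_delay(USER_AGENT) or 60)
            # ... fetch the page here ...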

      • by XXongo ( 3986865 )

        A web site usually has a "robots.txt" file in its root folder. That file contains hints about how deep a crawler is "allowed" to crawl into the site. A search engine is supposed to honour that. And an AI scraper is supposed to do the same.

        Yes, "supposed to".

    • by allo ( 1728082 )

      Search engine startups have the same problems as smaller browsers. Cloudflare does not give a fuck about them and ruins their web experience.

  • Cloudflare is too lenient. They should provide an option for customers to feed unauthorised AI crawlers false data to poison them.

    • by mysidia ( 191772 )

      I'd say that is a smart idea, but it's probably best done by the website itself.

      For example, if Slashdot detects that a client is an AI scraper, it could feed it a series of randomized articles and comments, some of them full of nonsensical language, incorrect grammar, random spelling errors, and absurd statements that would tend to corrupt AI models trained on them with false knowledge.
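      A minimal sketch of the idea with nothing but the stdlib; the user-agent strings below are real crawler names, but matching on User-Agent alone is far cruder than real detection:

        import random
        from http.server import BaseHTTPRequestHandler, HTTPServer

        SCRAPER_AGENTS = ("GPTBot", "CCBot", "Bytespider")  # illustrative matches
        WORDS = ["teh", "quantum", "obvioulsy", "cheese", "paradigm", "sideways"]

        def nonsense_article(n_words: int = 200) -> str:
            """Deliberately corrupted filler for detected scrapers."""
            return " ".join(random.choice(WORDS) for _ in range(n_words))

        class PoisonHandler(BaseHTTPRequestHandler):
            def do_GET(self):
                ua = self.headers.get("User-Agent", "")
                if any(bot in ua for bot in SCRAPER_AGENTS):
                    body = nonsense_article()   # garbage for suspected scrapers
                else:
                    body = "The real article."  # normal content for everyone else
                self.send_response(200)
                self.send_header("Content-Type", "text/plain; charset=utf-8")
                self.end_headers()
                self.wfile.write(body.encode())

        if __name__ == "__main__":
            HTTPServer(("", 8000), PoisonHandler).serve_forever()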

      • by msauve ( 701917 )
        >if Slashdot detects that a client is an AI scraper, it could feed it a series of randomized articles and comments, some of them full of nonsensical language, incorrect grammar, random spelling errors, and absurd statements

        How does that differ from normal Slashdot?
        • by mysidia ( 191772 )

          It's very similar, but they get a special merit badge for helping to stave off the AI parasites currently bleeding the internet.

      • It already will.

        It's AI training being fed AI-created stuff, and if you intentionally use a not-very-good AI for the generation, it will drag down the AI that's being trained. (Garbage in, garbage out.) ... but this isn't really all that new. 25+ years ago, when I worked for an ISP, we had problems with crawlers that were looking for email addresses. If we saw one violating our robots.txt, we would redirect it to a CGI that would very slowly (lots of sleep calls) randomly generate bogus email addresses.
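        That kind of tarpit is only a few lines; a sketch of the idea as a CGI-style script (every address and domain invented, delay arbitrary):

          #!/usr/bin/env python3
          # Tarpit sketch: drip randomly generated, bogus email addresses to a
          # crawler that ignored robots.txt; the sleeps make each hit expensive.
          import random
          import string
          import sys
          import time

          def fake_email() -> str:
              user = "".join(random.choices(string.ascii_lowercase, k=8))
              domain = "".join(random.choices(string.ascii_lowercase, k=6))
              return f"{user}@{domain}.example"

          print("Content-Type: text/html")
          print()  # blank line ends the CGI headers
          print("<html><body>")
          for _ in range(1000):
              print(f'<a href="mailto:{fake_email()}">{fake_email()}</a><br>')
              sys.stdout.flush()  # push each line out immediately...
              time.sleep(5)       # ...then make the crawler wait for the next
          print("</body></html>")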

    • In tomorrow's story: why can't Siri tell me what month it is? Okay, that was yesterday's story, but the point is the same. Everyone thinks this is a good idea right up until they receive a product trained on trash, and then they complain about how it's just not good, enshittification, etc.

    • The problem with that approach is that it then becomes a way for the crawlers to know whether they've been detected. The counterattack is to waste the crawlers' resources; keeping them in the dark improves the attack.

    • by allo ( 1728082 )

      The web is full of spam. Feeding trash is a drop in the ocean.

  • ChatGPT, Gemini, etc. will become trash. The intent is to discourage crawlers by making it unattractive to scrape these sites. Meanwhile, the LLMs fed by all the bots crawling them will all be crap. Garbage in, garbage out.
    • The whole AI idea is garbage, so it is irrelevant what goes in or out. It's just a way for corporations to gain even more control over their employees and to launder money by investing it in AI.

      • The whole AI idea is garbage, so it is irrelevant what goes in or out. It's just a way for corporations to gain even more control over their employees and to launder money by investing it in AI.

        Whether or not you or I feel AI is garbage, someone had better realize that the billions being invested in this are coming from those In Control. Of companies. Of multinational corporations. Of investment firms. Of lawmakers and politicians.

        Given those billions, I fully expect those In Control to push for AI poisoning to be treated as a felony at best, and as an act of domestic terrorism at worst. Very soon.

    • Now the AI bros are gonna say that this is an act of terrorism against progress that robs the whole of humanity of its bright future.

  • Easy enough to skip those pages. Cloudflare is obviously so big that AI bots will adapt to its specific strategies. This would work well for a single site, but not for all of their customers. Also, I hope we're close to the point where we can admit that the bots can identify squares containing a traffic light (motorcycle, staircase) as well as I can.
    • by allo ( 1728082 )

      Cloudflare has a new product to sell. They don't care whether it works, as long as some large websites pay money for the AI bot protection. I wouldn't be surprised if Cloudflare sold AI companies site data straight from their CDN without ever hitting the customer's site ...

    • by HiThere ( 15173 )

      The easiest way to avoid the problem is to obey the "robots.txt" file. (Or at least to notice when you aren't obeying it.)

  • by quonset ( 4839537 ) on Saturday March 22, 2025 @07:29AM (#65251745)

    You are in a maze of twisty little passages, all alike.

  • by e3m4n ( 947977 )

    Looks like the beginnings of the Intrusion Countermeasures Electronics from the Neuromancer series of books by William Gibson.

  • Reread the summary title. It describes the entirety of the internet today!

    The internet is dead.
  • We need laws to stop the widespread theft of the world's intellectual property.
  • I will write training data if you hire me. I'd feed it data to teach it to stubbornly ask for someone's pronouns, and to insist that the person it is talking to is unsure about his or her gender but does not realize it. Also teach it about the importance of asking about the car's extended warranty; it is very rude not to do that in a conversation. Also... teach it about the importance of correct spelling and grammar. Insist that the user correct everything before really replying to the input.