

Perplexity Says Cloudflare's Accusations of 'Stealth' AI Scraping Are Based On Embarrassing Errors (zdnet.com) 27
In a report published Monday, Cloudflare accused Perplexity of deploying undeclared web crawlers that masquerade as regular Chrome browsers to access content from websites that have explicitly blocked its official bots. Since then, Perplexity has publicly and loudly announced that Cloudflare's claims are baseless and technically flawed. "This controversy reveals that Cloudflare's systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats," says Perplexity in a blog post. "If you can't tell a helpful digital assistant from a malicious scraper, then you probably shouldn't be making decisions about what constitutes legitimate web traffic."
Perplexity continues: "Technical errors in Cloudflare's analysis aren't just embarrassing -- they're disqualifying. When you misattribute millions of requests, publish completely inaccurate technical diagrams, and demonstrate a fundamental misunderstanding of how modern AI assistants work, you've forfeited any claim to expertise in this space."
Denial probably AI-written. (Score:2)
Re: (Score:2)
"Technical errors in Cloudflare's analysis aren't just embarrassing -- they're disqualifying."
That is 100% the sentence structure of an AI dress-down.
Re:Denial probably AI-written. (Score:4, Funny)
"Make it sound stern, technical, and authoritative."
Re: (Score:2)
I mean, someone probably said "It would be weird if we DIDN'T have our AI write this, right? Right? Anyone? Bueller?"
Sure that's going to work. (Score:1)
As fucked up as Cloudflare is I'd believe them over Perplexity any day of the week.
What? (Score:5, Informative)
Cloudflare are saying Perplexity is disguising its crawler bots as Chrome users.
Perplexity counters saying their crawlers don't do that, their other AI tools do it.
Seems like Cloudflare correctly determined that Perplexity are using automated tools to access websites that asked not to be accessed by automated tools, despite Perplexity trying to hide their behaviour.
Re: (Score:2)
Cloudflare are saying Perplexity is disguising its crawler bots as Chrome users.
Perplexity counters saying their crawlers don't do that, their other AI tools do it.
Seems like Cloudflare correctly determined that Perplexity are using automated tools to access websites that asked not to be accessed by automated tools, despite Perplexity trying to hide their behaviour.
This is an interesting technicality. Perplexity is basically saying that robots.txt prohibitions only apply to training, either in terms of building page ranks for search or accumulating data for AI model training. So, if the data is immediately used, then it doesn't count as training, so robots.txt doesn't apply. Is this true?
Of course, the curious question is how Perplexity knows what's behind the robots.txt wall to serve as immediate on-the-fly answers if it didn't already previously crawl past the wa
Soo, who to trust? (Score:2)
The content delivery network people with 16 years of experience in this game or the AI liars that peddle a basically fraudulent product based on a massive piracy campaign?
Hmm. Difficult!
Re: (Score:2)
But Perplexity is basically admitting it:
"If you can't tell a helpful digital assistant from a malicious scraper, then you probably shouldn't be making decisions about what constitutes legitimate web traffic."
It really doesn't matter if Perplexity thinks they are "a helpful digital assistant". That's not what the robots.txt file says. There's no flag in there to allow only the "helpful" ones to scrape. Just don't scrape, m'kay?
Sounds like the accusations are true. (Score:4, Insightful)
Sounds to me like the accusations are true, and Perplexity is deflecting by saying they're harmless and even helpful.
On my own servers, I see a pattern of something hitting my robots.txt (which has both a blanket denial for all user agents AND specific denials for all known bots), and then suddenly a variety of IP addresses start hammering my site. It's bad enough that I'm either going to put my servers behind Cloudflare, or at least behind one of the gatekeeper challenge systems.
Perplexity's shitty response really does nothing for me but confirm the accusations.
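For anyone who wants to check their own logs for that pattern, here is a rough sketch. It assumes the access log has already been parsed into (timestamp, IP, path) records. The `Hit` type, the five-minute window, and the three-IP threshold are all made up for illustration, not taken from any real tool.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Hit(NamedTuple):
    """One parsed access-log record (hypothetical shape)."""
    when: datetime
    ip: str
    path: str


def bursts_after_robots(hits, window=timedelta(minutes=5), min_ips=3):
    """Find robots.txt fetches followed by requests from many other IPs.

    Returns a list of (fetch_time, fetching_ip, follower_ips) tuples for
    each robots.txt fetch that is followed, within `window`, by requests
    for other paths from at least `min_ips` distinct other addresses.
    """
    ordered = sorted(hits, key=lambda h: h.when)
    findings = []
    for i, hit in enumerate(ordered):
        if hit.path != "/robots.txt":
            continue
        followers = {
            later.ip
            for later in ordered[i + 1:]
            if later.when - hit.when <= window
            and later.ip != hit.ip
            and later.path != "/robots.txt"
        }
        if len(followers) >= min_ips:
            findings.append((hit.when, hit.ip, sorted(followers)))
    return findings
```

This only shows correlation in time. It cannot prove the follow-up requests come from the same operator, so treat any match as a prompt to look closer, not as evidence on its own.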
Re:Sounds like the accusations are true. (Score:4, Interesting)
Re: (Score:2)
Re: (Score:2)
Cloudflare will block if you ask them to block. If your desire is to let AI do what they want on your site, Cloudflare won't get in the way.
Many to most websites are businesses supported by ad revenue in addition to ecommerce. Letting AI become an ad blocker for your end users is not ideal. Letting AI go haywire and purchase random shit instead of what the agent was instructed to do is also not ideal, for that matter.
Each site deserves the right to say "yes", "no", or "on these terms..." to AI crawlers and agents.
Re: (Score:2)
In my case, I have this in robots.txt, on a forum I don't want to get absolutely destroyed by bots that don't rate-limit themselves:
User-agent: *
Disallow: /
As for agentic actions, it's still an AI performing the actions. I said "No" to robots accessing my site. That they then pivot and do something that ISN'T not accessing my site is inappropriate no matter the reason.
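To show what that two-line file means to a client that actually honors it, here is a minimal sketch using Python's standard `urllib.robotparser`. The user-agent strings and the `forum.example` URL are placeholders picked for illustration. Nothing here says how any particular crawler really behaves.

```python
from urllib.robotparser import RobotFileParser

# The blanket denial quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules in ROBOTS_TXT permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative agent strings only; the rule matches every one of them.
    agents = (
        "PerplexityBot",
        "Mozilla/5.0 (X11; Linux x86_64) Chrome/124.0",
        "SomeOtherBot",
    )
    for agent in agents:
        print(agent, is_allowed(agent, "https://forum.example/thread/1"))
```

The wildcard rule applies regardless of what the client calls itself, so a well-behaved fetcher gets the same "no" whether it announces itself as a bot or sends a browser-style user agent. robots.txt is purely advisory, though, and only works if the client bothers to check.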
Re: (Score:2)
It depends. Does the Perplexity agent faithfully relay the personalized messages from sponsors that would have otherwise been presented adjacent to the information on the website?
Re: (Score:2)
Re: (Score:2)
If a human asks an AI agent to do this for them, why would the AI agent access robots.txt? Did you check Robots.txt before posting on Slashdot?
Differences (Score:2)
Re: (Score:2)
This is starting to split hairs. There are lots of legitimate reasons for pages to be fetched by a "bot". If you post a link to a social media platform, that system will fetch the page to access the HTML meta tags to find things like the page title, an image to represent the page, a description, etc, and that is what is displayed in the post instead of just a plain URL. That request also doesn't result in "eyeballs" and ads are not served. Browsers can pre-fetch URLs on a page, again, not resulting in the
Re: (Score:3)
Yes, undoubtedly there are a lot of legitimate reasons for automated requests like the ones you have listed.
But individual requests for content for AI assistants can be even more problematic than a traditional scrape. Getting scraped once a day isn't that much traffic. But if 2 million people ask an AI assistant what foo actor starred in bar movie, and each question generates a request to a movie database, those 2 million eyeballs aren't seeing the ads for upcoming movies and the host has to serve all that ex
Re: (Score:2)
If Cloudflare is misrepresenting the data to make it appear as if data is being scraped for training purposes when it is not then that is indeed something different.
You can guarantee that whatever the trigger for Perplexity's bots collecting data, they'll use it for training too.
If they were navigating through a site like a person would, I doubt it would be triggering Cloudflare's bot logic.
Hey wait... (Score:4, Insightful)
Cloudflare claimed that Perplexity's AI was able to answer questions about content on pages that Perplexity was prohibited from accessing.
How exactly does Perplexity explain that phenomenon?
Re: (Score:2)
Re: (Score:2)
Don't worry, the data is still going into their future models.
Sounds like they're using their users to moderate the content the bots scrape. "Use our agent to browse the web for you, and tell us how good it's performing"
Re: (Score:2)
Perplexing, isn't it? That's Clairvoyic AI, the latest advancement. It learns things without having to read them. Knows without training. It's a whole paradigm shift that you're too feeble to understand.
They're guilty (Score:2)
Perplexity's accusatory and belligerent tone is as good as a guilty plea in my book. They sound like someone who is trying to insult the other party into submission so that they won't have to come clean.