
Perplexity Says Cloudflare's Accusations of 'Stealth' AI Scraping Are Based On Embarrassing Errors (zdnet.com)

In a report published Monday, Cloudflare accused Perplexity of deploying undeclared web crawlers that masquerade as regular Chrome browsers to access content from websites that have explicitly blocked its official bots. Since then, Perplexity has publicly and loudly announced that Cloudflare's claims are baseless and technically flawed. "This controversy reveals that Cloudflare's systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats," says Perplexity in a blog post. "If you can't tell a helpful digital assistant from a malicious scraper, then you probably shouldn't be making decisions about what constitutes legitimate web traffic."

Perplexity continues: "Technical errors in Cloudflare's analysis aren't just embarrassing -- they're disqualifying. When you misattribute millions of requests, publish completely inaccurate technical diagrams, and demonstrate a fundamental misunderstanding of how modern AI assistants work, you've forfeited any claim to expertise in this space."

  • "Make some shit up about how we're not doing that... except to be helpful... which is totally not the purpose of our product in the first place."
  • As fucked up as Cloudflare is I'd believe them over Perplexity any day of the week.

  • What? (Score:5, Informative)

    by viperidaenz ( 2515578 ) on Tuesday August 05, 2025 @09:48PM (#65568916)

    Cloudflare are saying Perplexity is disguising its crawler bots as Chrome users.

    Perplexity counters by saying their crawlers don't do that; their other AI tools do.

    Seems like Cloudflare correctly determined that Perplexity are using automated tools to access websites that asked not to be accessed by automated tools, despite Perplexity trying to hide their behaviour.

    • > Cloudflare are saying Perplexity is disguising its crawler bots as Chrome users.

      > Perplexity counters by saying their crawlers don't do that; their other AI tools do.

      > Seems like Cloudflare correctly determined that Perplexity are using automated tools to access websites that asked not to be accessed by automated tools, despite Perplexity trying to hide their behaviour.

      This is an interesting technicality. Perplexity is basically saying that robots.txt prohibitions only apply to training, either in the sense of building page ranks for search or of accumulating data for AI model training. So if the data is used immediately, it doesn't count as training, and robots.txt doesn't apply. Is this true?

      Of course, the curious question is how Perplexity knows what's behind the robots.txt wall well enough to serve immediate, on-the-fly answers if it didn't already crawl past the wall...

  • The content delivery network people with 16 years of experience in this game or the AI liars that peddle a basically fraudulent product based on a massive piracy campaign?

    Hmm. Difficult!

    • by PPH ( 736903 )

      But Perplexity is basically admitting it:

      "If you can't tell a helpful digital assistant from a malicious scraper, then you probably shouldn't be making decisions about what constitutes legitimate web traffic."

      It really doesn't matter if Perplexity thinks they are "a helpful digital assistant". That's not what the robots.txt file says. There's no flag in there to allow only the "helpful" ones to scrape. Just don't scrape, m'kay?

  • by Rendus ( 2430 ) <rendus@gm a i l.com> on Tuesday August 05, 2025 @09:56PM (#65568928)

    Sounds to me like the accusations are true, and Perplexity is deflecting by saying they're harmless and even helpful.

    On my own servers, I see a pattern of behavior: something hits my robots.txt (which has both a blanket denial for all user agents AND specific denials for all known bots), and then suddenly a variety of IP addresses start hammering my site. It's bad enough that I'm either going to put my servers behind Cloudflare, or at least behind one of the gatekeeper challenge systems.

    Perplexity's shitty response really does nothing for me but confirm the accusations.

    • by ndsurvivor ( 891239 ) on Tuesday August 05, 2025 @10:01PM (#65568942)
      To play devil's advocate: if a human asks his/her Perplexity agent to buy something from, or get information from, your website, is that the AI scraping your data, or a person using your website?
      • "scraping" is maybe questionable, but there's no question that an AI is accessing your website.
      • by Rendus ( 2430 )

        In my case, I have this in robots.txt, on a forum I don't want to get absolutely destroyed by bots that don't rate-limit themselves:

        User-agent: *
        Disallow: /

        As for agentic actions, it's still an AI performing the actions. I said "No" to robots accessing my site. That they then pivot and do something that ISN'T not accessing my site is inappropriate no matter the reason.
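A robots.txt file is only advisory: it has an effect only if the client fetches and honors it. As a rough illustration, here is a minimal Python sketch using the standard library's `urllib.robotparser` to evaluate rules the way a compliant client would. It shows why user-agent strings matter in this dispute: the blanket `User-agent: *` denial quoted above refuses every agent, while a rule that names only one bot does nothing against a client that presents a generic Chrome string. The `NAMED_DENY` file, the `forum.example` URL, and both user-agent strings are hypothetical; "PerplexityBot" is assumed here to be the declared crawler's token.

```python
from urllib.robotparser import RobotFileParser

# The blanket denial from the comment: applies to every user agent.
BLANKET_DENY = "User-agent: *\nDisallow: /\n"

# Hypothetical alternative that names a single bot only.
# "PerplexityBot" is an assumed token for the declared crawler.
NAMED_DENY = "User-agent: PerplexityBot\nDisallow: /\n"

# Illustrative user-agent strings, not taken from any real traffic log.
DECLARED_BOT = "PerplexityBot/1.0"
GENERIC_CHROME = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/138.0 Safari/537.36"


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so nothing is downloaded here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://forum.example/thread/42"  # hypothetical URL
    for name, rules in (("blanket", BLANKET_DENY), ("named", NAMED_DENY)):
        for agent in (DECLARED_BOT, GENERIC_CHROME):
            # Print only the product token of the agent for readability.
            print(name, agent.split("/")[0], may_fetch(rules, agent, url))
```

Under the blanket rule both agents are refused; under the named rule only the declared bot is, which is why a crawler presenting itself as ordinary Chrome would get through unless the site falls back to something like Cloudflare's behavioral detection.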

      • by tepples ( 727027 )

        It depends. Does the Perplexity agent faithfully relay the personalized messages from sponsors that would have otherwise been presented adjacent to the information on the website?

        • It looks like a mess to me. No, I don't believe the advertising is getting to the person if the person is using an AI agent.
      • If a human asks an AI agent to do this for them, why would the AI agent access robots.txt? Did you check robots.txt before posting on Slashdot?

  • Sure, it's a little different to request a site on behalf of someone rather than downloading content to be used to generate the model. However, it seems pretty legitimate to block the "AI Assistants" as well, since a lack of eyeballs means a lack of ads or memberships.
    • This is starting to split hairs. There are lots of legitimate reasons for pages to be fetched by a "bot". If you post a link to a social media platform, that system will fetch the page to access the HTML meta tags to find things like the page title, an image to represent the page, a description, etc, and that is what is displayed in the post instead of just a plain URL. That request also doesn't result in "eyeballs" and ads are not served. Browsers can pre-fetch URLs on a page, again, not resulting in the

      • by Himmy32 ( 650060 )

        Yes, undoubtedly there are a lot of legitimate reasons for automated requests like the ones you have listed.

        But individual requests for content for AI assistants can be even more problematic than a traditional scrape. Getting scraped once a day isn't that much traffic. But if 2 million people want an AI assistant to tell them what foo actor starred in bar movie, and each question generates a request to a movie database, those 2 million eyeballs aren't seeing the ads for upcoming movies and the host has to serve all that ex...

      • If Cloudflare is misrepresenting the data to make it appear as if data is being scraped for training purposes when it is not then that is indeed something different.

        You can guarantee that whatever triggers Perplexity's bots to collect data, they'll use it for training too.

        If they were navigating through a site like a person would, I doubt it would be triggering Cloudflare's bot logic.

  • Hey wait... (Score:4, Insightful)

    by Tschaine ( 10502969 ) on Tuesday August 05, 2025 @10:15PM (#65568968)

    Cloudflare claimed that Perplexity's AI was able to answer questions about content on pages that Perplexity was prohibited from accessing.

    How exactly does Perplexity explain that phenomenon?

    • by Himmy32 ( 650060 )
      It's not scraped into the model; it's requested on behalf of users, loaded individually a whole lot of times for a whole lot of users. Totally different, because not only are the users not going to the site, the site also has to pay for a whole bunch of individual traffic.
      • Don't worry, the data is still going into their future models.

        Sounds like they're using their users to moderate the content the bots scrape. "Use our agent to browse the web for you, and tell us how good it's performing"

    • Perplexing, isn't it? That's Clairvoyic AI, the latest advancement. It learns things without having to read them. Knows without training. It's a whole paradigm shift that you're too feeble to understand.

  • Perplexity's accusatory and belligerent tone is as good as a guilty plea in my book. They sound like someone who is trying to insult the other party into submission so that they won't have to come clean.
