AI

Anthropic Accuses Chinese Companies of Siphoning Data From Claude (msn.com) 53

U.S. artificial-intelligence startup Anthropic said three Chinese AI companies set up more than 24,000 fraudulent accounts with its Claude AI model to help their own systems catch up. From a report: The three companies -- DeepSeek, Moonshot AI and MiniMax -- prompted Claude more than 16 million times, siphoning information from Anthropic's system to train and improve their own products, Anthropic said in a blog post Monday.

Earlier this month, an Anthropic rival, OpenAI, sent a memo to House lawmakers accusing DeepSeek of using the same tactic, called distillation, to mimic OpenAI's products. Anthropic said distillation had legitimate uses -- companies use it to build smaller versions of their own products, for example -- but it could also be used to build competitive products "in a fraction of the time, and at a fraction of the cost." The scale of the different companies' distillation activity varied. DeepSeek engaged in 150,000 interactions with Claude, whereas Moonshot and MiniMax had more than 3.4 million and 13 million, respectively, Anthropic said.
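For readers unfamiliar with the term: "distillation" conventionally means training a smaller student model to reproduce a larger teacher model's output distribution, not just its final answers. A minimal sketch of the classic soft-label (KL-divergence) loss is below; the function names and toy logits are purely illustrative, not anyone's actual pipeline, and API-based extraction of the kind alleged here works from sampled text rather than raw logits:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between the teacher's and student's softened
    output distributions. Minimizing this trains the student to copy
    the teacher's full probability distribution ("soft labels"),
    which carries far more signal per example than hard labels."""
    p = softmax(teacher_logits, T)  # teacher's soft labels
    q = softmax(student_logits, T)  # student's current guess
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that exactly matches the teacher incurs zero loss;
# any mismatch makes the loss positive.
teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, teacher))           # 0.0
print(distillation_loss([0.0, 0.0, 0.0], teacher))   # > 0
```

The higher temperature T softens both distributions so that the teacher's relative preferences among wrong answers are also transferred, which is the standard motivation for the technique.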

This discussion has been archived. No new comments can be posted.


  • by Anonymous Coward

    They are just learning from it like humans.

    • by Anonymous Coward

      Anthropic: "We went to great lengths and expense to steal our training data. They should have to steal their own, not steal from us!"

  • Boo hoo (Score:5, Insightful)

    by Big Hairy Gorilla ( 9839972 ) on Monday February 23, 2026 @02:22PM (#66005868)
    When I steal your brainwaves it's fine, but when big bad China Co steals my brainwaves... welll.. that's bad.
    So sad.
    • Re:Boo hoo (Score:4, Interesting)

      by hey! ( 33014 ) on Monday February 23, 2026 @02:36PM (#66005920) Homepage Journal

      Anthropic famously bought a lot of copyrighted books and scanned them to ingest into its model training corpus. Arguably they aren't violating copyright because what they are doing is *transformative* -- turning words into a statistical map of word associations.

      But what China is doing by inferring the structure of that map doesn't touch on *any* kind of intellectual property of Anthropic's. Sure, the map is a trade secret, but they've exposed that trade secret through their public interface. It's not human-created, so it's not copyrightable. Even if that map were patentable, which it probably isn't, it's not patented.

      The worst you can say is that China is violating the service's terms of service, which may have no legal force there.

      • Buying books to scan for AI is legal in the US (it's settled law).

        I think it was OpenAI that was sued. It was determined it was legal to use purchased books for training, but the ones they didn't purchase were not fair use.

        • by allo ( 1728082 )

          And AI outputs are not copyrighted, so they are also fair game. The ToS allow canceling the account if you're suspected of training on these outputs, but that only means you can't create further outputs, not that you aren't allowed to use what you already got. They should just get over it. Everyone trains on everyone, or why does Claude use GPTisms in its output?

        • by hey! ( 33014 )

          Well, no. It's true you can't buy books for the purposes of scanning them *and then making them available online* (Hachette v. Internet Archive). Scanning them for AI training is not settled law in every Federal District, although in at least one that has been ruled transformative and therefore allowable (Bartz v Anthropic, Northern District of California).

      • Re: (Score:3, Insightful)

        by drinkypoo ( 153816 )

        Anthropic famously bought a lot of copyrighted books and scanned them to ingest into its model training corpus. Arguably they aren't violating copyright because what they are doing is *transformative* -- turning words into a statistical map of word associations.

        If they did not delete the training corpus when they finished with it then they provably are violating copyright because Anthropic famously bought a lot of copyrighted books and destroyed them after scanning them [reddit.com] to ingest into its model training corpus. When they destroyed the originals to which the copyright licenses were attached, they destroyed the proof of license which permitted them to legally own those copies — and every copyrighted portion of the corpus is not only illegal, but there is a sep

        • by allo ( 1728082 )

          Why would you lose the license granted if you destroy the medium? What allows you to read the book is that you bought the license, not the price of the paper. Otherwise ebooks would have to be free.

          • Why would you lose the license granted if you destroy the medium?

            Because when you buy physical media, that's what the license is attached to. If you lose or destroy the physical media, it doesn't become legal for you to download another copy because you still own a license, because you do not.

            • by allo ( 1728082 )

              Maybe in US law? But generally one obtains a license for the content and (optionally) some physical media. That's also the reason why it was legal to copy your CDs for yourself.

              • Maybe in US law? But generally one obtains a license for the content and (optionally) some physical media.

                When one buys a physical piece of media it comes with a license to use the media in specific ways (as permitted by law and no more basically) and the item is itself the proof of license. If you destroy the item, you've destroyed the proof of license. If you transfer the original, you are obligated to transfer or destroy all copies.

                So what happens if you destroy the physical copy? That is unclear, but you cannot prove that you ever had it. A scan doesn't prove it, as that could be a scan of any copy. A recei

                • Not-for-profit copying of books, CDs, etc. is a thing in many parts of the world. So in many Western European countries you can borrow a book from a library, family, or colleague and make a copy for yourself (including others in your household). You may not make a profit, and the original must be obtained legally.
            • This, unlike your original post, is not wrong.
              To demonstrate why the first post is wrong, using this post as an analogy, if you lose or destroy your physical media, your fair-use copy or format shift does not suddenly become illegal.
            • by AmiMoJo ( 196126 )

              Hmm, that's a stretch. Backups and making your own digital copies of books are a thing. Maybe under US law you are right, I'm not a US copyright lawyer, but I think it is at best an open question elsewhere.

              • Maybe under US law you are right, I'm not a US copyright lawyer, but I think it is at best an open question elsewhere.

                We're talking about Anthropic, an American company based in San Francisco. DGAF about copyright law in Solla Sollew.

              • Making a backup is definitely a thing in the US.
                Drinky is kind of right, but also kind of really reading too much into it, to the point of being abjectly wrong.

                All jurisdictions I know of have some concept of a First Sale Doctrine, wherein your "license" to use a copyrighted work comes with, and follows, the physical item you purchased.
                However, in no jurisdiction I am aware of, are your rights curtailed because the item was destroyed. They're only lost if you sell it to someone else. This is for obvious
        • This is completely wrong.
          Destructive scanning of physical media is settled law. You do not need to prove you have a license for something; someone else has to prove (by a preponderance of the evidence) that you do not.

          This post should be moderated into misinformation oblivion.
      • by haruchai ( 17472 )

        "Anthropic famously bought a lot of copyrighted books"
        that's quite an odd way to say they were sued for downloading copyrighted books from pirate sites and settled for $1.5B USD in Sep 2025

        • Except that's not what Anthropic was accused of. They may have done what you claim, but the accusation was that they paid for private high-speed access to already downloaded books which is not illegal. I realize it's hard to see nuance through all the pointless AI/rich big-bad companies hate, but details matter.

          • by haruchai ( 17472 )

            "...but the accusation was that they paid for private high-speed access to already downloaded books..."

            1) go read the complaint:
            https://storage.courtlistener.... [courtlistener.com]
            2) point out the text that supports your assertion that was Anthropic's only violation
            because as someone said "details matter"

            • "downloaded known pirated copies of books from the internet, made unlicensed copies of them,"

              Except they didn't do that. Anthropic didn't make copies. I concede that this complaint alleges that they did, but there was no evidence produced showing that happened.

              • by haruchai ( 17472 )

                so they paid $1.5B to settle for something they didn't do?
                and that you asserted is *not* illegal?

        • that's quite an odd way to say they were sued for downloading copyrighted books from pirate sites and settled for $1.5B USD in Sep 2025

          No, it's not.
          Would you like to try again?

          If party A does thing B, and thing C, then the statement:
          Party A did thing B is correct.
          It follows that the statement:
          Thing B is a funny way of saying Thing C is not.

      • I don't understand people who respond to points about ethics by chirping about legal principles. Like... is the law really your model for proper behavior? It's all good unless you do something so evil that your society bands together to punish you?

        • I don't understand people who think matters of Copyright have anything to do with ethics.
          Copyright is an artificial monopoly granted by law.
          • I don't understand people who think matters of Copyright have anything to do with ethics.

            Copyright is an artificial monopoly granted by law.

            Because laws must reflect whatever society collectively considers ethical. The monopoly is granted for an ethical reason (at least in theory), so corporations must comply and accept that, even if they like to pretend it was created only so they can extract profit and ethics was not concerned.

            • Because laws must reflect whatever society collectively considers ethical.

              No.

              The monopoly is granted for an ethical reason (at least in theory),

              No.

              so corporations must comply and accept that, even if they like to pretend it was created only so they can extract profit and ethics was not concerned.

              This doesn't even follow. Are you fucking drunk?

      • by quenda ( 644621 )

        turning words into a statistical map of word associations.

        That's what you think AI is? You are very right to feel threatened by AI, but downplaying it like that comes across as immature. Like saying the invading army can't shoot straight, it won't help you.
        Words are mapped to semantic space in the embedding and de-embedding, but that's a very small part of the process.

        Spot on with the IP analysis. Copyright, patents and trade secrets do not apply here. So the question raised is if new forms of IP law are needed or desirable in the age of AI. Right now,

  • by alvinrod ( 889928 ) on Monday February 23, 2026 @02:26PM (#66005886)
    They're clearly missing a golden opportunity to feed the other AIs a load of complete shit and make them even worse off. The idea of corrupting the Chinese LLMs to be anti-CCP agents is certainly amusing. Train your AI to detect and corrupt other AIs. I don't know if it proves their intelligence at all, but no one can dispute that AIs will definitely be more human-like when they start forming cults.
    • by Sloppy ( 14984 )

      Heh, they say "hallucination," you say "trap street."

    • These Chinese, who allegedly "syphon" whatever magic is there on the andropic servers, are generating a whole bunch of sales that the said andropic is using as a sign of a "booming business" and an excuse for an IPO.

      If it becomes an intentionally supplied "shit" and the said Chinese stop paying for it or, worse, sue for contract violation, what will become of this Claude who believes that God believes in him?

      Zero revenue just ahead of an IPO? Then the IPO goes so far away, that it will be farther than even

  • by SlashbotAgent ( 6477336 ) on Monday February 23, 2026 @02:26PM (#66005896)

    These AI companies have some real gall, complaining about the Chinese appropriating other people's work. Is that not what the AI companies continue to do even now?

    • by dfghjk ( 711126 )

      Exactly. AI companies argue that they can train on information scraped from other people's sources, but other people can't train on information scraped from theirs. Did Claude respond to all those prompts? If so, what's the problem?

      • This shouldn't be taken as a defense of model producers, either SOTA or people trying to distill from them.

        That being said, there is a difference between scraping up information for the purpose of learning, and using a model's output to clone its behavior.
  • by Anonymous Coward

    Anthropic said distillation had legitimate uses -- companies use it to build smaller versions of their own products, for example -- but it could also be used to build competitive products "in a fraction of the time, and at a fraction of the cost."

    Oh, I see. It's a cost-effective way to get training data without a lot of hassles. Sort of like reading books.

    Why does Anthropic have a problem with this? Haven't they advocated in favor of it, in the past?

    • by nightflameauto ( 6607976 ) on Monday February 23, 2026 @02:39PM (#66005938)

      Anthropic said distillation had legitimate uses -- companies use it to build smaller versions of their own products, for example -- but it could also be used to build competitive products "in a fraction of the time, and at a fraction of the cost."

      Oh, I see. It's a cost-effective way to get training data without a lot of hassles. Sort of like reading books.

      Why does Anthropic have a problem with this? Haven't they advocated in favor of it, in the past?

      We've entered a phase of society where "rules for thee and not for me" is so intrinsic that they don't even notice their own hypocrisy. "GIMME ALL YOUR DATA" and "DON'T STEAL MY DATA" don't even register to them as connected concepts, at all. They have a right to take any data they want and are able to access. They also, once they've acquired that data, 100% believe that the data belongs to them, and always did.

      Our current generation of AI is just greed given digital form, and the very particular greed that drives our owner class. "GIMME THAT, IT'S MINE!" is the name of their number one driver. No other point even exists in their view.

      • by jmke ( 776334 )
        > We've entered a phase of society where "rules for thee and not for me"

        We've never ever entered a phase of society where "rules for thee and me" existed... JFYI
  • by fuzzyfuzzyfungus ( 1223518 ) on Monday February 23, 2026 @02:36PM (#66005922) Journal
    I'm awaiting clarification on why all their arguments about why scraping is their god-given right don't apply when they are getting scraped.
  • Come on. How can it be wrong for someone to scrape data from a place that stole its data?
  • If you ask Claude who it is in Chinese, it refers to itself as DeepSeek: https://reddit.com/r/DeepSeek/... [reddit.com]

    Of course, everyone is "distilling" everyone. Using quotes because distilling implies no other datasets were used, while the datasets created by querying other AIs only comprise a (probably comparatively small) part of the total training data.

  • by Arrogant-Bastard ( 141720 ) on Monday February 23, 2026 @04:29PM (#66006178)
    "You're trying to kidnap what I've rightfully stolen!"
  • How dare you - you can't steal my stolen goods!
  • I mean... It's in his NAME for heaven's sake "Alt-Man" Alternative man?
  • The big data sucks, Chinese companies are siphoning from other data models, including Claude, and those models were trained by siphoning data that was scraped from other sources. The big suck...of data. Rather like hacking hackers who hacked your system. I scream, you scream, we all scream for that big data suck stream. The new golden age continues...

    --JoshK.

  • by SAU! ( 228983 ) on Monday February 23, 2026 @06:57PM (#66006506)

    Siphoning? Distillation? Is AI a liquid?

  • I expected it to take 6 more months. But, if they're playing the victim already, either it's time for them to cash in on patents or to sell off their shares while they can.
  • I am laughing soooo hard, that the thieves who stole other peoples' work are complaining about someone stealing theirs.
