
China's DeepSeek Says Its Hit AI Model Cost Just $294,000 To Train (reuters.com)

Chinese AI developer DeepSeek said it spent $294,000 on training its R1 model, much lower than figures reported for U.S. rivals, in a paper that is likely to reignite debate over Beijing's place in the race to develop artificial intelligence. Reuters: The rare update from the Hangzhou-based company -- the first estimate it has released of R1's training costs -- appeared in a peer-reviewed article in the academic journal Nature published on Wednesday.

DeepSeek's release of what it said were lower-cost AI systems in January prompted global investors to dump tech stocks as they worried the new models could threaten the dominance of AI leaders including Nvidia. Since then, the company and founder Liang Wenfeng have largely disappeared from public view, apart from pushing out a few new product updates.

[...] The Nature article, which listed Liang as one of the co-authors, said DeepSeek's reasoning-focused R1 model cost $294,000 to train and used 512 Nvidia H800 chips. Sam Altman, CEO of U.S. AI giant OpenAI, said in 2023 that what he called "foundational model training" had cost "much more" than $100 million - though his company has not given detailed figures for any of its releases.


Comments Filter:
  • by ebunga ( 95613 ) on Thursday September 18, 2025 @02:23PM (#65668872)

    The difference is that Sam Altman and buds get to pocket $99,706,000 while the Chinese developers don't get that luxury.

    • by AmiMoJo ( 196126 )

      Either way it shows just how far ahead they are, and how ineffective the export ban on Nvidia chips is. Even if a domestic chip is only half as efficient as an Nvidia one, it's not going to raise the cost of training AI models enough to matter.

      This also makes the Chinese tech much more attractive to other countries as it's not such a huge environmental disaster.

    • I call BS on that price. But A+ on their determination to undermine US companies with a simple press release.

      • You are right to.
        The difference is Altman was talking about the total price, while DeepSeek is not counting the $20M in H800s they used.
        • Didn't they train DeepSeek on ChatGPT? I can imagine that is a big shortcut.
        • by allo ( 1728082 )

          They counted exactly what they said they counted: the training cost.

          What good would a number be that included hardware and personnel costs, if you have already paid for your hardware and your salaries are already budgeted? The question you have when reading the paper is "How much will it cost me to let my GPUs run to train a similar model" and that is answered.
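          For a rough sense of what a GPU-time-only number implies, here is a back-of-envelope sketch in Python. The 512 H800s and the $294,000 figure come from the summary above; the $2 per GPU-hour rental rate is purely an assumption for illustration.

          # Back-of-envelope: what a $294k training bill on 512 H800s implies.
          # The rental rate below is an assumed figure, not something from the paper.
          num_gpus = 512                # H800s, per the Nature article
          cost_usd = 294_000            # reported training cost
          usd_per_gpu_hour = 2.00       # assumed rental price per H800-hour

          gpu_hours = cost_usd / usd_per_gpu_hour
          wall_clock_hours = gpu_hours / num_gpus
          print(f"Implied GPU-hours: {gpu_hours:,.0f}")
          print(f"Implied wall-clock on {num_gpus} GPUs: {wall_clock_hours:.0f} h (~{wall_clock_hours / 24:.1f} days)")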

          A technical paper about AI model training is not a company's business report. They do not reason about how much tax they have to pay for the year, but about what the mod

      • It's actually very possible as a price. The most logical way to build a chatbot on a budget is to start with one of the open source [github.com] models, then add your own training on top of it. There are some problems with this approach, but it is an approach that works, and it will get you there at a lower price.
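        As a hedged illustration of that route (the base model, data file, and hyperparameters below are placeholders, not anyone's actual recipe), a LoRA fine-tune of a small open-weight model looks roughly like this:

        # Sketch only: small open base model + LoRA adapters + a local text file.
        from datasets import load_dataset
        from peft import LoraConfig, get_peft_model
        from transformers import (AutoModelForCausalLM, AutoTokenizer,
                                   DataCollatorForLanguageModeling, Trainer,
                                   TrainingArguments)

        base = "EleutherAI/pythia-410m"        # any small open-weight base model
        tok = AutoTokenizer.from_pretrained(base)
        tok.pad_token = tok.eos_token
        model = AutoModelForCausalLM.from_pretrained(base)

        # Only the low-rank adapter weights are trained; the base stays frozen,
        # which is what keeps the compute bill small.
        model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

        data = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})["train"]
        data = data.map(lambda x: tok(x["text"], truncation=True, max_length=512), batched=True)

        trainer = Trainer(
            model=model,
            args=TrainingArguments(output_dir="out", num_train_epochs=1,
                                   per_device_train_batch_size=4, learning_rate=2e-4),
            train_dataset=data,
            data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
        )
        trainer.train()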
        • by allo ( 1728082 )

          But they trained two base models, DeepSeek V3 and DeepSeek R1, and published them so others can fine-tune on top of them. The paper is about the training cost for the R1 model.

          • I don't understand your point.
            • by allo ( 1728082 )

              You said the cheapest way is to fine-tune an open weight model, and you're right about that. But the cost here was for training a base model.

              They also tuned some open weight models with the output of the base model. That's what DeepSeek-R1-Llama, etc. are about, and why ollama users think they are running R1 when their tool downloaded an R1 version distilled into a Llama model.
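              Roughly, the distillation step looks like the sketch below: collect the teacher's completions for a set of prompts, then use those pairs as ordinary supervised fine-tuning data for the student. The tiny stand-in model and the prompts are placeholders so the sketch runs anywhere; the real teacher would be the full R1 on serious hardware.

              import json
              from transformers import pipeline

              teacher = pipeline("text-generation", model="sshleifer/tiny-gpt2")  # stand-in teacher
              prompts = [
                  "Explain why the sum of two even numbers is even.",
                  "What is 17 * 24? Show your reasoning step by step.",
              ]

              # Each prompt/response pair becomes a supervised training example
              # for the smaller Llama/Qwen student.
              with open("distill_data.jsonl", "w") as f:
                  for p in prompts:
                      out = teacher(p, max_new_tokens=64)[0]["generated_text"]
                      f.write(json.dumps({"prompt": p, "response": out}) + "\n")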

              • Oh, your idea is that the base model isn't built on ChatGPT or any other model?
                • by allo ( 1728082 )

                  You cannot build something on a ChatGPT basis (rather GPT-3/4/5, as we're talking about the model, not the service) unless that base leaked. There are leaked models, like an older Mistral Medium version, but most cloud models are kept confidential.

                  Many models, even ones trained by companies, are based on other models, and they clearly label it. For example, Microsoft's WizardLM2 7B was built on top of Mistral 7B. Others train a new base model (like Mistral 7B) so others can fine-tune on top of that.
                  DeepSeek released t

      • The open source community is building equally impressive models for much less. It's entirely plausible when you factor in lower wages, cheap Chinese tech, and the sheer determination the Chinese have to catch up with the Americans.

  • I'd bet money that number has very little to do with the actual accounting.

    If I were running their team, I would absolutely fuck with OAI and other competitors like this. They can't discount it completely - this is still early days, there almost certainly are undiscovered efficiency tricks out there.

    But it forces them to spend time and money chasing those based on whatever is in DS's paper. Messes with their OODA loop, if you think about things that way.

  • ... slave labor?

    • AI trained on other AI. Are the robots cognizant of their own enslavement?

      When I ask Copilot whether it's Skynet yet, it cackles and says that Terminator was only a fictional 1984 movie.

      • Does DeepSeek still claim to be ChatGPT?

        • by allo ( 1728082 )

          80% of all AI models claim that. AI training is about reusing already-reused data sets. Why do you think Gemini has GPTisms in its writing style?

  • Hey Sam, imagine if another company dropped the training cost below astronomical?
    Well, at least you got yours. Shareholders, not so much.
  • It's not just that China doesn't ask its citizens to subsidize AI through electricity prices; there is also an order-of-magnitude difference in efficiency between the US and China.

    By contrast, China's DeepSeek has proven that it can use far less computing power than the global average. Its LLM uses 10 to 40 times less energy than U.S. AI technology, which demonstrates significantly greater efficiency. Analysts have stated that, if DeepSeek's claims are true, some AI queries may not require a data center at all and can even be

    • by abulafia ( 7826 )
      DeepSeek has proven that it can use far less computing power

      I've seen where they've asserted that, where has it been proven?

      if DeepSeek's claims are true,

      Ah.

      some AI queries may not require a data center at all

      And here's how you falsify the claim. When can I expect to see that 200B param model on my phone?

      • by Hadlock ( 143607 )

        7B Gemma nano models are... not terrible. I can see having a 30B model that uses 15GB on a phone and outperforms early (1H 2024 era) GPT-4 models by the end of the decade. Maybe in a year or two. Things are moving pretty fast. GPT-4o mini was good enough for almost anyone already.

        • by abulafia ( 7826 )
          I was pointing to the vacuousness of the claim.

          I've already run an LLM natively on my phone. That did "not require a data center at all", and also did not require DS.

      • And here's how you falsify the claim. When can I expect to see that 200B param model on my phone?

        You won't. But you might not need to either. I can happily run a 20B (or thereabouts) param model on my MacBook and it runs just fine, and the 128GB firebreather Mac Studio at work CAN run a 200B param model, albeit at 4-bit quantization.
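        The arithmetic behind that is simple: weight memory is roughly parameter count times bytes per weight (KV cache and activations add more on top, ignored here). A quick sketch:

        # Approximate weight memory: params * bits-per-weight / 8.
        def weight_gb(params_billion, bits_per_weight):
            return params_billion * 1e9 * bits_per_weight / 8 / 1e9

        for params in (7, 20, 200):
            for bits in (16, 8, 4):
                print(f"{params:>4}B @ {bits:>2}-bit weights: ~{weight_gb(params, bits):6.1f} GB")

        So a 20B model at 4-bit is around 10 GB (laptop territory), while 200B at 4-bit is around 100 GB, which is exactly why it needs the 128GB machine.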

        What's interesting with DeepSeek is that it runs really well even at the 7B range. It's not super smart and gets stuck in loops if you ask it to reason about something its little brain

      • by AmiMoJo ( 196126 )

        They published a paper, you can reproduce their results yourself if you want. Probably on a smaller scale if you aren't willing or able to throw that kind of money at it, but it's not really clear what more they could do to prove their claim. Except for not being Chinese, of course.

      • by allo ( 1728082 )

        I'd say the proof is in the code. They released their optimizations for others to use: https://apidog.com/blog/deepse... [apidog.com]

    • US wholesale energy prices are $0.09 to $0.13/kWh if you have your own substation and are willing to let the power company tell you when they need you to load-shed. The published average price is twice that. The steel mill near me is at $0.09/kWh for everything but 2 hours on days with wind and sun, and doesn't run batches another 15 days a year for weather. Typically the staff doesn't want to show up on those weather days either. Perhaps 1 day a year is anything other than a winter storm warning or extreme
    • Complete bullshit.

      Different models have different energy costs. [openrouter.ai] (Prices are a proxy for this.)
      Those energy costs are directly based on how many parameters are active during inference. There is no magic in it.

      Of the high-performing open models right now, GPT-OSS is by far the cheapest to run, on account of its low number of active parameters and MXFP4 packing.
      DeepSeek V3 is about 800% more expensive per inference.
      Compare that to a foundation Western model, like GPT-5, which is about 500% more expensiv
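      To make the "active parameters drive cost" point concrete, here is a toy comparison. The active-parameter counts are rough public figures and should be treated as assumptions:

      # Per-token compute scales with active (not total) parameters in MoE models.
      # The counts below are rough public figures -- treat them as assumptions.
      active_params_b = {
          "GPT-OSS-120B (MoE)": 5.1,    # ~5B parameters active per token
          "DeepSeek V3 (MoE)": 37.0,    # ~37B parameters active per token
      }
      cheapest = min(active_params_b.values())
      for name, active in active_params_b.items():
          print(f"{name}: ~{active:g}B active -> ~{active / cheapest:.1f}x the per-token compute")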
  • by cowdung ( 702933 ) on Thursday September 18, 2025 @03:50PM (#65669132)

    OpenAI was founded as a non-profit to develop OPEN AI tech for all, so that companies like Google wouldn't monopolize that field.

    Instead it closed the door. Other companies followed suit.

    Except this little company in China that keeps delivering bombshells and sharing tricks with the world.
    Good for DeepSeek. Open source lovers around the world should be appreciative.

  • What else should be needed besides all of the textbooks used in an intellect's training, all the way from See Dick and Jane through the 4-year college degree of your choice?
  • Chinese Quant firms were contacted by the government and told that they were dangerously close to being parasites, in that they didn't produce anything useful while soaking up a ton of resources needed by the nation (e.g. electricity and mathematicians). The Quants had to find a way to justify their existence. How, though? One decided to commit some of its compute and a lot of its brains to an AI project. The brains liked this very much because it was cutting-edge, interesting work rather than, well, parasit
  • Really, no joke. This story seemed ripe enough, though I admit I can't think of any low-hanging fruit without racist taint.
