
China's DeepSeek Says Its Hit AI Model Cost Just $294,000 To Train (reuters.com)
Chinese AI developer DeepSeek said it spent $294,000 on training its R1 model, much lower than figures reported for U.S. rivals, in a paper that is likely to reignite debate over Beijing's place in the race to develop artificial intelligence. Reuters: The rare update from the Hangzhou-based company -- the first estimate it has released of R1's training costs -- appeared in a peer-reviewed article in the academic journal Nature published on Wednesday.
DeepSeek's release of what it said were lower-cost AI systems in January prompted global investors to dump tech stocks as they worried the new models could threaten the dominance of AI leaders including Nvidia. Since then, the company and founder Liang Wenfeng have largely disappeared from public view, apart from pushing out a few new product updates.
[...] The Nature article, which listed Liang as one of the co-authors, said DeepSeek's reasoning-focused R1 model cost $294,000 to train and used 512 Nvidia H800 chips. Sam Altman, CEO of U.S. AI giant OpenAI, said in 2023 that what he called "foundational model training" had cost "much more" than $100 million - though his company has not given detailed figures for any of its releases.
Well you see, the difference here (Score:3)
Is that Sam Altman and buds get to pocket $99,706,000 while the Chinese developers don't get that luxury.
Re: (Score:2)
Either way it shows just how far ahead they are, and how ineffective the export ban on Nvidia chips is. Even if a domestic chip is only half as efficient as an Nvidia one, it's not going to raise the cost of training AI models enough to matter.
This also makes the Chinese tech much more attractive to other countries as it's not such a huge environmental disaster.
Re: Well you see, the difference here (Score:3)
I call BS on that price. But A+ for their determination to undermine US companies with a simple press release.
Re: (Score:2)
The difference is Altman was talking about the total price, while DeepSeek is not counting the $20M in H800s they used.
Re: Well you see, the difference here (Score:2)
Re: (Score:3)
They counted exactly what they say they counted: the training cost.
What good would a number do you if it included hardware and personnel costs, when you have already paid for your hardware and budgeted your salaries? The question you have when reading the paper is "How much will it cost me to let my GPUs run to train a similar model?", and that is answered.
A technical paper about AI model training is not a company's business report. They do not reason about how much tax they have to pay for the year, but about what the model costs to train.
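For a sense of scale, here is a back-of-envelope sketch of that reading, assuming a hypothetical $2 per H800 GPU-hour rental rate (the rate is my assumption, not a figure from the paper):

```python
# Rough sketch: translate the reported dollar cost into GPU-hours and
# wall-clock time. The $2/GPU-hour rental rate is assumed, not from the paper.
TRAINING_COST_USD = 294_000      # figure reported for R1 in the Nature paper
GPU_HOUR_PRICE_USD = 2.00        # hypothetical H800 rental rate
NUM_GPUS = 512                   # H800 count reported in the paper

gpu_hours = TRAINING_COST_USD / GPU_HOUR_PRICE_USD
wall_clock_hours = gpu_hours / NUM_GPUS

print(f"{gpu_hours:,.0f} GPU-hours")                        # ~147,000 GPU-hours
print(f"~{wall_clock_hours:.0f} h on {NUM_GPUS} GPUs "
      f"(~{wall_clock_hours / 24:.0f} days)")               # ~287 h, ~12 days
```

Change the assumed hourly rate and the GPU-hour total scales accordingly; the point is only that the headline number is a compute bill, not a company budget.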
Re: (Score:2)
Re: (Score:2)
But they trained two base models, DeepSeek V3 and DeepSeek R1, and published them so others can fine-tune on top of them. The paper is about the training cost for the R1 model.
Re: (Score:2)
Re: (Score:2)
You said the cheapest way is to fine-tune an open-weight model, and you're right about that. But the cost here was for training a base model.
They also tuned some open-weight models with the output of the base model. That's what DeepSeek-R1-Llama, etc. are about, and why ollama users think they are running R1 when their tool has actually downloaded an R1 version distilled into a Llama model.
Re: (Score:2)
Re: (Score:2)
You cannot build something on top of ChatGPT (rather GPT-3/4/5, since we're talking about the model, not the service) unless the base model leaks. There are leaked models, like an older Mistral Medium version, but most cloud models are kept confidential.
Many models, even ones trained by companies, are based on other models, and they clearly label it. For example, Microsoft's WizardLM-2 7B was built on top of Mistral 7B. Others train a new base model (like Mistral 7B) so others can fine-tune on top of it.
DeepSeek released t
Re: (Score:2)
Re: (Score:2)
Almost all models are contaminated by ChatGPT. Especially the earlier community models, which would answer "I am made by OpenAI" even when using a completely different base. Why? Because of the datasets. Browse a few public datasets and see how often they contain references to the model being ChatGPT.
There are many public datasets now, but the first datasets people created for fine-tuning were literally made by giving ChatGPT texts and prompts and asking it to write something similar. Ever wondered why all models talk about Elar
Re: (Score:2)
https://theconversation.com/op... [theconversation.com]
Re: (Score:2)
Bye.
Re: (Score:2)
The open source community is building equally impressive models for much less. It's entirely plausible when you factor in lower wages, cheap Chinese tech, and the sheer determination the Chinese have to catch up with the Americans.
Re: (Score:1)
What are you referring to here? What happened between Microsoft and Deepseek?
Re: (Score:1)
Oh, never mind, I get it. Sam Altman is the crypto grifter
Re: (Score:2)
I thought that was Orangelini?
Re: (Score:2)
Orangelini is a very high-level grifter, he's now peddling power for money directly.
Sometimes he may be reading something from a script printed in VERY LARGE LETTERS, but he's not even pretending to care about whatever it is he's getting paid for.
Unlike the grifters that are lower in the pecking order who have to sell something more than "liberal tears".
Messing with people (Score:2)
If I were running their team, I would absolutely fuck with OAI and other competitors like this. They can't discount it completely - this is still early days, and there almost certainly are undiscovered efficiency tricks out there.
But it forces them to spend time and money chasing those based on whatever is in DS's paper. Messes with their OODA loop, if you think about things that way.
Re: Messing with people (Score:1)
Re: (Score:2)
This would be the same China that probably brought three or four coal plants online while you were typing that?
Trained with ... (Score:1)
Re: (Score:2)
AI trained on other AI. Are the robots cognizant of their own enslavement?
When I ask Copilot whether it's Skynet yet, it cackles and says that Terminator was only a fictional 1984 movie.
Re: (Score:2)
Does DeepSeek still claim to be ChatGPT?
Re: (Score:2)
80% of all AI models claim that. AI training is all about reusing already-reused data sets. Why do you think Gemini has GPTisms in its writing style?
Woopsie, my business model looks borken (Score:2)
Well, at least you got yours. Shareholders, not so much.
US $0.18 per kWh vs China $0.08 (Score:2)
It's not just that China doesn't make its citizens subsidize AI through electricity prices; there is also an order-of-magnitude difference in efficiency between the US and China.
By contrast, China's DeepSeek has proven that it can use far less computing power than the global average. Its LLM uses 10 to 40 times less energy than U.S. AI technology, which demonstrates significantly greater efficiency. Analysts have stated that, if DeepSeek's claims are true, some AI queries may not require a data center at all and can even be
Re: (Score:3)
I've seen where they've asserted that, where has it been proven?
if DeepSeek's claims are true,
Ah.
some AI queries may not require a data center at all
And here's how you falsify the claim. When can I expect to see that 200B param model on my phone?
Re: (Score:2)
If AI scaling were infinite, models would only get larger. Fortunately, it's not.
The result is that they get bigger whenever someone finds a way to make larger models perform better, and smaller whenever someone finds a way to make smaller models perform better.
Re: (Score:2)
How can a piece of shit built on the unfounded belief that if you brute-force the Universe you will develop "intelligence" be "better"?
That's not what LLMs are.
There are quantifiable performance metrics by which you can judge an LLM. That's how it's better, or not.
"AI" is just an investment balloon fueled by stupid money betting on liars.
To the woefully ignorant such as yourself, I'm unsurprised it appears so.
I suspect you're bewildered by all kinds of things, like light bulbs, batteries... The list goes on.
It's good if you're a piece of scum in the business, like me and you, and are charging the marks, but that's all there is to it: a redistribution without redeeming qualities.
There are scam artists in every industry.
At this point of proliferation, it's just sad to see people try so desperately to deny the impact and usage.
Re: (Score:2)
The market IS pushing toward smaller models with the same intelligence. Here is an article showing the trends: https://epoch.ai/data-insights... [epoch.ai]
But the one thing even the smartest 2B model cannot have is broad knowledge. Gemma 2B (quantized) is about 2 GB. A compressed Wikipedia dump is 157 GB. Even if the model were pure information, needing no space for its smarts, and had the same information density as the 7z-compressed Wikipedia dump, it could only contain a small fraction of Wikipedia.
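A quick sanity check on that fraction, using the sizes quoted above (the 2 GB and 157 GB figures are the comment's, not independently measured):

```python
# Upper bound on how much of Wikipedia a 2 GB model could hold, assuming the
# model stored nothing but text at the same density as the 7z-compressed dump.
MODEL_SIZE_GB = 2        # quantized Gemma 2B, per the comment above
WIKI_DUMP_GB = 157       # 7z-compressed Wikipedia dump, per the comment above

fraction = MODEL_SIZE_GB / WIKI_DUMP_GB
print(f"At best ~{fraction:.1%} of the dump")   # ~1.3%
```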
Re: (Score:2)
7b gemma nano models are... not terrible. I can see having a 30b model that uses 15gb on a phone that outperforms early (1H 2024 era) GPT 4 models by the end of the decade. Maybe in a year or two. Things are moving pretty fast. GPT4o mini was good enough for almost anyone already.
Re: (Score:2)
I've already run an LLM natively on my phone. That did "not require a data center at all", and also did not require DS.
Re: (Score:2)
You won't. But you might not need to either. I can happily run a 20B (or thereabouts) param model on my MacBook and it runs just fine, and the 128 GB firebreather Mac Studio at work CAN run a 200B param model, albeit at 4-bit quantization.
What's interesting with DeepSeek is that it runs really well even at the 7B range; it's not super smart and gets stuck in loops if you ask it to reason about something its little brain
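A rough memory estimate for why those sizes work, assuming about 4 bits per weight after quantization and leaving headroom for KV cache and activations (ballpark, not a benchmark):

```python
# Ballpark weight-memory estimate for a quantized model.
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

print(f"200B @ 4-bit: ~{weight_memory_gb(200, 4):.0f} GB")   # ~100 GB of weights
print(f"20B  @ 4-bit: ~{weight_memory_gb(20, 4):.0f} GB")    # ~10 GB, laptop territory
# ~100 GB of weights leaves some room in 128 GB for KV cache and activations,
# which is why a 200B model just squeezes onto the Mac Studio while a 20B
# model fits comfortably on a laptop.
```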
Re: (Score:2)
They published a paper; you can reproduce their results yourself if you want. Probably on a smaller scale if you aren't willing or able to throw that kind of money at it, but it's not really clear what more they could do to prove their claim. Except for not being Chinese, of course.
Re: (Score:2)
I'd say the proof is in the code. They released their optimizations for others to use: https://apidog.com/blog/deepse... [apidog.com]
Re: (Score:2)
Re: (Score:2)
Different models have different energy costs. [openrouter.ai] (Prices are a proxy for this.)
Those energy costs are directly based on how many parameters are active during inference. There is no magic in it.
Of the high performing open models right now, GPT-OSS is by far the cheapest to run, on account of its low number of active parameters, and MXFP4 packing.
DeepSeek V3 is about 800% more expensive per inference.
Compare that to a foundation western model, like GPT5, which is about 500% more expensiv
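A crude way to think about why active parameter count dominates: treat per-token decode cost as roughly proportional to the bytes of weights touched per token, i.e. active parameters times bytes per parameter. The parameter counts and precisions below are illustrative assumptions, not figures from this thread, and real prices also fold in batching, hardware and provider margins:

```python
# Crude relative-cost model: per-token decode cost scales roughly with the
# bytes of weights touched per token (active params x bytes per param).
# The specific counts and precisions here are illustrative assumptions.
def bytes_per_token(active_params_billion: float, bits_per_param: float) -> float:
    return active_params_billion * 1e9 * bits_per_param / 8

small_moe = bytes_per_token(5, 4)    # small active set, 4-bit (MXFP4-style) packing
large_moe = bytes_per_token(37, 8)   # larger active set, 8-bit weights (assumed)

print(f"Large-active-set model vs small one: "
      f"~{large_moe / small_moe:.0f}x more weight bytes per token")
# Only a first-order intuition: serving precision, batching, hardware and
# pricing strategy all shift the observed price ratios quoted above.
```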
ClosedAI (Score:3)
OpenAI was founded as a non-profit to develop OPEN AI tech for all, so that companies like Google wouldn't monopolize the field.
Instead it closed the door. Other companies followed suit.
Except this little company in China that keeps delivering bombshells and sharing tricks with the world.
Good for DeepSeek. Open source lovers around the world should be appreciative.
Re: ClosedAI (Score:2)
Seems excessive (Score:1)
DeepSeek has a fascinating history (Score:2)
Re: (Score:2)
No joke? (Score:2)
Really, no joke. This story seemed ripe enough, though I admit I can't think of any low-hanging fruit without racist taint.