
China's DeepSeek Says Its Hit AI Model Cost Just $294,000 To Train (reuters.com)
Chinese AI developer DeepSeek said it spent $294,000 on training its R1 model, much lower than figures reported for U.S. rivals, in a paper that is likely to reignite debate over Beijing's place in the race to develop artificial intelligence. Reuters: The rare update from the Hangzhou-based company -- the first estimate it has released of R1's training costs -- appeared in a peer-reviewed article in the academic journal Nature published on Wednesday.
DeepSeek's release of what it said were lower-cost AI systems in January prompted global investors to dump tech stocks as they worried the new models could threaten the dominance of AI leaders including Nvidia. Since then, the company and founder Liang Wenfeng have largely disappeared from public view, apart from pushing out a few new product updates.
[...] The Nature article, which listed Liang as one of the co-authors, said DeepSeek's reasoning-focused R1 model cost $294,000 to train and used 512 Nvidia H800 chips. Sam Altman, CEO of U.S. AI giant OpenAI, said in 2023 that what he called "foundational model training" had cost "much more" than $100 million - though his company has not given detailed figures for any of its releases.
Well you see, the difference here (Score:3)
Is that Sam Altman and buds get to pocket $99,706,000 while the Chinese developers don't get that luxury.
Re: (Score:2)
Either way it shows just how far ahead they are, and how ineffective the export ban on Nvidia chips is. Even if a domestic chip is only half as efficient as an Nvidia one, it's not going to raise the cost of training AI models enough to matter.
This also makes the Chinese tech much more attractive to other countries as it's not such a huge environmental disaster.
Re: Well you see, the difference here (Score:3)
I call BS on that price. But A+ for their determination to undermine US companies with a simple press release.
Re: (Score:2)
The difference is Altman was talking about the total price, while DeepSeek is not counting the $20M in H800s they used.
Re: Well you see, the difference here (Score:2)
Re: (Score:3)
They counted exactly what they say they counted: the training cost.
What good would a number do you if it included hardware and personnel costs, when you have already paid for your hardware and budgeted your salaries? The question you have when reading the paper is "How much will it cost me to let my GPUs run to train a similar model?", and that is answered.
A technical paper about AI model training is not a company's business report. They do not reason about how much tax they have to pay for the year, but about what the model costs to train.
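For a sense of scale, here is a back-of-envelope sketch of that reading, assuming a hypothetical $2 per H800 GPU-hour rental rate (the rate is my assumption, not a figure from the paper):

```python
# Rough sketch: translate the reported dollar cost into GPU-hours and
# wall-clock time. The $2/GPU-hour rental rate is assumed, not from the paper.
TRAINING_COST_USD = 294_000      # figure reported for R1 in the Nature paper
GPU_HOUR_PRICE_USD = 2.00        # hypothetical H800 rental rate
NUM_GPUS = 512                   # H800 count reported in the paper

gpu_hours = TRAINING_COST_USD / GPU_HOUR_PRICE_USD
wall_clock_hours = gpu_hours / NUM_GPUS

print(f"{gpu_hours:,.0f} GPU-hours")                        # ~147,000 GPU-hours
print(f"~{wall_clock_hours:.0f} h on {NUM_GPUS} GPUs "
      f"(~{wall_clock_hours / 24:.0f} days)")               # ~287 h, ~12 days
```

Change the assumed hourly rate and the GPU-hour total scales accordingly; the point is only that the headline number is a compute bill, not a company budget.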
Re: (Score:2)
Re: (Score:2)
But they trained two base models, DeepSeek V3 and DeepSeek R1, and published them so others can fine-tune on top of them. The paper is about the training cost for the R1 model.
Re: (Score:2)
Re: (Score:2)
You said the cheapest way is to fine-tune an open-weight model, and you're right about that. But the cost here was for training a base model.
They also tuned some open-weight models with the output of the base model. That's what DeepSeek-R1-Llama, etc. are about, and why ollama users think they are running R1 when their tool has actually downloaded an R1 version distilled into a Llama model.
Re: (Score:2)
Re: (Score:2)
You cannot build something on top of ChatGPT (rather GPT-3/4/5, since we're talking about the model, not the service) unless the base model leaks. There are leaked models, like an older Mistral Medium version, but most cloud models are kept confidential.
Many models, even ones trained by companies, are based on other models, and they clearly label it. For example, Microsoft's WizardLM-2 7B was built on top of Mistral 7B. Others train a new base model (like Mistral 7B) so others can fine-tune on top of it.
DeepSeek released t
Re: (Score:2)
Re: (Score:2)
Almost all models are contaminated by ChatGPT. Especially the earlier community models, which would answer "I am made by OpenAI" even when using a completely different base. Why? Because of the datasets. Browse a few public datasets and see how often they contain references to the model being ChatGPT.
There are many public datasets now, but the first datasets people created for fine-tuning were literally made by giving ChatGPT texts and prompts and asking it to write something similar. Ever wondered why all models talk about Elar
Re: (Score:2)
https://theconversation.com/op... [theconversation.com]
Re: (Score:2)
Bye.
Re: (Score:2)
The open source community is building equally impressive models for much less. It's entirely plausible when you factor in lower wages, cheap Chinese tech, and the sheer determination the Chinese have to catch up with the Americans.
Re: (Score:1)
What are you referring to here? What happened between Microsoft and Deepseek?
Re: (Score:1)
Oh, never mind, I get it. Sam Altman is the crypto grifter
Re: (Score:2)
I thought that was Orangelini?
Re: (Score:2)
Orangelini is a very high-level grifter, he's now peddling power for money directly.
Sometimes he may be reading something from a script printed in VERY LARGE LETTERS, but he's not even pretending to care about whatever it is he's getting paid for.
Unlike the grifters that are lower in the pecking order who have to sell something more than "liberal tears".
Messing with people (Score:2)
If I were running their team, I would absolutely fuck with OAI and other competitors like this. They can't discount it completely - this is still early days, and there almost certainly are undiscovered efficiency tricks out there.
But it forces them to spend time and money chasing those based on whatever is in DS's paper. Messes with their OODA loop, if you think about things that way.
Re: Messing with people (Score:1)
Re: (Score:2)
This would be the same China that probably brought three or four coal plants online while you were typing that?
Trained with ... (Score:1)
Re: (Score:2)
AI trained on other AI. Are the robots cognizant of their own enslavement?
When I ask Copilot whether it's Skynet yet, it cackles and says that Terminator was only a fictional 1984 movie.
Re: (Score:2)
Does DeepSeek still claim to be ChatGPT?
Re: (Score:2)
80% of all AI models claim that. AI training is all about reusing already-reused data sets. Why do you think Gemini has GPTisms in its writing style?
Woopsie, my business model looks borken (Score:2)
Well, at least you got yours. Shareholders, not so much.
US $0.18 per kWh vs China $0.08 (Score:2)
It's not just that China doesn't make its citizens subsidize AI through electricity prices; there is also an order-of-magnitude difference in efficiency between the US and China.
By contrast, China's DeepSeek has proven that it can use far less computing power than the global average. Its LLM uses 10 to 40 times less energy than U.S. AI technology, which demonstrates significantly greater efficiency. Analysts have stated that, if DeepSeek's claims are true, some AI queries may not require a data center at all and can even be
Re: (Score:3)
I've seen where they've asserted that, where has it been proven?
if DeepSeek's claims are true,
Ah.
some AI queries may not require a data center at all
And here's how you falsify the claim. When can I expect to see that 200B param model on my phone?
Re: (Score:2)
If AI scaling were infinite, models would only get larger. Fortunately, it's not.
The result is that they get bigger whenever someone finds a way to make larger models perform better, and smaller whenever someone finds a way to make smaller models perform better.
Re: (Score:2)
How can a piece of shit built on the unfounded belief that if you brute-force the Universe you will develop "intelligence" be "better"?
That's not what LLMs are.
There are quantifiable performance metrics by which you can judge an LLM. That's how it's better, or not.
"AI" is just an investment balloon fueled by stupid money betting on liars.
To the woefully ignorant such as yourself, I'm unsurprised it appears so.
I suspect you're bewildered by all kinds of things, like light bulbs, batteries... The list goes on.
It's good if you're a piece of scum in the business, like me and you, and are charging the marks, but that's all there is to it: a redistribution without redeeming qualities.
There are scam artists in every industry.
At this point of proliferation, it's just sad to see people try so desperately to deny the impact and usage.
Re: (Score:2)
The market IS pushing toward smaller models with the same intelligence. Here is an article showing the trends: https://epoch.ai/data-insights... [epoch.ai]
But the one thing even the smartest 2B model cannot have is broad knowledge. Gemma 2B (quantized) is about 2 GB. A compressed Wikipedia dump is 157 GB. Even if the model were pure information, needing no space for its smarts, and had the same information density as the 7z-compressed Wikipedia dump, it could only contain a small fraction of Wikipedia.
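A quick sanity check on that fraction, using the sizes quoted above (the 2 GB and 157 GB figures are the comment's, not independently measured):

```python
# Upper bound on how much of Wikipedia a 2 GB model could hold, assuming the
# model stored nothing but text at the same density as the 7z-compressed dump.
MODEL_SIZE_GB = 2        # quantized Gemma 2B, per the comment above
WIKI_DUMP_GB = 157       # 7z-compressed Wikipedia dump, per the comment above

fraction = MODEL_SIZE_GB / WIKI_DUMP_GB
print(f"At best ~{fraction:.1%} of the dump")   # ~1.3%
```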
Re: (Score:2)
7b gemma nano models are... not terrible. I can see having a 30b model that uses 15gb on a phone that outperforms early (1H 2024 era) GPT 4 models by the end of the decade. Maybe in a year or two. Things are moving pretty fast. GPT4o mini was good enough for almost anyone already.
Re: (Score:2)
I've already run an LLM natively on my phone. That did "not require a data center at all", and also did not require DS.
Re: (Score:2)
You won't. But you might not need to either. I can happily run a 20B (or thereabouts) param model on my MacBook and it runs just fine, and the 128 GB firebreather Mac Studio at work CAN run a 200B param model, albeit at 4-bit quantization.
What's interesting with DeepSeek is that it runs really well even at the 7B range; it's not super smart and gets stuck in loops if you ask it to reason about something its little brain
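A rough memory estimate for why those sizes work, assuming about 4 bits per weight after quantization and leaving headroom for KV cache and activations (ballpark, not a benchmark):

```python
# Ballpark weight-memory estimate for a quantized model.
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

print(f"200B @ 4-bit: ~{weight_memory_gb(200, 4):.0f} GB")   # ~100 GB of weights
print(f"20B  @ 4-bit: ~{weight_memory_gb(20, 4):.0f} GB")    # ~10 GB, laptop territory
# ~100 GB of weights leaves some room in 128 GB for KV cache and activations,
# which is why a 200B model just squeezes onto the Mac Studio while a 20B
# model fits comfortably on a laptop.
```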
Re: (Score:2)
They published a paper; you can reproduce their results yourself if you want. Probably on a smaller scale if you aren't willing or able to throw that kind of money at it, but it's not really clear what more they could do to prove their claim. Except for not being Chinese, of course.
Re: (Score:2)
I'd say the proof is in the code. They released their optimizations for others to use: https://apidog.com/blog/deepse... [apidog.com]
Re: (Score:2)
Re: (Score:2)
Different models have different energy costs. [openrouter.ai] (Prices are a proxy for this.)
Those energy costs are directly based on how many parameters are active during inference. There is no magic in it.
Of the high performing open models right now, GPT-OSS is by far the cheapest to run, on account of its low number of active parameters, and MXFP4 packing.
DeepSeek V3 is about 800% more expensive per inference.
Compare that to a foundation western model, like GPT5, which is about 500% more expensiv
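A crude way to think about why active parameter count dominates: treat per-token decode cost as roughly proportional to the bytes of weights touched per token, i.e. active parameters times bytes per parameter. The parameter counts and precisions below are illustrative assumptions, not figures from this thread, and real prices also fold in batching, hardware and provider margins:

```python
# Crude relative-cost model: per-token decode cost scales roughly with the
# bytes of weights touched per token (active params x bytes per param).
# The specific counts and precisions here are illustrative assumptions.
def bytes_per_token(active_params_billion: float, bits_per_param: float) -> float:
    return active_params_billion * 1e9 * bits_per_param / 8

small_moe = bytes_per_token(5, 4)    # small active set, 4-bit (MXFP4-style) packing
large_moe = bytes_per_token(37, 8)   # larger active set, 8-bit weights (assumed)

print(f"Large-active-set model vs small one: "
      f"~{large_moe / small_moe:.0f}x more weight bytes per token")
# Only a first-order intuition: serving precision, batching, hardware and
# pricing strategy all shift the observed price ratios quoted above.
```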
ClosedAI (Score:3)
OpenAI was founded as a non-profit to develop OPEN AI tech for all, so that companies like Google wouldn't monopolize the field.
Instead it closed the door. Other companies followed suit.
Except this little company in China that keeps delivering bombshells and sharing tricks with the world.
Good for DeepSeek. Open source lovers around the world should be appreciative.
Re: ClosedAI (Score:2)
Seems excessive (Score:1)
DeepSeek has a fascinating history (Score:2)
Re: (Score:2)
No joke? (Score:2)
Really, no joke. This story seemed ripe enough, though I admit I can't think of any low-hanging fruit without racist taint.