AI China

Were DeepSeek's Development Costs Much Higher Than Reported? (msn.com) 49

Nearly three years ago a team of Chinese AI engineers working for DeepSeek's parent company unveiled an earlier AI supercomputer that the Washington Post says was constructed from 10,000 A100 GPUs purchased from Nvidia. Roughly six months later "Washington had banned Nvidia from selling any more A100s to China," the article notes.

Remember that number as you read this. 10,000 A100 GPUs... DeepSeek's new chatbot caused a panic in Silicon Valley and on Wall Street this week, erasing $1 trillion from the stock market. That impact stemmed in large part from the company's claim that it had trained one of its recent models on a minuscule $5.6 million in computing costs and with only 2,000 or so of Nvidia's less-advanced H800 chips.

Nvidia saw its soaring value crater by $589 billion Monday as DeepSeek rocketed to the top of download charts, prompting President Donald Trump to call for U.S. industry to be "laser focused" on competing... But a closer look at DeepSeek reveals that its parent company deployed a large and sophisticated chip set in its supercomputer, leading experts to assess the total cost of the project as much higher than the relatively paltry sum that U.S. markets reacted to this week... Lennart Heim, an AI expert at Rand, said DeepSeek's evident access to [the earlier] supercomputer would have made it easier for the company to develop a more efficient model, requiring fewer chips.

That earlier project "suggests that DeepSeek had a major boost..." according to the article, "with technology comparable to that of the leading U.S. AI companies." And while DeepSeek claims it only spent $5.6 million to train one of its advanced models, "its parent company has said that building the earlier supercomputer had cost 1 billion yuan, or $139 million." Yet the article also cites the latest insights Friday from the semiconductor research firm SemiAnalysis, summarizing its finding that DeepSeek "has spent more than half a billion dollars on GPUs, with total capital expenditures of almost $1.3 billion."

The article notes Thursday remarks by OpenAI CEO Sam Altman that DeepSeek's energy-efficiency claims were "wildly overstated... This is a model at a capability level that we had quite some time ago." Palmer Luckey called DeepSeek "legitimately impressive" on X, but dismissed the $5.6 million training-cost figure as "bogus" and the Silicon Valley meltdown as "hysteria." Even with these higher total costs in mind, experts say, U.S. companies are right to be concerned about DeepSeek upending the market. "We know two things for sure: DeepSeek is pricing their services very competitively, and second, the performance of their models is comparable to leading competitors," said Kai-Shen Huang, an AI expert at the Research Institute for Democracy, Society and Emerging Technology, a Taipei-based think tank. "I think DeepSeek's pricing strategy has the potential to disrupt the market globally...."

China's broader AI policy push has helped create an environment conducive to the rise of a company like DeepSeek. Beijing announced an ambitious AI blueprint in 2017, with a goal of becoming a global AI leader by 2030 and promises of funding for universities and private enterprise. Local governments across the nation followed with their own programs to support AI.


Comments Filter:
  • by Anonymous Coward

    Couldn’t really give a fuck either way

    • No, they didn't lie. People either can't read whitepapers or straight-up lie through their teeth to keep the competition afloat.
      • by gweihir ( 88907 )

        Indeed. Most of the current AI hype is built on lies anyway; what are a few more? Especially when billions of stupid money are ripe for the taking.

      • Mostly thanks for changing the vacuous Subject, but I do have some thoughts spinning off of your new topic...

        The topic of "lies" is fraught with danger of misunderstanding. I think you can make a legitimate argument that the cost of training this particular model was affected by experiences gained with earlier models, but trying to assess the value of those experiences (to add in as costs?) would be quite difficult. Perhaps even crazy. Like trying to assess the value of shares in any public company these ye

  • They bought PUT options, made the announcement, profited...

  • by Anonymous Coward on Saturday February 01, 2025 @11:54AM (#65134923)

    Everyone writing "And while DeepSeek claims it only spent $5.6 million to train one of its advanced models," is either a complete idiot or a liar.

    Because the source [arxiv.org] said:
    "Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M.
    Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."

    DeepSeek never reported their costs for R1, let alone their total costs. Never claimed it was only $6m.
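    For reference, the arithmetic behind that $5.576M is easy to reproduce. A minimal sketch, using the per-stage GPU-hour figures the V3 technical report lists and the paper's own assumed $2/GPU-hour rental rate:

    ```python
    # Reproduce the headline figure from the DeepSeek-V3 technical report.
    # GPU-hour numbers are the paper's reported values; $2/GPU-hour is the
    # paper's assumed H800 rental rate, not an audited cost.
    gpu_hours = {
        "pre-training": 2_664_000,
        "context extension": 119_000,
        "post-training": 5_000,
    }
    rate = 2.0  # USD per GPU-hour (the paper's assumption)

    total_hours = sum(gpu_hours.values())  # 2,788,000 GPU-hours
    print(f"{total_hours:,} GPU-hours x ${rate}/hr = ${total_hours * rate / 1e6:.3f}M")
    # -> 2,788,000 GPU-hours x $2.0/hr = $5.576M
    ```

    As the quoted passage says, that figure covers only the official training run, excluding prior research and ablation experiments.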

    • by Forty Two Tenfold ( 1134125 ) on Saturday February 01, 2025 @12:05PM (#65134951)
      It's called FUD. They (DeepSeek's competitors) lie to stay afloat.
      • by gweihir ( 88907 )

        Clearly. Now, if they had some actual, I don't know, Business Model that makes sense at this point, several years into the hype, they might not have to lie so much. But they do not. This thing eats money like crazy and delivers mostly nothing.

        It also does harm: attack code generation is now accessible to the clueless and the incompetent, and exploit generation for new vulnerabilities has been massively accelerated. That works because attack code does not have to be reliable or secure. Quite unlike other code.

      • by allo ( 1728082 )

        They need to calm down their investors.

      • They all lie. We don't know all the details of who's lying about what, though, and there may be even more lying than anticipated.

      • They lied about their competitor being cheaper? I guess I lost the plot here... we hate AI and AI companies because either they terk yer jerbs or they're not good enough to terk yer jerb and everything in between.

        I'm just using it to get real work done, not investing in NVDA or paying attention to the hype and anti-hype. It's a damn tool and it's good. That makes people jealous and afraid apparently. Weirdos.

    • by AmiMoJo ( 196126 ) on Saturday February 01, 2025 @01:19PM (#65135051) Homepage Journal

      It also misses the very important point that it's not just that their costs are lower, it's that they were able to train their very capable AI with GPUs that are not export controlled. GPUs that have performance well within the realm of what China is starting to produce itself.

      In other words the export ban failed and if anything just spurred a Chinese company to leap ahead of US competitors, who are now scrambling to copy what DeepSeek did.

      • by shanen ( 462549 )

        Mod parent up with sadness.

      • by larryjoe ( 135075 ) on Saturday February 01, 2025 @02:53PM (#65135227)

        It also misses the very important point that it's not just that their costs are lower, it's that they were able to train their very capable AI with GPUs that are not export controlled. GPUs that have performance well within the realm of what China is starting to produce itself.

        In other words the export ban failed and if anything just spurred a Chinese company to leap ahead of US competitors, who are now scrambling to copy what DeepSeek did.

        It may be that DeepSeek made major advances, or maybe not. We'll find out over the next year or so. However, it's clear that DeepSeek did make some advances, including some that weren't obvious. The workaround at the PTX level to increase the inter-GPU bandwidth that was intentionally crippled on the H800 is impressive. Perhaps there is something to be learned from that, for either software or hardware, and Nvidia should be paying very close attention to this. The memory savings from multi-head latent attention are something that will become widespread.
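        For the curious, here is a toy sketch of the multi-head latent attention idea (dimensions are made up for illustration, not DeepSeek's actual configuration; causal masking and RoPE handling are omitted): instead of caching full per-head K and V tensors during generation, you cache one small latent per token and re-expand it on the fly.

        ```python
        import torch
        import torch.nn as nn

        class LatentAttention(nn.Module):
            """Toy multi-head latent attention: cache a small per-token
            latent instead of full per-head K/V tensors."""

            def __init__(self, d_model=512, n_heads=8, d_latent=64):
                super().__init__()
                self.n_heads, self.d_head = n_heads, d_model // n_heads
                self.q_proj = nn.Linear(d_model, d_model)
                self.kv_down = nn.Linear(d_model, d_latent)  # compress: this is all we cache
                self.k_up = nn.Linear(d_latent, d_model)     # re-expand at attention time
                self.v_up = nn.Linear(d_latent, d_model)
                self.out = nn.Linear(d_model, d_model)

            def forward(self, x, latent_cache=None):
                B, T, _ = x.shape
                latent = self.kv_down(x)                     # (B, T, d_latent)
                if latent_cache is not None:                 # append to cached history
                    latent = torch.cat([latent_cache, latent], dim=1)
                S = latent.size(1)
                q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
                k = self.k_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
                v = self.v_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
                y = (q @ k.transpose(-2, -1) / self.d_head**0.5).softmax(-1) @ v
                return self.out(y.transpose(1, 2).reshape(B, T, -1)), latent

        # Cache per token: d_latent floats vs 2*d_model for vanilla KV caching,
        # a 16x reduction with these toy numbers (64 vs 1024).
        ```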

        The super big question right now is whether these advances will lead to future models running on "smaller" processors or to amped-up larger models running on yet larger processors. That is, it's not clear that scaling no longer applies to something like V3. Whereas GPT-4/5 would have used obscene amounts of hardware, it might be possible to run the MLA equivalent of GPT-6/7 on those same obscene amounts of hardware.

        The other big thing is what the recent SemiAnalysis article pointed out. DeepSeek shook up the AI world, but it's simply one of many companies and events that have shaken up, and will continue to shake up, the AI world. All of these events are good: they increase the attention and resources devoted to AI research, which accelerates the rate of progress.

      • by Shaitan ( 22585 )

        "were able to train their very capable AI with GPUs that are not export controlled."

        That isn't actually proven, which is the point of this article. They trained the previous iteration on controlled GPUs... which didn't suddenly vanish with the ban. Also, while DeepSeek performs well in benchmarks, it hasn't done nearly as well for me in real-world usage. Even if it worked as claimed, that would only put them on par with US competitors, not ahead of them.

        The people scrambling to copy what deepseek did are

  • by chmod a+x mojo ( 965286 ) on Saturday February 01, 2025 @12:14PM (#65134961)

    Deepseek literally said they spent 5 million training THIS fucking model, for fuck's sake. No, you don't count the cost of training a DIFFERENT model, on different architectures / training data, in the cost of a different and separate god damn model.

    Yes, they spent more than 5 million on prior research. So has pretty much every company in existence. Just because they spent money BEFORE, doesn't mean they spent that money ON THIS SPECIFIC PROJECT.

    That's like trying to claim the dude who wrote a fart app in one hour of dev time didn't REALLY spend only an hour on the fart app, and that you have to count every hour of dev time he ever did in his life prior to writing it. Sounds kinda fucking stupid when put that way, doesn't it?

    • by SoftwareArtist ( 1472499 ) on Saturday February 01, 2025 @01:46PM (#65135107)

      Deepseek literally said they spent 5 million training THIS fucking model, for fuck's sake.

      They didn't even say that much. They said how many GPU hours were used for each stage of training, then calculated how much it would cost to rent that many GPUs at current rates. Anyone with an account on AWS or Azure can now train a model for that much, even if they don't own any GPUs at all.

      • by Entrope ( 68843 )

        Anyone with an account on AWS or Azure can now train a model for that much, even if they don't own any GPUs at all.

        Set aside the capital costs of developing the know-how. What data would anyone with an account on AWS or Azure train that model on? Do AWS and Azure make it free to crawl the web and slurp up 10 trillion-plus tokens of data, or to transfer it from somewhere else? Do they make it free to store and load that data while training the model?

        It's dishonest to pretend that GPU rental is the sum of costs to train a model

        • It's dishonest to pretend that GPU rental is the sum of costs to train a model

          They never claimed it was. The only people being dishonest are the ones falsely claiming they did.

          Do AWS and Azure make it free to crawl the web and slurp up 10 trillion-plus tokens of data, or to transfer that from somewhere else?

          If you're looking for open datasets to train LLMs, here are some good starting places.

          https://huggingface.co/collect... [huggingface.co]
          https://github.com/argilla-io/... [github.com]

          In addition, people often include synthetic data generated by another LLM. (OpenAI has accused DeepSeek of using their model [slashdot.org] that way to train their own model.) You might find this project [github.com] interesting. It's an attempt to reproduce R1 using a fully open pipeline.
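          As a concrete starting point, streaming a sample of one openly available web-scale corpus with the Hugging Face datasets library looks like this (the dataset name is just an example of an open corpus, not a claim about what DeepSeek trained on):

          ```python
          # Stream a sample of an open web-scale corpus; no full download needed.
          # FineWeb is one example of an openly available pre-training dataset.
          from datasets import load_dataset

          ds = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT",
                            split="train", streaming=True)
          for i, record in enumerate(ds):
              print(record["text"][:200])  # each record carries raw page text
              if i == 2:
                  break
          ```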

          • by Entrope ( 68843 )

            They never claimed it was. The only people being dishonest are the ones falsely claiming they did.

            Respond to what I said, not some straw man. When a person says "this cost the equivalent of X", they are implying that's the only relevant cost.

            If you're looking for open datasets to train LLMs, here are some good starting places.

            I'm not. Are you suggesting that all DeepSeek did was download one of those? Your last link suggests otherwise.

    • Deepseek literally said they spent 5 million training THIS fucking model, for fuck's sake. No, you don't count the cost of training a DIFFERENT model, on different architectures / training data, in the cost of a different and separate god damn model.

      Yes, they spent more than 5 million on prior research. So has pretty much every company in existence. Just because they spent money BEFORE, doesn't mean they spent that money ON THIS SPECIFIC PROJECT.

      True. So a fully fleshed-out company, with all the datasets, experts, already-working models, etc., would only need an incremental $5.6 million. That does mean something, especially if the trial-and-error runs during the research phase were also much faster.

      However, it doesn't mean that any Joe in his garage can duplicate what DeepSeek did because just the trial-and-error runs during the research phase would have been a few orders of magnitude higher in dollars. And that doesn't account for the non-process

  • by rabun_bike ( 905430 ) on Saturday February 01, 2025 @12:30PM (#65134981)
    I mean, that is the real question, right? Are the economics preached by the AI industry a lie? I would say this puts them in serious question, regardless of whether the actual cost was truly 1/10th, or merely 1/4th, of its competitors'. What we do know is that it was definitely substantially less.
  • by MacMann ( 7518492 ) on Saturday February 01, 2025 @12:44PM (#65135001)

    China is run by the CCP, and most anything going from China to the rest of the world has to go through the CCP. Because the CCP wants to be a big player in the world, or at least look like one, they will talk up their capabilities.

    Why lie to the world? Won't people figure out the lies eventually? So long as the lies leave enough doubt about what is true, China can use announcements of its capabilities in technology, military, economy, medicine, or whatnot to cause other nations to act in ways they otherwise would not.

    This can backfire in a huge way, as proven by the Boeing F-15. Announcements from the Soviets in the 1960s and early 1970s had the DoD concerned that if the Cold War warmed up they'd be at a huge disadvantage against Soviet aircraft. They worked hard, at considerable expense, to field an aircraft at least on par with what the Soviets claimed to have; the F-15 was the result. When Soviet aircraft were later captured by various means, there was a realization that the US military was 50 years ahead in technology. The F-15 has had updates to its electronics since the 1970s, but the rest of the plane changed little and remains in service still because it is just an awesome aircraft.

    I read recently that China is building nuclear-powered icebreakers. Is this really happening? Likely nobody knows for certain. If people in the White House and Congress believe it, then we should expect the US Coast Guard to get nuclear icebreakers too.

    Oh, and Russia has had nuclear icebreakers for decades, but they don't appear to concern the powers-that-be in DC all that much. Much of the Russian icebreaker fleet is in bad shape; one such icebreaker collided with a cargo ship days ago and sustained damage. The cooling systems on Russian icebreakers are subpar, so although most of them have multiple reactor cores, they can't operate all of them at the same time without overheating even in Arctic waters, and this keeps them from leaving the Arctic for the Antarctic. Chinese icebreakers will need proper cooling to transit tropical seas, or none will be able to leave port under their own power; they don't have the luxury of keeping them in ice-cold water all year. I expect the US Coast Guard to get nuclear icebreakers soon in response.

    I suspect the CCP is lying about their GPT technology, and as a result the rest of the world will change tactics on developing its own technology, dump in even more money to speed things along, and end up surpassing whatever China claims to have.

    • Boeing F-15? Just like Disney releasing Star Wars in 1977!
      • https://en.wikipedia.org/wiki/... [wikipedia.org]

        Boeing produces the F-15 now; it inherited the line when it merged with McDonnell Douglas in 1997. The original F-15 from 1975 or so was such an awesome design that it is still in production today.

        The Boeing F/A-18 is a similar story, having its origins in the 1970s. It didn't play out the same way, because there was a significant shift in the airframe design around 1995, with Boeing acquiring production, which makes the original design not quite as everlasting. The current E/F variants of the F/A-18 are about 30 years old.

  • Costs (Score:5, Insightful)

    by Savage-Rabbit ( 308260 ) on Saturday February 01, 2025 @01:01PM (#65135025)

    Even if DeepSeek only cost half of what OpenAI's latest offering cost per training run, that would still be news. But that is not the most important thing DeepSeek did; that would be the fact that they open-sourced a grade-A model worth billions of dollars.

  • by hdyoung ( 5182939 ) on Saturday February 01, 2025 @01:15PM (#65135045)
    One correction: while every point in the article is probably true, it's basically a good thing that tech stocks got hammered. They're trading at roughly a trillion price-to-earnings ratio. OK, the real number is around 40, but that's still stupidly, stupidly high. The smart analysis I've read is that we're in bubble territory no matter what angle you look at it from. I'd rather see prices knocked down bit by bit than one massive pop. Those of us with functioning memory remember how 08-09 played out.
    • 40 isn't that high. Especially for tech.

      Utility companies and other "value" stocks tend to trade at 8-12.
      Non tech companies that produce some normal product typically trade around 18-20.
      A tech company that has the potential to change the world trading at only 40 is actually pretty cheap. If they hit it then 40 will shrink to nothing as they pull in a few zillion bucks.
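      To make the multiple concrete, a quick illustrative calculation (hypothetical numbers, not any particular company's):

      ```python
      # Illustrative only: how earnings growth compresses a price-to-earnings
      # multiple when the share price stays put.
      price = 400.0                   # dollars per share (hypothetical)
      earnings = 10.0                 # earnings per share today
      print(price / earnings)         # P/E = 40.0, "expensive" on its face

      earnings_later = earnings * 10  # if earnings eventually grow 10x
      print(price / earnings_later)   # P/E = 4.0, cheap in hindsight
      ```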

  • They didn't lie; they just weren't completely open about costs. It was basically a statement of the number of GPU hours for that specific model, without revealing any other costs.
  • "and second, the performance of their models is comparable to leading competitors"

    Deepseek isn't bad, and it obviously does well on benchmarks, but at least on my small test [a Python wumpus-world-type problem] it didn't get anywhere near as far as o1 [both failed to solve it].

  • The Chinese government will continue to subsidise, undercutting the competition.

    Undercutting everyone is going to remove profit from the entire industry, slowing down funding for others to advance.

  • Did the Pope practise Catholicism?
