
DeepSeek Has Spent Over $500 Million on Nvidia Chips Despite Low-Cost AI Claims, SemiAnalysis Says (ft.com)

Nvidia shares plunged 17% on Monday, wiping nearly $600 billion from its market value, after Chinese AI firm DeepSeek's breakthrough, but analysts are questioning the cost narrative. DeepSeek is said to have trained its December V3 model for $5.6 million, but chip consultancy SemiAnalysis suggested this figure doesn't reflect total investments. "DeepSeek has spent well over $500 million on GPUs over the history of the company," Dylan Patel of SemiAnalysis said. "While their training run was very efficient, it required significant experimentation and testing to work."

The steep sell-off led to the Philadelphia Semiconductor index's worst daily drop since March 2020, at 9.2%, generating $6.75 billion in profits for short sellers, according to data group S3 Partners. DeepSeek's engineers also demonstrated they could write code without relying on Nvidia's CUDA software platform, which is widely seen as crucial to the Silicon Valley chipmaker's dominance of AI development.


Comments Filter:
  • So $500 million in their chips were purchased for this sole purpose, and yet the share price plummeted 17%?

    Ok, this is going to sound crazy, but what if we valued a company by actual sales and not how high you can get off hopium?
    • Re: (Score:3, Interesting)

      According to some YouTube video I can't remember (so can't link), this was a venture capital or investment firm which bought the GPUs for a different purpose, and this was actually a side project intended to make use of the opportunity represented by already having the GPUs. Not that it should matter: the advance represents a strong case that LLMs -- and therefore CUDA cores, as far as realistic current requirements to train and run LLMs -- have further to run before we're out of new advances. I agree with you: this should be good news for NVidia stock.
      • "I agree with you: this should be good news for NVidia stock"
        Even if this new model does lower the computing power required for training, this could still be good news for nVidia. The big deal about this model is not just the (alleged) low cost of training, but the low cost of running it, and the fact that it's a high-quality open-source model. This puts AI in the hands of the masses.

        Someone likened this to Watt's invention of the condenser, which meant that steam engines suddenly used 80% less coal
    • So $500 million in their chips were purchased for this sole purpose, and yet the share price plummeted 17%?

      Ok, this is going to sound crazy, but what if we valued a company by actual sales and not how high you can get off hopium?

      I'm interested in your proposal, but would like more information.

      Exactly how would you set the stock price based on sales?

      Additionally, sales are usually reported every 3 months. Would your plan keep the stock price constant until the next quarterly report?

      If a company wants to issue more stock, for example to fund expansion or capital improvements, how would that work? Would the total value of all stocks go down to compensate?

      What advantages would come from implementing your proposal?

      Whether your proposal

      • Exactly how would you set the stock price based on sales?

        Ok, this is nuts, but you could look at it year over year and see if they are selling more now. Oh, maybe you could even make a projection for them to meet! But here is the killer part: we educate until the smooth-brain investing goes away and logical agency creeps into place. We are breaking ground here; by God, these people will be so much better off and the world richer in many ways.

          • But that won't work: they may be pissing away more money on CEOs' yachts every year than they sell, even if sales increase year over year.
      • by unrtst ( 777550 )

        So $500 million in their chips were purchased for this sole purpose, and yet the share price plummeted 17%?

        Ok, this is going to sound crazy, but what if we valued a company by actual sales and not how high you can get off hopium?

        I'm interested in your proposal, but would like more information.

        Exactly how would you set the stock price based on sales?

        ...

        Exactly how do you set the stock price based on hopes and dreams? /s

        Investors looking more at the actual performance of companies rather than hype (good or bad) sounds like simply advising on classically good investment strategy.

    • Ok, this is going to sound crazy, but what if we valued a company by actual sales and not how high you can get off hopium?

      Here you go, results from November [nvidia.com]. Just about every metric was up double digits from the previous year and/or quarter.
      • Here you go, results from November [nvidia.com]. Just about every metric was up double digits from the previous year and/or quarter.

        So it only makes sense the valuation is lower. I think I’m starting to understand but I must consult Neurology first.

  • Microsoft has said they plan to spend $80bn on AI infrastructure in 2025 alone. Meta has said they plan to spend $65bn this year. If even a modest percentage of those infrastructure costs are going towards Nvidia GPUs, it'll dwarf what DeepSeek has allegedly spent in their entire history.

    • Microsoft has said they plan to spend $80bn on AI infrastructure in 2025 alone. Meta has said they plan to spend $65bn this year. If even a modest percentage of those infrastructure costs are going towards Nvidia GPUs, it'll dwarf what DeepSeek has allegedly spent in their entire history.

      Microsoft and Meta both report quarterly earnings on Jan 29. They will certainly be asked about their data center spending plans and whether they have changed. If they don't reiterate their already announced plans, then that's a bad sign for Nvidia. However, if they do reiterate, then that news outweighs the Deepseek news.

    • The whole point of this is that if DeepSeek demonstrates how to train and run models much more efficiently, then those projected purchases by Microsoft and Meta and everybody will plummet. If not immediately, then soon.
  • by GlennC ( 96879 ) on Tuesday January 28, 2025 @09:26AM (#65124763)

    China has successfully introduced a whole lot of FUD into the AI world, especially in the United States, while making a sizeable dent in the financial markets at the same time.

    It's actually quite brilliant.

    • China has successfully introduced a whole lot of FUD into the AI world, especially in the United States, while making a sizeable dent in the financial markets at the same time.

      It's actually quite brilliant.

      It's actually a very bad weakness of our markets.

      Just like the dot-com bubble and the subprime loan crisis; AI will be next. A metric fuckton of money that pulls a disappearing act. Money invested in vapor.

      At some point, it's just bots on steroids. Eventually, if continued, it will be bots referencing only other bots, at that point "truth" might be anything. I'm waiting for AI to eliminate the laws of physics.

      Right now the trick is to refurbish shut-down nuclear generation stations before the bottom drops out.

      • by GlennC ( 96879 )

        It's actually a very bad weakness of our markets.

        That's why it's brilliant. They're taking advantage of our weaknesses, and that it's open source is a plus because it's taking advantage of the overall distrust our political bickering has fostered.

    • by serviscope_minor ( 664417 ) on Tuesday January 28, 2025 @10:55AM (#65125013) Journal

      China has successfully introduced a whole lot of FUD into the AI world

      Has it? What, specifically? The US market is wall-to-wall bullshit, spiritually led by the one-man FUD factory Altman... by comparison, what's coming out of DeepSeek is straightforward.

      They aren't actually claiming anything magic (like most of the US players); they've released the model so you can check for yourself. It appears they have put together a bunch of techniques cleverly and carefully to achieve a big speed-up by making somewhat better use of the GPU resources they had. There's also a question of "cost": I'm not sure how the GPU cost was included.

      • by narcc ( 412956 )

        they've released the model so you can check for yourself

        Access to the model won't tell you anything about how much it cost to train.

        It appears they have put together a bunch of techniques cleverly and carefully to achieve a big speed-up by making somewhat better use of the GPU resources they had

        I'm going to guess that you didn't read the paper...

        Not that I blame you; it's absolutely stuffed with absurd hyperbolic language, e.g. "DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors [...] DeepSeek-R1-Zero exhibits super performance on reasoning benchmarks." It's hard to read with your eyes rolling constantly.

        From what I could stomach, I didn't see anything there that any sane person w

        • I'm going to guess that you didn't read the paper...

          I believe this [arxiv.org] is the paper that's getting the US stock market to freak out, while

          Not that I blame you; it's absolutely stuffed with absurd hyperbolic language, e.g. "DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors [...] DeepSeek-R1-Zero exhibits super performance on reasoning benchmarks." It's hard to read with your eyes rolling constantly.

          this [huggingface.co] seems to be the one you're referring to. Correct?

          Here's [bitrue.com] a comparison for those who are interested, since they do different things.

    • by AmiMoJo ( 196126 )

      It's open source, probably because they thought that if it wasn't, they would get this kind of accusation. You can go and confirm it for yourself if you like.

      I'm sure Meta and OpenAI have, and would be calling it out if it was fake.

      • It's open source

        It's not.

        probably because they thought that if it wasn't, they would get this kind of accusation.

        If that were the case, they'd have released the source.

        You can go and confirm it for yourself if you like.

        You can. You will quickly see you cannot possibly reproduce their results with their source, because it's not open source.

        The weights themselves are freely available, and they did release a paper describing their training methodology, so it is possible for someone to try to reproduce their work in general. But frankly, nothing they describe is particularly novel, and there's no reason to think it doesn't work. It's an iterative improvement.
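        To make the open-weights-vs-open-source distinction concrete: anyone can download and run the released checkpoints, but that tells you nothing about the training data, training code, or training cost. A minimal sketch of loading one of the released checkpoints with Hugging Face transformers; the model ID below is an assumption (one of the small distilled variants), so substitute whichever released variant fits your hardware:

```python
# Open weights != open source: you can download and run the checkpoint,
# but the training data and training code are not published.
# The model ID is an assumed repo name; the distilled variants are small
# enough to run on a single consumer GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "How many prime numbers are there below 100?"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```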

    • by jma05 ( 897351 )

      There is no FUD here, just irrational panic in our markets.

    • by migos ( 10321981 )
      The sole investor in DeepSeek is a hedge fund billionaire. Dude probably made tons shorting the US market, lol.
    • That dent isn't going to last long. Markets have already recovered half of yesterday's losses.
  • by aldousd666 ( 640240 ) on Tuesday January 28, 2025 @09:33AM (#65124789) Journal
    It's no secret, no conspiracy, no crazy collusion to get you to think more communist. They found some tricks to take better advantage of older, less capable hardware, and those evil bastards even published their research! You too can use shittier chips to train good models! You don't even have to convert to communism to get the benefit. I shouldn't be so surprised by the way Americans are reacting to this, but I still am. How can people who are able to understand all of the intricacies of deep learning and the math behind it be so hook-line-and-sinkered by the fear porn being published by investors who simply didn't understand what role top-of-the-line chips actually play in AI? (Not as big of one as they thought.)
    • The trade is on the idea that nVidia has a locked-in monopoly and deserves to be priced at 500x future earnings.

      The details of who owned which chips and how the costs were allocated are less of the issue; the accounting matters about as much as the details in a typical MythBusters episode.

      • The trade is on the idea that nVidia has a locked-in monopoly and deserves to be priced at 500x future earnings.

        A bit of hyperbole, but still an important point to consider. There are two fundamental components to NVDA pricing.

        Market share and competition are one part. Nvidia has about 80% of the market if ASICs are considered, and about 98% otherwise. That hasn't changed. If someone can demonstrate better ASIC utility or efficiency (beyond what can already be done today), that would crater NVDA much more than this week's Deepseek news.

        The second part is the total addressable market. This is the part that Deepseek affects.

        • Formerly there was the OpenAI + NVIDIA monolith for LLM-based AI. For whatever reason investors thought this monolith was going to stand the test of time, and most investment and projects were geared only towards this. With the DeepSeek release, it's now reasonable to assume that one will not need as many high-end NVIDIA chips... maybe not even NVIDIA at all. And who really needs OpenAI, for that matter?

          My guess is research like this quickly advances other avenues of research and very soon someone coul

    • by Bert64 ( 520050 )

      If you applied similar optimization to newer hardware, you should see even better results.

      What this shows is laziness on the part of those who have access to the latest hardware, they are content to just keep throwing more powerful hardware (and more money) at the problem rather than trying to make efficient use of the resources available.

        If you applied similar optimization to newer hardware, you should see even better results.

        Not necessarily; newer hardware is the focus of new code and optimizations, and many times those optimizations are not ported back to older hardware because it is deemed obsolete for future use, so why bother with the effort?

        If all they have is older hardware due to sanctions, it makes sense they focused on optimizing for older hardware, because that is all they have in bulk. They may have also simply ported optimizations for current hardware back to the old hardware they were using.

      • In no world since software and hardware came to life has more efficient software reduced high-end hardware sales. You buy the same amount of high-end hardware to squeeze more out of your software (especially in a data center setting). I don't know how people thought DeepSeek (software) had any meaningful impact on NVDA (hardware) future sales. As long as a company can keep pumping out high-end hardware, especially with no real competition, the sales just keep going up.
    • It's no secret, no conspiracy, no crazy collusion to get you to think more communist. They found some tricks to take better advantage of older, less capable hardware, and those evil bastards even published their research! You too can use shittier chips to train good models! You don't even have to convert to communism to get the benefit. I shouldn't be so surprised by the way Americans are reacting to this, but I still am. How can people who are able to understand all of the intricacies of deep learning and the math behind it be so hook-line-and-sinkered by the fear porn being published by investors who simply didn't understand what role top-of-the-line chips actually play in AI? (Not as big of one as they thought.)

      Differential analysis - the first rumbles of the bubble bursting.

    • A very small number of people actually understand, say, stochastic gradient descent on the mathematical level. Of course you don't need to; you just run the Adam optimizer.

      A relatively small number of people even understand what a model is, and those people mostly didn't believe in Altman's bullshit.

      Let's bring the fallacy into sharp relief by applying your complaint to another GPU-intensive field: crypto. "How can someone understand the math behind ZKPs and cryptography but still lose millions trading NFTs?"

      • by narcc ( 412956 )

        A very small number of people actually understand, say, stochastic gradient descent, on the mathematical level.

        You've got to be kidding. The concept is simple enough for a bright teenager to understand, and the math, well, let me put it this way: vector calc is a 200-level course, often a required prerequisite to linear algebra, which you'll need if you intend to do anything related to data science. That's millions of people.
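        For reference, the entire vanilla SGD update fits on one line; this is standard textbook notation, not anything specific to one paper:

```latex
% One step of stochastic gradient descent: move the parameters a small
% step (learning rate \eta) against the gradient of the loss estimated
% on a randomly sampled minibatch B_t.
\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} \mathcal{L}(\theta_t;\, B_t)
```

        Adam adds per-parameter step-size scaling and momentum on top of this, but the core idea is the same one-line update.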

        A relatively small number of people even understand what a model is,

        I would expect anyone who took a statistics course to understand what a model is.

        and those people mostly didn't believe in Altman's bullshit.

        I would have thought so a few years ago, but here we are... Wishful thinking is more powerful than reason, a

    • by narcc ( 412956 )

      They found some tricks to take better advantage of older, less capable hardware

      Tell me you didn't read the paper without telling me you didn't read the paper.

  • On one hand, the CCP would happily "invest" in destabilizing US markets; on the other hand, Wall Street would not hesitate to push out "consultancy firm" disinfo to protect investments.
    • On one hand, the CCP would happily "invest" in destabilizing US markets; on the other hand, Wall Street would not hesitate to push out "consultancy firm" disinfo to protect investments.

      Exactly this. Propaganda to manipulate markets as well as popular sentiment - who knew?

  • by oumuamua ( 6173784 ) on Tuesday January 28, 2025 @09:49AM (#65124847)
    This is one of them, give it up. Even Marc Andreessen agrees:

    Deepseek R1 is one of the most amazing and impressive breakthroughs I’ve ever seen — and as open source, a profound gift to the world.

    https://x.com/pmarca/status/18... [x.com]

    • At the moment it really is only bad news for Nvidia and its shareholders. Hopefully it will be good for the world. When I see the footage from Ukraine of today's remote-controlled drones being used to hunt down, sometimes taunt or harass, and then execute their targets, it does worry me a bit. Nowhere to hide.
  • If I build a datacenter, or update it with some new GPUs, then use it for training some models, what should I call the "training cost"?

    I don't think there is any standard definition, but including the cost of the GPUs in the training cost for one model would seem odd since you are going to use them over and over for training many models as well as for inference.

    It seems that total number of FLOPs used for training would be a better way to measure cost, even if some companies have cheaper FLOPs (e.g. TPU vs

    • The bottleneck is not in the FLOPS, but in the communication channel between nodes, right?

      • Typical GPU utilization is well below 100%, but I think that's more to do with poor usage patterns than inter-GPU communication speed.

        My point was that "training cost" can mean almost anything (cost to buy GPUs, electricity cost to run them, etc.), but at least the number of FLOPs consumed during a training run, or the number of FLOPs per training token, is a meaningful, concrete number, reflective of model efficiency and therefore cost to run, and could be compared between models - it would be a more useful thing to
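        To make the FLOP-counting idea concrete, here's a back-of-the-envelope sketch using the common C ≈ 6·N·D approximation from the scaling-law literature; every figure below is an illustrative assumption, not a disclosed number:

```python
# Rough training-compute estimate: C ~= 6 * N * D FLOPs, where N is the
# number of parameters active per token and D is the number of training
# tokens (covers forward + backward pass). All figures are illustrative.
def training_flops(active_params: float, tokens: float) -> float:
    return 6.0 * active_params * tokens

N = 37e9     # active parameters per token (assumed, MoE-style model)
D = 14.8e12  # training tokens (assumed)

C = training_flops(N, D)
print(f"total compute: {C:.2e} FLOPs")  # ~3.3e24 FLOPs

# Converting to GPU-hours requires a sustained-throughput assumption:
sustained = 3.0e14  # ~300 TFLOP/s per GPU, assumed
print(f"GPU-hours: {C / sustained / 3600:.2e}")  # ~3.0e6 GPU-hours
```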

    • I believe there are two costs of an LLM. The first is training it. The second is having it answer queries. The latter is going to be much higher than the former, although the conversation so far has focused on training costs.
      • I believe there are two costs of an LLM. The first is training it. The second is having it answer queries. The latter is going to be much higher than the former, although the conversation so far has focused on training costs.

        There's a third cost, which is the subject of this thread, i.e., the R&D cost. The last training run to produce the weights is the easiest and least resource intensive part. Figuring out the architecture and training procedure, as well as procuring clean and useful data, are non-obvious and take a lot of people, hardware, and money.

        Now, the question is whether Deepseek's current work has solved and eliminated the need for future R&D. If so, then Deepseek's efficiency benefits future models. If not,

    • If I build a datacenter, or update it with some new GPUs, then use it for training some models, what should I call the "training cost"?

      I don't think there is any standard definition, but including the cost of the GPUs in the training cost for one model would seem odd since you are going to use them over and over for training many models as well as for inference.

      Yes, the GPU capital costs should definitely be amortized over the lifetime of runs on that hardware. DeepSeek (and other model companies) approximate this amortized cost by counting GPU-hours and then converting that to a comparable cloud cost.
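      For what it's worth, that arithmetic is exactly how headline figures of this kind are usually produced; a minimal sketch, assuming the widely reported ~2.79M H800 GPU-hours and a $2/GPU-hour rental rate (treat both as approximate):

```python
# Rented-cloud-equivalent "training cost" = GPU-hours x assumed hourly rate.
# Both inputs are the widely reported figures; treat them as approximate.
gpu_hours = 2.788e6    # H800 GPU-hours for the final training run
usd_per_gpu_hour = 2.00

cost = gpu_hours * usd_per_gpu_hour
print(f"${cost / 1e6:.2f}M")  # -> $5.58M, i.e. the ~$5.6M headline figure
```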

  • When has that been a metric? I guess Nvidia has spent a football field of trillions over their history, then.
    • The company is only two years old, if that. So the time span is so small it is easy to estimate that all expenditures went into their product.

      And their expenditures don't match their claims.
      • If they spent $500m on hardware to get started, and then discovered a method that didn't require that much horsepower... sure, on the company books there's a big $500m entry in the expense column, but you don't have to count that as part of the new method.

        Is this what happened? Don't know. I suspect not, but have no real information. Regardless, their claim is not obviously false.

  • No fucking shit? (Score:5, Insightful)

    by chmod a+x mojo ( 965286 ) on Tuesday January 28, 2025 @11:05AM (#65125051)

    >DeepSeek is said to have trained its December V3 model for $5.6 million, but chip consultancy SemiAnalysis suggested this figure doesn't reflect total investments. "DeepSeek has spent well over $500 million on GPUs over the history of the company," Dylan Patel of SemiAnalysis said. "While their training run was very efficient, it required significant experimentation and testing to work."

    No fucking shit? You mean you have to start inefficient and work your way to greater and greater efficiency? Stop the fucking presses!

    I thought people just pulled process innovations out of their asses, no prior processing needed!

    God damn, people are fucking stupid. Especially these so-called "experts" that shitty clickbait "news" articles find.

    • >DeepSeek is said to have trained its December V3 model for $5.6 million, but chip consultancy SemiAnalysis suggested this figure doesn't reflect total investments. "DeepSeek has spent well over $500 million on GPUs over the history of the company," Dylan Patel of SemiAnalysis said. "While their training run was very efficient, it required significant experimentation and testing to work."

      No fucking shit? You mean you have to start inefficient and work your way to greater and greater efficiency? Stop the fucking presses!

      I thought people just pulled process innovations out of their asses, no prior processing needed!

      God damn, people are fucking stupid. Especially these so-called "experts" that shitty clickbait "news" articles find.

      You're missing the point. It's not that prior experimentation and testing were needed; it's that they excluded that cost from their $5.6 million figure.

  • ...that future AI will be exclusively owned by big corporations and everyone will pay a lot to use it.
    The knowledge will be widespread and available to all.
    Expect more open source and small-company advances from labs all around the world.

  • The chips will be redesigned and sold for a low cost, but they will be "licensed" and require a yearly license fee to be paid or they'll stop working.

    • China is likely already capable of making chips at the H800 level. I imagine that many other countries will soon be able to make similarly capable chips. The H800 was made to comply with export control requirements, and I believe it's more powerful under the covers. If you're a sanctioned entity looking to be self-sufficient, you can choose to have a larger die area and more power consumption and still get equivalent compute as a short-term solution.
      • China is likely already capable of making chips at the H800 level.

        This is likely not correct. SMIC currently doesn't have a working 4nm process node. They have 7nm working and claim to have found a 5nm non-EUV solution. They also don't have CoWoS capabilities.

        • China is likely already capable of making chips at the H800 level.

          This is likely not correct. SMIC currently doesn't have a working 4nm process node. They have 7nm working and claim to have found a 5nm non-EUV solution. They also don't have CoWoS capabilities.

          I don't see how your post contradicts mine. You will use more silicon and the chips will draw more power, which might not make a great general-purpose CPU. But if you are willing to use large dies and draw more power, you should be able to get equivalent performance. That's what I mean by the H800 level: not that they have all the technologies, but that by sacrificing power and silicon area you can get similar performance with older fabrication technologies.

  • AI learning is a numbers game. You can do it with fewer, more powerful processors, or with more, less powerful processors. The tradeoff favors more powerful processors if you have high labor and other costs for building around them.

  • Cost takes into consideration the profit their hedge fund made during yesterday's sell-off.
  • by rabun_bike ( 905430 ) on Tuesday January 28, 2025 @04:26PM (#65126095)
    This rush to spend billions based on simple, today-only technology calculations is generally doomed.

    You can look to the telecommunications bubble of 2001, which was powered by a race by the telecoms, niche players, and speculators to dig and install fiber optic cable in the early 2000s. The entire business model was based on absurd predictions of exponential Internet data growth and on the assumption that, in general, only a single light-wave signal could be transmitted over a single strand of fiber optic cable. That assumption drove the digging and installation of millions of miles of soon-to-be-unneeded "dark" fiber optic cable, undersea cables, and cables on the rights-of-way of train tracks and other areas that didn't even have a termination connection. The industry spent billions on infrastructure, planning for a future that would not be needed for decades to come. Wavelength-division multiplexing then gave existing fiber cables 100 times more capacity just by replacing transmitter and receiver equipment, thereby blowing the original fiber optic telecom economic model out of the water and destroying several large telecommunications companies in the process.
