Early Reports Indicate Nvidia DGX Spark May Be Suffering From Thermal Issues (tomshardware.com)
Longtime Slashdot reader zuki writes: According to a recent report at Tom's Hardware, a number of early buyers who have put the highly coveted $4,000 DGX Spark mini AI workstation through its paces are reporting throttling at 100W (rather than the advertised 240W capacity), spontaneous reboots, and thermal issues under sustained load. The workstation came under fire after John Carmack, the former CTO of Oculus VR, began raising questions about its real-world performance and power draw. "His comments were enough to draw tech support from Framework and even AMD, with the offer of an AMD-driven Strix Halo-powered alternative," reports Tom's Hardware.
"What's causing this suboptimal performance, such as a firmware-level cap or thermal throttling, is not clear," the report adds. "Nvidia hasn't commented publicly on Carmack's post or user-reported instability. Meanwhile, several threads on Nvidia's developer forums now include reports of GPU crashes and unexpected shutdowns under sustained load."
"What's causing this suboptimal performance, such as a firmware-level cap or thermal throttling, is not clear," the report adds. "Nvidia hasn't commented publicly on Carmack's post or user-reported instability. Meanwhile, several threads on Nvidia's developer forums now include reports of GPU crashes and unexpected shutdowns under sustained load."
Appleitis (Score:4, Insightful)
First it was the 12VHPWR kerfuffle, too small to handle the power. Now it's this thing, again too small to handle the power.
Nvidia seems to have caught a case of Appleitis, and is pushing style over engineering. Funny thing is, they absolutely have no need to do that. Their stuff could look like what the cat threw up, and they would still sell like hot cakes.
Re: (Score:2)
Agreed. My interest in this product has nothing whatsoever to do with its small size. It could be 10x larger and I'd still want one.
Re: (Score:2)
It's about saving a buck and not changing to better connectors. If devices are really going to start drawing tens of amps at 12 volts, then the bus voltage should be increased to 48. Those cheap little Molex pins are maxed out, and it doesn't take much resistance to cause thermal runaway.
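For what it's worth, here's a rough sketch of the arithmetic behind that point; the 5 milliohm pin resistance and the single-pin worst case are my own assumptions, not numbers from the comment or the article:

```c
#include <stdio.h>

/* Back-of-the-envelope sketch (assumptions mine): heat dissipated in one
 * connector pin for the same delivered power at a 12 V vs 48 V bus, with a
 * hypothetical 5 milliohm contact resistance and the full load on one pin.
 * Loss = I^2 * R and I = P / V, so quadrupling the bus voltage cuts the
 * current by 4x and the pin heating by 16x. */
int main(void)
{
    const double power_w = 240.0;       /* advertised DGX Spark power budget */
    const double pin_res = 0.005;       /* assumed contact resistance, ohms  */
    const double volts[] = { 12.0, 48.0 };

    for (int i = 0; i < 2; i++) {
        double current = power_w / volts[i];          /* amps through the pin */
        double loss    = current * current * pin_res; /* watts of pin heating */
        printf("%4.0f V bus: %5.1f A, %5.2f W dissipated in the pin\n",
               volts[i], current, loss);
    }
    return 0;
}
```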
Re:Pfff, my 2009 iMac can run at 212F/100C (Score:5, Interesting)
A lot of people misunderstand the market for the DGX Spark.
If you want to run a small model at home, or create a LoRA for a tiny model, you don't want to do it on this - you want to do it on gaming GPUs.
If you want to create a large foundation model, or run commercial inference, you don't want to do it on this - you want to do this on high-end AI servers.
This fits the middle ground between these two things. It gives you far more memory than you can get on gaming GPUs (allowing you to do inference on / tune / train much larger models, especially when you combine two Sparks). It sacrifices some memory bandwidth and FLOPs and costs somewhat more, but it lets you do things that you simply can't do in any meaningful way on gaming GPUs, things you'd normally have to buy or rent big expensive servers to do.
The closest current alternative is a Mac Studio M2 or M3 Ultra. You get better bandwidth on the Macs, but way worse TOPS. The balance of these factors depends greatly on what sort of application you're running, but in most cases they'll be in the ballpark of each other. For example, one $7.5K Mac M3 Ultra with 256GB is said to run Qwen 3 235B GGUF at 16 tok/s, while two linked $4.2K DGX Sparks with the same 256GB total are said to do it at 12 tok/s with similar quantization (see the rough memory-sizing sketch below for why that much memory matters). Your mileage may vary depending on what you're doing.
Either way, you're not going to be training a big foundation model or serving commercial inference on either, at least not economically. But if you want something that can work with large models at home, these are the sort of solutions you want. The Spark is the sort of system you train your toy and small models on before renting a cluster for a YOLO run, or use to run inference on a large open model for your personal or office-internal use.
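A rough sizing sketch of why the memory pool is the deciding factor; the quantization widths and the decision to ignore KV cache and activation overhead are my own simplifications:

```c
#include <stdio.h>

/* Rough sizing sketch (assumptions mine): approximate weight memory for a
 * model at a given parameter count and quantization width, ignoring KV cache
 * and activation overhead. Illustrates why a ~235B-parameter model needs a
 * memory pool on the order of what two linked Sparks or a 256GB Mac Studio
 * provide, and why single gaming GPUs (16-32GB) can't touch it. */
static double weight_gb(double params_billion, double bits_per_weight)
{
    return params_billion * 1e9 * (bits_per_weight / 8.0) / 1e9; /* decimal GB */
}

int main(void)
{
    printf("235B @ 16-bit: %6.0f GB\n", weight_gb(235, 16)); /* ~470 GB */
    printf("235B @  8-bit: %6.0f GB\n", weight_gb(235,  8)); /* ~235 GB */
    printf("235B @  4-bit: %6.0f GB\n", weight_gb(235,  4)); /* ~118 GB */
    printf(" 70B @  4-bit: %6.0f GB\n", weight_gb( 70,  4)); /*  ~35 GB */
    return 0;
}
```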
Re: (Score:3)
Never hedge on prospects (Score:1)
Suffering from Apple-itis (Score:5, Informative)
Trying to shove the hardware into too small an enclosure without proper cooling, just so it can live in a fancy little hipster case. Put it in a proper case with a proper cooling system and there won't be a problem.
Re: (Score:2)
Jensen is so focused on generating hype, he doesn't care too much about practical matters. The whole "With AI, you don't need a CPU" nonsense Jensen tried pushing, for example.
Carmack (Score:5, Informative)
John Carmack, the former CTO of Oculus
That is how you introduce John Carmack? He is so, so much more than "the former CTO of Oculus", which makes him sound like just another C-suite, VC-bro floozy. It would be better, especially in this context, to call him the co-founder of id Software - the author of Doom. For Quake III, he helped implement an ingenious hack for computing 1/sqrt(x) [youtube.com] about 4x faster than a typical floating-point computation. He also developed an efficient algorithm for rendering shadows of 3D objects (Carmack's reverse, or Z-fail), which is still in use today [nvidia.com].
It's fair to say Carmack's forgotten more about computing and performance than most people will ever know.
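For readers who haven't seen it, here is a minimal sketch of the widely published Quake III-style fast inverse square root, rewritten with a union for type punning instead of the original pointer cast; the constant and structure follow the commonly circulated version of the code:

```c
#include <stdint.h>
#include <stdio.h>

/* Classic fast inverse square root trick: reinterpret the float's bits as an
 * integer, use a magic-constant shift to get an initial estimate of 1/sqrt(x),
 * then refine it with one Newton-Raphson step. */
float q_rsqrt(float number)
{
    union { float f; uint32_t i; } conv = { .f = number };
    conv.i = 0x5f3759df - (conv.i >> 1);                  /* magic initial guess */
    conv.f *= 1.5f - (number * 0.5f * conv.f * conv.f);   /* one Newton step     */
    return conv.f;
}

int main(void)
{
    printf("q_rsqrt(4.0) ~= %f (exact 0.5)\n",    q_rsqrt(4.0f));
    printf("q_rsqrt(9.0) ~= %f (exact 0.3333)\n", q_rsqrt(9.0f));
    return 0;
}
```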
Re: (Score:2)
For Quake III, he helped implement an ingenious hack for computing 1/sqrt(x) [youtube.com] about 4x faster than a typical floating-point computation.
It never ceases to amaze me how many old wives' tales persist to idolize certain people. The Quake III code was perhaps obfuscated to peacock it, but there was also a great deal of prior art for this method. https://www.beyond3d.com/conte... [beyond3d.com]
Re: (Score:3)
For Quake III, he helped implement an ingenious hack for computing 1/sqrt(x)
I don't believe Carmack has ever tried to claim credit for that one. It was Kahan, the main person behind IEEE754 floating point.
Can it be clocked down a bit, (Score:1)
or is it hardwired to one speed?
I'd be worried too (Score:2)
Re: (Score:2)