'AI Ambition is Pushing Copper To Its Breaking Point' (theregister.com)
An anonymous reader shares a report: Datacenters have been trending toward denser, more power-hungry systems for years. In case you missed it, 19-inch racks are now pushing power demands beyond 120 kilowatts in high-density configurations, with many making the switch to direct liquid cooling to tame the heat. Much of this trend has been driven by a need to support ever larger AI models.
According to researchers at Fujitsu, the number of parameters in AI systems is growing 32-fold approximately every three years. To support these models, chip designers like Nvidia use extremely high-speed interconnects -- on the order of 1.8 terabytes a second -- to make eight or more GPUs look and behave like a single device.
The problem though, is that the faster you shuffle data across a wire, the shorter the distance at which the signal can be maintained. At those speeds, you're limited to about a meter or two over copper cables. The alternative is to use optics, which can maintain a signal over a much larger distance. In fact, optics are already employed in many rack-to-rack scale-out fabrics like those used in AI model training. Unfortunately, in their current form, pluggable optics aren't particularly efficient or particularly fast.
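A back-of-the-envelope sketch of why distance collapses at these rates: at modern per-lane signaling speeds a single bit occupies only about a millimeter of cable, so channel loss and skew, not raw propagation delay, are what limit copper reach. The 200 Gbit/s lane rate below is an illustrative assumption, not an NVLink spec.

```python
C = 299_792_458                  # speed of light in vacuum, m/s
V = 0.7 * C                      # typical signal velocity in copper, ~70% of c

lane_rate_bps = 200e9            # assumed per-lane signaling rate: 200 Gbit/s
bit_period_s = 1 / lane_rate_bps

bit_length_m = V * bit_period_s  # physical length of one bit on the wire
print(f"one bit spans ~{bit_length_m * 1000:.1f} mm of cable")

# One-way propagation delay over typical copper vs. fiber distances
# (fiber's velocity factor is similar, ~0.68c; 0.7c is used for both here)
for name, dist_m in [("2 m copper", 2), ("50 m fiber", 50)]:
    print(f"{name}: ~{dist_m / V * 1e9:.1f} ns one-way")
```

The aggregate 1.8 TB/s figure is spread across many such lanes; the reach problem lives at the per-lane level.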
Earlier in 2024 at GTC, Nvidia CEO Jensen Huang said that if the company had used optics as opposed to copper to stitch together the 72 GPUs that make up its NVL72 rack systems, it would have required an additional 20 kilowatts of power.
Speed of light (Score:4, Informative)
Re: (Score:3)
Re: (Score:2)
Re: (Score:3)
I suspect the human brain uses something similar to a LLM because of the similar things it does. But it also does stuff the LLMs don't, and the wetware is very different from hardware. Instead of picturing a 4096 node transputer network or even a million 4004s which is probably closer, imagine 80+ billion nodes each capable of storing and processing a bare handful of floats but each with connections not just to immediate neighbors [nih.gov] in every direction but also to other nearby neighbors on the other side of th
Re: (Score:2)
> relatively slow connections
The neuronal connections are slow but we're just now beginning to measure the terahertz waves coming off the microtubules in the neurons themselves which appear to have tubulin crystals that oscillate and send information through entanglement across the brain network.
Like in chips we can imagine a power-side bus and a data-side bus and both are valuable.
Both anaestheics and psychedelics are shown to dampen or modulate the resonance of these crystals in the Layer-5 pyramidal n
Re: (Score:2)
I've made the following comment numerous times to my teams when waiting for software or patches or updates to install: Considering this is moving at the speed of light, it's taking an awfully long time to finish.
However, since the signal is moving at only 70% of the speed of light, it makes sense why it takes so long. After all, it's only moving at about 130,000 miles per second rather than the standard 186,000 miles per second.
For those who are wondering, yes, I am being sarcastically facetious.
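The arithmetic behind the joke does hold up. A quick sketch of the 70%-of-c velocity factor, using rounded figures and an assumed ~3,000-mile continental link:

```python
C_MILES_PER_S = 186_000    # speed of light in vacuum, miles/s (rounded)
VELOCITY_FACTOR = 0.7      # typical for signals in copper cable

signal_speed = VELOCITY_FACTOR * C_MILES_PER_S
print(f"signal speed in copper: ~{signal_speed:,.0f} miles/s")

# One-way time across a ~3,000-mile continental link at each speed
distance_miles = 3_000
print(f"in vacuum: {distance_miles / C_MILES_PER_S * 1000:.1f} ms")
print(f"in copper: {distance_miles / signal_speed * 1000:.1f} ms")
```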
Re: (Score:2)
Clock distribution networks are all about making sure the clock edges stay aligned enough across the circuit. ("Enough" normally means within the tolerance of setup and hold times.)
Are people still trying to build self-clocking data flow processors, or is the overhead too high compared to traditional clocked designs?
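The "aligned enough" criterion above can be sketched as a toy timing check. All numbers below are illustrative, not any real process's timing parameters:

```python
# Toy skew check: a capture edge is "aligned enough" if skew still leaves
# room for the setup and hold margins of the flip-flops it drives.
def timing_ok(clock_period_ns, skew_ns, setup_ns, hold_ns, logic_delay_ns):
    # Setup: data launched on one edge must arrive at the capturing flop
    # with setup time to spare, even if its clock arrives skew_ns early.
    setup_ok = logic_delay_ns + setup_ns + skew_ns < clock_period_ns
    # Hold: data must not change too soon after the capture edge, even if
    # the capturing clock arrives skew_ns late.
    hold_ok = logic_delay_ns > hold_ns + skew_ns
    return setup_ok and hold_ok

# 1 GHz clock, 50 ps skew, 400 ps of combinational logic: passes
print(timing_ok(1.0, 0.05, 0.1, 0.05, 0.4))
# Same path with 600 ps of skew: setup check fails
print(timing_ok(1.0, 0.6, 0.1, 0.05, 0.4))
```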
Re: (Score:2)
David Gerrold, in "When HARLIE Was One", mentioned this in respect to the Graphic Omniscient Device.
The BS is strong in this one, Luke (Score:2)
With 20 kW for 72 devices, it would come down to an extra ~278 watts per device.
Only a hefty CO2 laser can deliver that amount of power.
Not to mention that copper interconnects don't run directly from chip to chip between racks.
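For reference, the division behind that figure, using the numbers Huang cited:

```python
extra_optics_power_w = 20_000  # claimed optics power penalty for an NVL72 rack
gpu_count = 72

per_gpu_w = extra_optics_power_w / gpu_count
print(f"~{per_gpu_w:.0f} W of extra transceiver power per GPU")
```

Worth noting that this would be electrical power drawn by pluggable transceivers, spread across the several links each GPU needs, rather than optical output from a single emitter.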
Re: (Score:2)
Well, the liar-in-chief at Nvidia obviously cannot do math.
They're way into la-la land (Score:3)
Fundamentally, the total power required is already stupid high with nothing to show. Spreading it out isn't any sort of fix. All you're doing is upping the total power needs even further.
speed vs. power vs. distance (Score:2)
>"Unfortunately, in their current form, pluggable optics aren't particularly efficient or particularly fast."
It is every bit as fast as anything you can do with copper, and potentially very much faster (and not because of the "speed of light" but due to the properties of light, like lack of EMI). As far as power efficient, I would say that is mostly because of distance. Fiber modules, in the current form, are designed to push a readable signal for long (hundreds of meters) or very long distances (kilom
What need to support larger AI models? (Score:2)
"Much of this trend has been driven by a need to support ever larger AI models."
What need? It's a want, not a need, and it's the direction because tech bros can't think of anything else.
Just because "the number of parameters in AI systems is growing 32-fold approximately every three years" doesn't mean it should or that it will continue to. One good reason is to avoid "pushing copper to its breaking point". Hell, even Intel recognized this 25 years ago. There comes a time when turning the knob up stops
Re: (Score:2)
What need? It's a want, not a need, and it's the direction because tech bros can't think of anything else.
Exactly. But these assholes always like to pretend what they are doing is critical and will save the world. In this case, very obviously not so.
That said, there seems to be some value in small, specialized LLMs.
And all for "better crap" (Score:2)
Does the human race not have any real problems to solve?
Green New Deal (Score:2)
They better figure out efficient opamps and transducers soon because the Green New Deal needs all of the world's copper production through 2169 by 2030.
Scratch that - that estimate was before environmentalists shut down the world's largest copper mine in Guatemala.
Copper is going to get crazy expensive if those regulations are rammed through. Many economists will have black eyes along the way. And Jaguar shareholders.
I thought the photonics chaps had something close to 1:1 electron/photon transduction a fe