AI Technology

'AI Ambition is Pushing Copper To Its Breaking Point' (theregister.com) 15

An anonymous reader shares a report: Datacenters have been trending toward denser, more power-hungry systems for years. In case you missed it, 19-inch racks are now pushing power demands beyond 120 kilowatts in high-density configurations, with many making the switch to direct liquid cooling to tame the heat. Much of this trend has been driven by a need to support ever larger AI models.

According to researchers at Fujitsu, the number of parameters in AI systems is growing 32-fold approximately every three years. To support these models, chip designers like Nvidia use extremely high-speed interconnects -- on the order of 1.8 terabytes a second -- to make eight or more GPUs look and behave like a single device.
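Taken at face value, the Fujitsu growth figure implies a steep compounding rate. A quick back-of-the-envelope sketch in Python (the 32-fold-every-three-years figure from the summary is the only input; the rest is just arithmetic):

```python
# If parameter counts grow 32-fold every three years, the implied
# annual growth factor is the cube root of 32.
annual_factor = 32 ** (1 / 3)
print(f"annual growth factor: {annual_factor:.2f}x")   # ~3.17x

# Compounded over a decade at that rate:
print(f"10-year growth: {annual_factor ** 10:,.0f}x")  # ~100,000x
```

Which is to say: if the trend held, a decade of growth would multiply model size by roughly five orders of magnitude.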

The problem though, is that the faster you shuffle data across a wire, the shorter the distance at which the signal can be maintained. At those speeds, you're limited to about a meter or two over copper cables. The alternative is to use optics, which can maintain a signal over a much larger distance. In fact, optics are already employed in many rack-to-rack scale-out fabrics like those used in AI model training. Unfortunately, in their current form, pluggable optics aren't particularly efficient or particularly fast.

Earlier in 2024 at GTC, Nvidia CEO Jensen Huang said that if the company had used optics as opposed to copper to stitch together the 72 GPUs that make up its NVL72 rack systems, it would have required an additional 20 kilowatts of power.


Comments Filter:
  • Speed of light (Score:4, Informative)

    by bradley13 ( 1118935 ) on Friday November 29, 2024 @06:44AM (#64979039) Homepage
    It all gets crazier when you remember the speed of light, and that an electrical signal only travels at about 70% of that. If you have, say, a 5GHz signal, that means that in one complete period (which takes 1 / (5×10^9) seconds), the signal will only travel about 4cm. Assuming you are trying to get a bunch of separate components to work in sync, well, that's not going to work. Work in parallel, maybe, but never synchronously.
    • Ah, you gotta love Slashdot. I really did have an exponent in the equation, and it even showed in the preview, but in the final comment? Gone...
      • This issue showed up in the first supercomputers. There are physical limits to how fast you can transfer data and how much energy you can push into an area without everything melting. Still, the human brain functions on relatively slow connections and without excessive heat generation, so there must be a way to do it. It probably just is not an LLM.
        • I suspect the human brain uses something similar to an LLM because of the similar things it does. But it also does stuff the LLMs don't, and the wetware is very different from hardware. Instead of picturing a 4096-node transputer network or even a million 4004s, which is probably closer, imagine 80+ billion nodes each capable of storing and processing a bare handful of floats but each with connections not just to immediate neighbors [nih.gov] in every direction but also to other nearby neighbors on the other side of th

        • > relatively slow connections

          The neuronal connections are slow but we're just now beginning to measure the terahertz waves coming off the microtubules in the neurons themselves which appear to have tubulin crystals that oscillate and send information through entanglement across the brain network.

          Like in chips we can imagine a power-side bus and a data-side bus and both are valuable.

          Both anaesthetics and psychedelics are shown to dampen or modulate the resonance of these crystals in the Layer-5 pyramidal n

    • I've made the following comment numerous times to my teams when waiting for software or patches or updates to install: Considering this is moving at the speed of light, it's taking an awfully long time to finish.

      However, since the signal is moving at only 70% of the speed of light, it makes sense why it takes so long. After all, it's only moving at about 130,000 miles per second rather than the standard 186,000.

      For those who are wondering, yes, I am being sarcastically facetious.

    • by Entrope ( 68843 )

      Clock distribution networks are all about making sure the clock edges stay aligned enough across the circuit. ("Enough" normally means within the tolerance of setup and hold times.)

      Are people still trying to build self-clocking data flow processors, or is the overhead too high compared to traditional clocked designs?

    • by rossdee ( 243626 )

      David Gerrold in "When HARLIE Was One" mentioned this with respect to the Graphic Omniscient Device
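The speed-of-light arithmetic in the thread above can be checked directly. A quick sketch (assuming, as the thread does, a signal speed of ~70% of c and a 5 GHz clock):

```python
C_M_PER_S = 299_792_458            # speed of light in vacuum, m/s
signal_speed = 0.7 * C_M_PER_S     # ~2.1e8 m/s in a copper cable

# Distance covered in one period of a 5 GHz clock:
period_s = 1 / 5e9
distance_cm = signal_speed * period_s * 100
print(f"distance per 5 GHz period: {distance_cm:.1f} cm")  # ~4.2 cm

# And the miles-per-second figure quoted above (c is ~186,282 mi/s):
miles_per_s = 0.7 * 186_282
print(f"signal speed: {miles_per_s:,.0f} miles/s")         # ~130,000
```

Both back-of-the-envelope figures from the thread hold up.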

  • With 20 kW for 72 devices, it would come down to an extra ~278 watts per device.
    Only a hefty CO2 laser can deliver that amount of power.
    Not to forget that copper interconnects don't go from chip to chip directly between racks.
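The parent's division is easy to check (the 20 kW and 72-GPU figures come from Huang's claim in the summary):

```python
# Huang's claim: optics would have added 20 kW across an NVL72's 72 GPUs.
extra_total_w = 20_000
gpus = 72
per_gpu_w = extra_total_w / gpus
print(f"extra power per GPU: {per_gpu_w:.0f} W")  # ~278 W
```

That ~278 W is the transceivers' electrical draw per GPU, not optical output power.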

  • by evanh ( 627108 ) on Friday November 29, 2024 @07:59AM (#64979087)

    Fundamentally, the total power required is already stupid high with nothing to show. Spreading it out isn't any sort of fix. All you're doing is upping the total power needs even further.

  • >"Unfortunately, in their current form, pluggable optics aren't particularly efficient or particularly fast."

    It is every bit as fast as anything you can do with copper, and potentially very much faster (and not because of the "speed of light" but due to the properties of light, like lack of EMI). As far as power efficiency goes, I would say that is mostly because of distance. Fiber modules, in their current form, are designed to push a readable signal for long (hundreds of meters) or very long distances (kilom

  • "Much of this trend has been driven by a need to support ever larger AI models."

    What need? It's a want, not a need, and it's the direction because tech bros can't think of anything else.

    Just because "the number of parameters in AI systems is growing 32-fold approximately every three years" doesn't mean it should or that it will continue to. One good reason is to avoid "pushing copper to its breaking point". Hell, even Intel recognized this 25 years ago. There comes a time when turning the knob up stops

    • by gweihir ( 88907 )

      What need? It's a want, not a need, and it's the direction because tech bros can't think of anything else.

      Exactly. But these assholes always like to pretend what they are doing is critical and will save the world. In this case, very obviously not so.

      That said, there seems to be some value in small, specialized LLMs.

  • Does the human race not have any real problems to solve?

  • They better figure out efficient opamps and transducers soon because the Green New Deal needs all of the world's copper production through 2169 by 2030.

    Scratch that - that estimate was before environmentalists shut down the world's largest copper mine in Guatemala.

    Copper is going to get crazy expensive if those regulations are rammed through. Many economists will have black eyes along the way. And Jaguar shareholders.

    I thought the photonics chaps had something close to 1:1 electron/photon transduction a fe
