45 Teraflops: Intel Unveils Details of Its 100-Billion Transistor AI Chip (siliconangle.com) 16
At its annual Architecture Day semiconductor event Thursday, Intel revealed new details about its powerful Ponte Vecchio chip for data centers, reports SiliconANGLE:
Intel is looking to take on Nvidia Corp. in the AI silicon market with Ponte Vecchio, which the company describes as its most complex system-on-chip or SOC to date. Ponte Vecchio features some 100 billion transistors, nearly twice as many as Nvidia's flagship A100 data center graphics processing unit. The chip's 100 billion transistors are divided among no fewer than 47 individual processing modules made using five different manufacturing processes. Normally, an SOC's processing modules are arranged side by side in a flat two-dimensional design. Ponte Vecchio, however, stacks the modules on one another in a vertical, three-dimensional structure created using Intel's Foveros technology.
The bulk of Ponte Vecchio's processing power comes from a set of modules aptly called the Compute Tiles. Each Compute Tile has eight Xe cores, GPU cores specifically optimized to run AI workloads. Every Xe core, in turn, consists of eight vector engines and eight matrix engines, processing modules specifically built to run the narrow set of mathematical operations that AI models use to turn data into insights... Intel shared early performance data about the chip in conjunction with the release of the technical details. According to the company, early Ponte Vecchio silicon has demonstrated performance of more than 45 teraflops, or about 45 trillion operations per second.
The article adds that it achieved those speeds while processing 32-bit single-precision floating-point values, and that at least one customer has already signed up to use Ponte Vecchio. The Argonne National Laboratory will include Ponte Vecchio chips in its upcoming $500 million Aurora supercomputer. Aurora will provide one exaflop of performance when it becomes fully operational, the equivalent of a quintillion calculations per second.
not sure it's clear (Score:3)
Re: (Score:2)
No it's not, it's perfectly simple. Let's say you're creating a new device that operates in frobozz. So you find the source that gives you the lowest possible frobozz rating compared to the highest possible teraflop rating, and then apply that as your conversion factor. QED.
Then you have to scale it back a bit when you realise that your magic dingus is now being rated in yottaflops with the conversion factor you've chosen to apply.
Re:not sure it's clear (Score:5, Informative)
Apparently the 45 teraflops is comparable to the 19.5 teraflops number, counting a generic multiply-accumulate as two operations. https://www.igamesnews.com/pc/... [igamesnews.com] quotes Intel as claiming a petaflop for one Ponte Vecchio module when using matrix/tensor units.
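For anyone unsure what "counting a multiply-accumulate as two operations" does to a headline number, here is a rough sketch. The lane count and clock below are made-up illustrative values, not specs for Ponte Vecchio or the A100, and `peak_teraflops` is just a hypothetical helper name. Only the 45 and 19.5 figures come from the thread.

```python
def peak_teraflops(fma_lanes: int, clock_ghz: float, ops_per_fma: int = 2) -> float:
    """Peak throughput in teraflops, assuming one multiply-accumulate
    per lane per clock cycle (an idealized upper bound)."""
    ops_per_second = fma_lanes * clock_ghz * 1e9 * ops_per_fma
    return ops_per_second / 1e12


# Hypothetical chip, for illustration only: 8192 FP32 lanes at 1.5 GHz.
as_two_ops = peak_teraflops(8192, 1.5)                 # multiply and add counted separately
as_one_op = peak_teraflops(8192, 1.5, ops_per_fma=1)   # multiply-accumulate counted once

# Ratio of the two FP32 figures quoted in this thread (45 TF vs 19.5 TF).
quoted_ratio = 45 / 19.5

print(f"counted as two ops: {as_two_ops:.1f} TF")
print(f"counted as one op:  {as_one_op:.1f} TF")
print(f"45 TF / 19.5 TF = {quoted_ratio:.2f}x")
```

The same silicon reports twice the teraflops under the two-op convention, so two figures are only comparable when both use the same counting rule.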
And then they put it to work in the crypto mines (Score:1)
Imagine a Beowulf cluster of these (Score:5, Funny)
:)
Re: (Score:1)
Re: (Score:1)
The more important question: ;)
Can it simulate itself in Minecraft?
How does it compare to Tesla's Dojo architecture? (Score:2)
This is the really interesting question, i.e. has Tesla, as a total newcomer in the area, managed to upstage Intel? Each Tesla D1 chip is supposed to deliver 22.6 TF of FP32 operations, which would be pretty much 50% of what Intel is claiming here, but the real question is how much power it takes to do so, and how many you can pack together and keep cooled. There is of course also the question of when it will become available in volume.
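A quick sanity check of the "pretty much 50%" figure, plus a placeholder for the per-watt comparison. This is only a sketch: the two teraflop numbers are the ones quoted in this thread, while `teraflops_per_watt` is a hypothetical helper and no power figures are filled in, since none are given here.

```python
D1_FP32_TF = 22.6            # Tesla D1 figure quoted above
PONTE_VECCHIO_FP32_TF = 45   # Intel's "more than 45 teraflops" claim

# Fraction of Intel's claimed FP32 throughput delivered by one D1 chip.
d1_fraction = D1_FP32_TF / PONTE_VECCHIO_FP32_TF


def teraflops_per_watt(teraflops: float, watts: float) -> float:
    """Efficiency metric; the power draw must come from real measurements."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return teraflops / watts


print(f"D1 / Ponte Vecchio (FP32): {d1_fraction:.1%}")
```

Once power numbers for both chips are public, calling `teraflops_per_watt` on each gives the comparison that actually matters for packing and cooling.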
Terje
Re: How does it compare to Tesla's Dojo architectu (Score:1)
Re: (Score:2)
Comparing an Intel announcement to an announcement by Tesla is like comparing the Ford electric truck to the imaginary Tesla Cybertruck
You realize Ford's electric truck is a paper launch, right? It doesn't exist as anything other than a prototype either, same as Tesla's Cybertruck. Those two are exactly comparable.
It remains to be seen how well Aurora stacks up against Dojo. There are differences in both chips and the networking. Both use proprietary transports for off-board communication. Tesla uses something they didn't even bother to name, but its throughput is outrageous. Aurora uses Slingshot [anl.gov], a proprietary networking transport
Re: (Score:1)
You realize Ford's electric truck is a paper launch, right? It doesn't exist as anything other than a prototype either, same as Tesla's Cybertruck. Those two are exactly comparable.
True, I was referring to the track record of Tesla making announcements which are never realized, from unattended cross country to solar tiles to Semi Trailers which defy the laws of physics etc. I don't think Ford is in the same class.
Re: (Score:1)
No, really it's like comparing one notorious liar that actually delivered stuff, even if not as great as claimed,
to its younger, more extreme brother, who's always exaggerating so much, that it's become a running joke.
We've come a long way (Score:3)
But.. (Score:2)
Can it be trained to run Crysis?