Speculation Grows As AMD Files Patent for GPU Design (hothardware.com)
Long-time Slashdot reader UnknowingFool writes:
AMD has filed a patent on using chiplets for a GPU, with hints as to why it has waited this long to extend its CPU chiplet strategy to GPUs. The latency between chiplets poses more of a performance problem for GPUs, and AMD is attempting to solve it with a new interconnect called a high bandwidth passive crosslink. This new interconnect would allow the GPU chiplets to communicate more effectively with each other and with the CPU.
"With NVIDIA working on its own MCM design with Hopper architecture, it's about time that we left monolithic GPU designs in the past and enable truly exponential performance growth," argues Wccftech.
And Hot Hardware delves into the details, calling it a "hybrid CPU-FPGA design that could be enabled by Xilinx tech." While they often aren't as great as CPUs on their own, FPGAs can do a wonderful job accelerating specific tasks... [A]n FPGA in the hands of a capable engineer can offload a wide variety of tasks from a CPU and speed processes along. Intel has talked a big game about integrating Xeons with FPGAs over the last six years, but it hasn't resulted in a single product hitting its lineup. A new patent by AMD, though, could mean that the FPGA newcomer might be ready to make one of its own...
AMD made 20 claims in its patent application, but the gist is that a processor can include one or more execution units that can be programmed to handle different types of custom instruction sets. That's exactly what an FPGA does...
AMD has been working on different ways to speed up AI calculations for years. First the company announced and released the Radeon Instinct series of AI accelerators, which were just big headless Radeon graphics processors with custom drivers. The company doubled down on that with the 2018 release of the MI60, its first 7-nm GPU, ahead of the Radeon RX 5000 series launch. A shift to focusing on AI via FPGAs after the Xilinx acquisition makes sense, and we're excited to see what the company comes up with.
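As an aside on what "execution units that can be programmed to handle different types of custom instruction sets" could mean in practice, here is a toy Python sketch, purely illustrative and not anything from AMD's filing, of an execution unit whose supported instructions can be swapped at runtime (the behaviour an FPGA-backed unit would provide in hardware):

# Toy model only: a "programmable" execution unit whose instruction set
# can be reconfigured at runtime, loosely analogous to an FPGA-backed unit.
class ProgrammableExecUnit:
    def __init__(self):
        self._ops = {}  # custom opcode -> implementation

    def load_bitstream(self, ops):
        # Stand-in for reconfiguring the FPGA fabric with new custom ops.
        self._ops = dict(ops)

    def execute(self, opcode, *operands):
        if opcode not in self._ops:
            raise ValueError(f"unsupported custom instruction: {opcode}")
        return self._ops[opcode](*operands)

unit = ProgrammableExecUnit()

# "Program" the unit for an AI-style workload...
unit.load_bitstream({
    "dot4": lambda a, b: sum(x * y for x, y in zip(a, b)),  # 4-wide dot product
    "relu": lambda x: max(0, x),
})
print(unit.execute("dot4", (1, 2, 3, 4), (5, 6, 7, 8)))  # 70

# ...then reprogram the same unit with a different custom instruction set.
unit.load_bitstream({"popcount": lambda x: bin(x).count("1")})
print(unit.execute("popcount", 0b10110110))  # 5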
"With NVIDIA working on its own MCM design with Hopper architecture, it's about time that we left monolithic GPU designs in the past and enable truly exponential performance growth," argues Wccftech.
And Hot Hardware delves into the details, calling it a "hybrid CPU-FPGA design that could be enabled by Xilinx tech." While they often aren't as great as CPUs on their own, FPGAs can do a wonderful job accelerating specific tasks... [A]n FPGA in the hands of a capable engineer can offload a wide variety of tasks from a CPU and speed processes along. Intel has talked a big game about integrating Xeons with FPGAs over the last six years, but it hasn't resulted in a single product hitting its lineup. A new patent by AMD, though, could mean that the FPGA newcomer might be ready to make one of its own...
AMD made 20 claims in its patent application, but the gist is that a processor can include one or more execution units that can be programmed to handle different types of custom instruction sets. That's exactly what an FPGA does...
AMD has been working on different ways to speed up AI calculations for years. First the company announced and released the Radeon Impact series of AI accelerators, which were just big headless Radeon graphics processors with custom drivers. The company doubled down on that with the release of the MI60, its first 7-nm GPU ahead of the Radeon RX 5000 series launch, in 2018. A shift to focusing on AI via FPGAs after the Xilinx acquisition makes sense, and we're excited to see what the company comes up with.
FPGA as interconnect? (Score:2)
FPGA as interconnect, what a powerful concept. I presume that the intent is to reduce the number of physical traces required to eliminate bottlenecks when the necessary communication paths are hard or impossible to predict (because the data movement paths are a property of code uploaded by the user). So are they doing things like mapping a texture object on one chiplet to a copy of the texture object on another chiplet, so that any changes on one are propagated to the other at local memory controller speed,
Re: (Score:2)
Having second thoughts about that, this idea doesn't eliminate bottlenecks. The brute force solution is just to reduce the cost of having massive numbers of parallel interconnects. Obviously, since it has taken decades to address this problem, it's not going to yield quickly to a bit of slashdot-inspired speculation. I'm tending towards the brute force bandwidth theory, with FPGAs somehow involved in amortizing the cost of data routing.
Re: (Score:2)
In other words, absurdly high bandwidth between chiplets isn't the answer, because the cost of that would negate the MCM advantage. But do something to move the knee of the curve that ends in a bottleneck, which somehow eases the burden of a driver that divides up random incoming rendering operations and routes them to chiplets so as to maximize the chance that data moved between render objects stays in the same chiplet's cache. That was always the reason why CrossFire was absurdly hard to do efficiently
Re: (Score:2)
Basically, I'm going to have to admit that I'm getting zero insight from the hothardware analysis, or more realistically, lack of it. So AMD has patented some particular form of interposer to increase inter-chiplet bandwidth. Does this present a seamless programming model? Somewhat more seamless? Or does it just move the knee of the bottleneck curve a bit for typical GPU loads, while the driver nightmare otherwise remains exactly the same? We want to know. Some of us, anyway.
Re: FPGA as interconnect? (Score:3)
Maybe you should not extrapolate hours of analysis from a single press release blurb.
Wait for the *product* release.
Or preferably, the documentation one
Re: (Score:2)
On the FPGA front I'm seeing not a whole lot of there there. The interposer is a bit more interesting, but strikes me as very far from a product.
BTW, you had a reason for your content-free post?
Re: (Score:2)
I'm thinking it isn't actually for 3d at all, but for DSP with mixed loads where part of the calculation can be done in parallel.
Re: (Score:2)
There seem to be two different patents in the news, one for an interposer, a possible enabler for chiplet GPU designs, and another one for some FPGA-GPU combination that has everybody scratching their heads and speculating about what the benefit, if any, might be. Conflating these two has created a lot of confusion and little information.
Re: (Score:2)
Looking at the headline and article, it seems like the initial use-case is for AI processing.
AMD has been working on different ways to speed up AI calculations for years.
Re: (Score:2)
For the FPGA, yes. That makes sense: the many thousands of cycles of latency for incrementally updating the FPGA can be amortized across many uses of a particular configuration.
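Rough, made-up numbers just to illustrate that amortization argument (a Python back-of-the-envelope, not anything from the patent):

# Made-up numbers: reconfiguring the FPGA is a large one-time cost, so it
# only pays off if the resulting configuration is reused enough times.
RECONFIG_CYCLES = 100_000   # assumed one-time reconfiguration latency
CPU_CYCLES_PER_OP = 400     # assumed cost of the operation on the CPU
FPGA_CYCLES_PER_OP = 40     # assumed cost once the FPGA is configured

break_even = RECONFIG_CYCLES / (CPU_CYCLES_PER_OP - FPGA_CYCLES_PER_OP)
print(f"break-even after ~{break_even:.0f} uses of the configuration")  # ~278

for n in (100, 1_000, 100_000):
    cpu = n * CPU_CYCLES_PER_OP
    fpga = RECONFIG_CYCLES + n * FPGA_CYCLES_PER_OP
    print(f"{n:>7} ops: CPU {cpu:>12,} cycles vs FPGA {fpga:>12,} cycles")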
Basically, there's a bunch of confusion and muddled messaging about both the FPGA and interposer rumours, which somehow got mixed up together. Likely for no good reason. I'm going to have to call this a mess.
Re: (Score:2)
There seem to be two different patents in the news, one ... and another one for some FPGA-GPU combination that has everybody scratching their heads and speculating about what the benefit, if any, might be. Conflating these two has created a lot of confusion and little information.
I didn't scratch my head, because I work with this shit.
You say you don't understand, how would you know if I was "conflating" something?
You're just being a blockhead, that's why you weren't able to hear the explanation.
And... did you really think that it is 1 patent per product? Are you really that stupid these days? See a doctor.
Re: (Score:2)
Fuck off.
Re: (Score:2)
OOohh, impressive rhetoric.
Did you figure out what the words mean yet? No? Say it again. You still don't comprehend.
Re: (Score:2)
My guess would be they are going to be used for stuff like AI and physics. AMD is behind Nvidia on that front but could get ahead if they have more flexible hardware.
Re: (Score:2)
You are most likely right about the FPGAs; some applications can tolerate the huge latency of reprogramming an FPGA. Not 3D graphics for the most part, except in some special cases. However, AMD sells lots of GPUs to the GPGPU market, at prices you would never want to pay for gaming. Could be that's what the FPGA patent is directed at, or even more likely, this is just the normal process of patenting everything in sight whether or not there is a practical use for it.
After spending more time looking into it than
Re: (Score:2)
FPGA as interconnect, what a powerful concept. I presume that the intent is to reduce the number of physical traces required to eliminate bottlenecks when the necessary communication paths are hard or impossible to predict ... Anybody serious hardware hack care to comment?
This is so I can build custom clusters with higher-power CPUs. Right now I'd be doing the same thing with discrete chips, with the FPGA doing the managed IO, and a bunch of ARM processors. Getting it onto a big fat AMD CPU with a bunch of cores would make those processors really shine in a role they're not really well suited to today.
This is a huge shot across the bow for both Intel and ARM, because the alternative to a custom cluster is to virtualize on top of high-end Intel processors.
Re: (Score:2)
Interesting. I'd enjoy any more details and/or speculation you have about that managed IO.
Not convinced FPGAs make sense for mass market (Score:3)
Re: (Score:2)
Some of the flexibility of FPGAs is used in the 5G radio industry, making it practical to track ever-changing standards without building out new hardware.
Re: (Score:2)
Re: (Score:2)
That kind of FPGA is already on the market [xilinx.com]. Some front-end elements like ADCs and DDCs might be adaptable to radio astronomy, but other elements (like the hard FEC cores, pre-distortion, and crest factor reduction) are probably too application-specific to be usable in astronomy.
Re: (Score:2)
Re: (Score:2)
They do, however, have much higher power consumption and cost than ASICs. Program development for FPGAs is still a very difficult and specialized field.
Higher power consumption goes away with integration; it isn't any sort of basic requirement. A regular FPGA has powerful IO, for flexibility, but that might not be the case here. It also has all sorts of debugging support that might not be in this.
And FPGA programming is popular with hobbyists these days; the tools are very accessible and varied.
Re: (Score:2)
Re: (Score:2)
If you understood what I wrote, you'd understand that the higher power usage is because of the features desired in the product, not because programmable logic (the general class) requires more power. You're talking about the specific products you use; that doesn't address what I said.
An ASIC will not "always be better" than an FPGA based on the technology; they'll be better based on the use cases the FPGAs on the market are intended for.
The FPGA in the story is not for those use cases, and doesn'
Re: (Score:2)
The big thing is that if you keep doing things the same way and just try to do them faster, you eventually run out of room to grow and improve. With the merger with Xilinx, AMD is looking at different ways of doing things that the majority of people would never have thought about, because they are so familiar with how things are currently done. Combining an FPGA with a modern CPU design might have synergies at the very low end that will blow our minds when we see them.
Now, the article is more about areas that AMD is going in
and enable truly exponential performance growth? (Score:4, Interesting)
The interconnect fabric doesn't scale exponentially. You can do a lot with good software to schedule work between the chiplets in an efficient manner. Sadly, most GPU workloads are not shared-nothing and hyper-parallel; you end up having to synchronize between shader cores and eventually between chiplets (which is expensive). Still, it's progress, and chiplet GPUs are inevitable for more than just AMD and NV [anandtech.com]. At least unless yields improve on larger silicon (471 mm^2 at 12nm is where NV was last I checked, and that's a heckin' chonker for any silicon design).
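A toy cost model with made-up numbers, just to show why the scaling is sub-linear rather than exponential:

# Made-up numbers: compute splits nicely across chiplets, but every frame
# still pays a cross-chiplet synchronization cost that grows with the count.
COMPUTE_MS = 16.0        # assumed per-frame shading work on one big die
SYNC_MS_PER_LINK = 0.4   # assumed cost per cross-chiplet sync point

def frame_time_ms(chiplets):
    compute = COMPUTE_MS / chiplets           # the perfectly parallel part
    sync = SYNC_MS_PER_LINK * (chiplets - 1)  # grows with the fabric
    return compute + sync

for n in (1, 2, 4, 8):
    print(f"{n} chiplet(s): {frame_time_ms(n):5.2f} ms, "
          f"speedup x{frame_time_ms(1) / frame_time_ms(n):.2f}")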
Re: and enable truly exponential performance growt (Score:2)
Chiplets exist precisely to get high yields from huge chips, because you only have to throw out the defective chiplets, not the entire chip. This is where Intel seriously missed the boat.
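A rough illustration of the yield argument, using the simple Poisson defect model and assumed numbers (not real process data):

# Assumed numbers only: yield = exp(-defect_density * die_area) under the
# simple Poisson model. A defect scraps a whole monolithic die but only
# one small chiplet, so far less good silicon is thrown away.
from math import exp

DEFECT_DENSITY = 0.1    # assumed defects per cm^2
MONOLITHIC_AREA = 6.0   # cm^2, a ~600 mm^2 monolithic GPU
CHIPLET_AREA = 1.5      # cm^2, one of four ~150 mm^2 chiplets

yield_monolithic = exp(-DEFECT_DENSITY * MONOLITHIC_AREA)
yield_chiplet = exp(-DEFECT_DENSITY * CHIPLET_AREA)

print(f"monolithic die yield: {yield_monolithic:.1%}")  # ~54.9%
print(f"single chiplet yield: {yield_chiplet:.1%}")     # ~86.1%

# Under this model ~45% of monolithic dies are scrapped, but only ~14% of
# chiplet dies; good chiplets are binned into full packages, so far less
# wafer area is wasted per working GPU.
print(f"silicon scrapped: {1 - yield_monolithic:.1%} (monolithic) "
      f"vs {1 - yield_chiplet:.1%} (chiplet)")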
Re: (Score:2)
I'm seriously stumped by Intel's dabbling in the GPU arena. My employer has to send our designs to Taiwan to be made, we don't have our own fab. Yet for the many years I've worked here, we've been beating Intel at graphics. And we're a significantly smaller company. Intel has the resources to hire world class engineers, and they have better infrastructure than my company, so I can only conclude that Intel's management doesn't want to take the GPU market seriously because they haven't gotten even close to pu
Re: (Score:2)
Intel has an "everything is a CPU core" mentality that has screwed them time and again in the GPU arena, most notably with Larrabee [wikipedia.org], which was basically a bunch of P54C cores strung together with some magical interconnect. Predictably, it turned out not to be performant and also to be too expensive.
Re: (Score:2)
Larrabee wasn't a total failure. Xeon Phi held a solid third place in compute for 10 years, and second in some niche industries. It had poor throughput, but for some workloads that performed poorly on a GPU you could sometimes get at least middling performance with a Phi. I think the CPU-based architecture makes it a jack of all trades and master of none. Which is fine for general purpose computing, but it's the wrong trade-off in compute where you invest millions of dollars in specialized hardware for y
Re: (Score:2)
You can do a lot with good software to schedule work between the chiplets in an efficient manner. Sadly, most GPU workloads are not shared-nothing and hyper-parallel; you end up having to synchronize between shader cores and eventually between chiplets (which is expensive).
That's what I'm thinking this is for: so that you can program your interconnects to synchronize in the most efficient way for your algorithm. E.g., each chip replaces an ARM cluster that had microcontrollers as interconnects.
It will bring the capabilities of cluster computing to small labs.
Re: (Score:2)
If there is a way to apply a "scheduler" approach to a GPU where shaders can be assigned to one module or another (so they don't need to talk between the modules), then that will help. There is a reason why MCM hasn't taken off for graphics just yet, but if AMD is putting ideas out there, that implies that AMD has some interesting ideas.
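Nothing public says how AMD would actually do it, but a toy sketch of that kind of locality-aware scheduling, where work follows whichever module already owns most of the resources it touches, might look like this in Python:

# Hand-wavy sketch, nothing from the patent: keep each work item on the
# chiplet that already holds most of the resources it touches, so
# cross-chiplet traffic is only paid when locality is genuinely poor.
from collections import defaultdict

NUM_CHIPLETS = 4
resource_home = {}               # resource id -> chiplet that owns it
chiplet_load = defaultdict(int)  # chiplet -> number of work items assigned

def schedule(resources):
    # Pick a chiplet for a work item given the resources it reads/writes.
    votes = defaultdict(int)
    for r in resources:
        if r in resource_home:
            votes[resource_home[r]] += 1
    if votes:
        target = max(votes, key=votes.get)  # best locality wins
    else:
        target = min(range(NUM_CHIPLETS), key=lambda c: chiplet_load[c])
    for r in resources:
        resource_home.setdefault(r, target)  # first toucher becomes the owner
    chiplet_load[target] += 1
    return target

print(schedule({"texA", "bufQ"}))  # 0: first work lands on an idle chiplet
print(schedule({"texA"}))          # 0: follows texA to its home chiplet
print(schedule({"texZ"}))          # 1: new resources balance onto idle chiplets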
Oh yeaaahhh! (Score:3)
FPGAs for everyone? ;)
Even for me with the low end APU?
Don't make me wet down there, baybae!
Re: (Score:3)
FPGAs for everyone?
The cheapest FPGA in my IC bin cost under $6, and is a few years old.
And hint: I've never once purchased a commercial compiler or IDE, and I don't use any at all in my workflow.