Google Unveils Two New AI Chips For the 'Agentic Era' (cnbc.com)
Google announced two new tensor processing units (TPUs) for the "agentic era," with separate processors dedicated to training and inference. "With the rise of AI agents, we determined the community would benefit from chips individually specialized to the needs of training and serving," Amin Vahdat, a Google senior vice president and chief technologist for AI and infrastructure, said in a blog post. Both chips will become available later this year. CNBC reports: After years of producing chips that can both train artificial intelligence models and handle inference work, Google is separating those tasks into distinct processors, its latest effort to take on Nvidia in AI hardware. [...] None of the tech giants are displacing Nvidia, and Google isn't even comparing the performance of its new chips with those from the AI chip leader. Google did say the training chip enables 2.8 times the performance of the seventh-generation Ironwood TPU, announced in November, for the same price, while performance is 80% better for the inference processor.
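Taken at face value, and with "same price" held constant, those multipliers map directly onto perf-per-dollar. A trivial sketch in Python (only the 2.8x and 80% figures come from the article; the rest is arithmetic):

    # Perf-per-dollar implied by the quoted figures, relative to Ironwood.
    # At the same price, the performance ratio equals the perf/$ ratio.
    training_speedup = 2.8    # "2.8 times the performance" (training chip)
    inference_speedup = 1.8   # "80% better" means 1.8x, not 0.8x (inference chip)
    for name, s in (("training", training_speedup), ("inference", inference_speedup)):
        print(f"{name} chip: {s:.1f}x performance -> {s:.1f}x perf/$ vs. Ironwood")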
Nvidia said its upcoming Groq 3 LPU hardware will draw on large quantities of static random-access memory, or SRAM, which is used by Cerebras, an AI chipmaker that filed to go public earlier this month. Google's new inference chip, dubbed TPU 8i, also relies on SRAM. Each chip contains 384 megabytes of SRAM, triple the amount in Ironwood. The architecture is designed "to deliver the massive throughput and low latency needed to concurrently run millions of agents cost-effectively," Sundar Pichai, CEO of Google parent Alphabet, wrote in a blog post.
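For a sense of what 384 megabytes of on-chip SRAM buys an inference chip, here is a rough back-of-the-envelope in Python. The transformer shape and cache precision are illustrative assumptions, not TPU 8i specifications:

    # How much decode context fits in 384 MB of on-chip SRAM?
    # Model dimensions below are hypothetical, not TPU 8i specs.
    SRAM_BYTES = 384 * 1024**2

    layers, kv_heads, head_dim = 32, 8, 128   # assumed transformer shape
    bytes_per_elem = 1                        # assumed 8-bit KV cache
    kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V

    tokens = SRAM_BYTES // kv_bytes_per_token
    print(f"{kv_bytes_per_token} bytes/token -> ~{tokens:,} cached tokens on-chip")

Even under these generous assumptions that is only a few thousand cached tokens, which suggests the SRAM holds hot working state (KV cache, activations) close to the compute while weights still stream from off-chip memory.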
Re: (Score:2, Interesting)
ah, not available, except to rent a time slice? fuck off, then.
In the context of building ML models, renting a virtual farm may be the better solution. You think the GPU upgrade cycle is bad? Wait until you try to keep up with AI-level products. :-)
Accelerating the ML model on your PC or Mac is a very different thing.
Re: (Score:3)
Apple doesn't have devices with enough RAM to challenge Nvidia.
Apple also has no credibility in servers, after they got into them, then left, then got into them again, then left again. Nobody wants to be rugpulled.
Re: (Score:1)
Apple doesn't have devices with enough RAM to challenge Nvidia.
Again, I am referring to local ML model execution and acceleration. Apple Watches have done impressive on-board processing using ML models.
also has no credibility in servers, ...
Not what I referred to.
Re: (Score:2)
The vast majority of LLM processing is done in the cloud, and any AMD laptop has the functionality to run LLMs, plus probably expandable memory, so if you can afford the RAM you can run larger models than with Apple. Nobody cares yet. Maybe eventually.
Re: (Score:2)
Apple doesn't have devices with enough RAM to challenge Nvidia.
Again, I am referring to local ML model execution and acceleration. Apple Watches have done impressive on-board processing using ML models.
AI/ML encompasses a wide range of use cases. The Apple use cases are generally less demanding client tasks that just have to work and be good enough. Those use cases are very different from what the Nvidia GPUs and Google TPUs are addressing.
Re: (Score:2)
The compute performance of Apple Silicon is vastly inferior to a mid-range discrete. Its bandwidth isn't great in comparison, either.
So, in terms of GB-of-VRAM-to-GB-of-VRAM, Apple Silicon is worse than any discrete you're likely to have for ML purposes.
However, they've got something you can't get on a discrete: 128GB of VRAM in a laptop, and 512GB of VRAM in a desktop.
This changes the equation, because it means your Ap
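To put the poster's capacity point in numbers, a quick sketch of which quantized weight sets fit in that much unified memory. Model sizes and quantization levels are illustrative, and this counts weights only (no KV cache or OS overhead):

    # Which quantized model weights fit in 128 GB (laptop) or 512 GB (desktop)?
    for params_b in (8, 70, 405):        # model sizes in billions of parameters
        for bits in (16, 8, 4):          # common quantization levels
            gb = params_b * bits / 8     # 1e9 params * (bits/8) bytes ~ params_b * bits/8 GB
            if gb <= 128:
                verdict = "fits in the 128 GB laptop"
            elif gb <= 512:
                verdict = "needs the 512 GB desktop"
            else:
                verdict = "fits in neither"
            print(f"{params_b}B @ {bits}-bit ~ {gb:.0f} GB: {verdict}")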
NVidia + Google + Cerebras moving to SRAM (Score:4, Insightful)
SRAM has never been built at this scale, afaik. Cerebras was ahead of the curve here, building wafer scale SRAMs years ago. The penalties of DRAM (even with HBM) are now so severe that everyone is taking the gloves off and building mighty SRAMs. This has always been possible in theory, but the high cost never justified it.
The impact on semiconductor fab demand is significant. SRAM cells are much larger than DRAM cells (a six-transistor cell versus one transistor and a capacitor), so the same capacity takes considerably more silicon die area.
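A rough illustration of that die-area cost, using ballpark bitcell figures; both cell areas below are order-of-magnitude assumptions, not vendor data:

    # Raw bitcell area for 384 MB of SRAM vs. DRAM (ballpark figures only).
    CAPACITY_BITS = 384 * 1024**2 * 8

    cells_um2_per_bit = {
        "SRAM": 0.021,   # ~6T bitcell on a leading logic node (assumed)
        "DRAM": 0.002,   # ~1T1C bitcell (assumed)
    }
    for name, um2_per_bit in cells_um2_per_bit.items():
        mm2 = CAPACITY_BITS * um2_per_bit / 1e6   # um^2 -> mm^2
        print(f"{name}: ~{mm2:.0f} mm^2 of bitcell area for 384 MB")

Roughly an order of magnitude more area per bit, which is why on-chip SRAM at this capacity translates into real fab demand.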
Also, the training vs. inference split Google is baking into actual hardware is a big deal: it's the reality that training and inference are very distinct workloads asserting itself, which has been obvious to anyone who hasn't been drinking too much NVidia Kool-Aid. There is a future where costly, general-purpose GPU-like devices aren't actually necessary for operating LLMs.
Re: (Score:3)
Also, the training vs. inference split Google is baking into actual hardware is a big deal: it's the reality that training and inference are very distinct workloads asserting itself, which has been obvious to anyone who hasn't been drinking too much NVidia Kool-Aid. There is a future where costly, general-purpose GPU-like devices aren't actually necessary for operating LLMs.
The training/inference requirements are indeed different in significant ways. If the TPU 8i had been available a few years ago, it would have seriously affected Nvidia sales. However, the 8i is just now in the process of becoming available and is also not a small or cheap module. It's also being introduced at the same time that Nvidia is introducing its own inference devices. Due to the market timing and other factors, it remains to be seen how much it affects Nvidia's market.
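One concrete way to see why the two workloads want different silicon is arithmetic intensity. A minimal sketch, assuming a single fp16 weight matrix and ignoring activation traffic (the shapes are illustrative):

    # FLOPs per byte of weight traffic for one matrix multiply (illustrative).
    d_in, d_out = 8192, 8192
    weight_bytes = d_in * d_out * 2          # fp16 weights read from memory

    def intensity(batch_tokens):
        flops = 2 * batch_tokens * d_in * d_out   # multiply-accumulate count
        return flops / weight_bytes

    print(f"training-style batch (4096 tokens): {intensity(4096):,.0f} FLOPs/byte")
    print(f"single-token decode: {intensity(1):.1f} FLOPs/byte")

In this toy model the intensity is just the batch size: large training batches keep the math units busy, while single-token decode is dominated by moving weights, which is exactly where on-chip SRAM and a dedicated inference design would pay off.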
Just wait... (Score:2)
Just wait until the bubble bursts, and everyone starts removing anything 'AI' from their devices (as much as possible), and stops using it because they're sick of the hallucinations or how it's baked into everything... I'll keep using my Galaxy S9 (and Win10 Enterprise LTSC, and my not-smart 18-year-old plasma TV) until it totally dies (only used Bixby like 5 times... mostly just seeing if it's worth using).
I don't need "Clod" to generate my Arduino code for me... I'll look up stuff on my own. I can type out
Two chips? Training for me but not for thee. (Score:2)