Google's Custom Machine Learning Chips Are 15-30x Faster Than GPUs and CPUs (pcworld.com)
Four years ago, Google was faced with a conundrum: if all its users hit its voice recognition services for three minutes a day, the company would need to double the number of data centers just to handle all of the requests to the machine learning system powering those services, reads a PCWorld article, which talks about how the Tensor Processing Unit (TPU), a chip designed to accelerate the inference stage of deep neural networks, came into being. The article shares an update: Google published a paper on Wednesday laying out the performance gains the company saw over comparable CPUs and GPUs, both in terms of raw power and the performance per watt of power consumed. A TPU was on average 15 to 30 times faster at the machine learning inference tasks tested than a comparable server-class Intel Haswell CPU or Nvidia K80 GPU. Importantly, the performance per watt of the TPU was 25 to 80 times better than what Google found with the CPU and GPU.
I for one,
Re: (Score:2)
No point saying it, dude. They already know.
A purpose built chip
outperforms general purpose chips?
Wow.
Re: A purpose built chip (Score:1)
My thoughts exactly. How is it even the least bit surprising that custom silicon is better than general purpose?
Re: (Score:2)
While there is some truth to that, this "purpose built" chip's purpose is to run an open-source AI language. So this is more interesting than a typical custom ASIC.
Re: (Score:2)
How so?
Custom chip built for how the bits are handled is
.... typical of an ASIC.
Re: A purpose built chip (Score:1)
I suppose the application is *slightly* broader than a typical ASIC, but yeah, looks like some marketing article.
Re: (Score:2)
ASICs everywhere are embarrassed by how slow the TPUs are.
Wait you mean an ASIC is fast? Why I never!
Man, is this a "duh" moment. Purpose built ASICs are extremely fast and low power for what they accomplish. That's why we use them. Look at a small desktop network switch: a tiny little processor that can pass 16 Gb/sec of traffic around. Try to put 8 NICs in a computer and have it switch traffic, and you'll be amazed at how much power you need. The reason the switch is small is that it is purpose built: its ASIC does nothing but switch Ethernet packets.
Same deal with some things on a CPU. You find that decoding an AVC video stream takes next to no CPU power on modern CPUs, yet decoding an MPEG-2 video takes some. Why? Because they have a small bit of dedicated logic for AVC decoding (usually some other formats too). It is low power because it is dedicated.
The question in designing a system is always flexibility and unit cost vs. fixed function and up-front cost. A CPU is great because it can do anything, and you can just buy them straight out; tons of companies have them available for purchase right now. However, they take a lot of silicon and power to perform a given task. An ASIC takes a bunch of up front money to design and do a manufacturing run, but is very small and efficient, however it can't be reconfigured to do anything else and needs a full respin. In the middle there is something like an FPGA. Which one is right for an application just depends on the balance of a lot of factors.
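To put rough numbers on that tradeoff, here is a minimal sketch. It assumes the off-the-shelf part has no up-front cost and that both options do the same job per unit, which is a big simplification. The function names and every figure are hypothetical, picked only to show the shape of the calculation.

```python
import math


def total_cost(upfront, unit_cost, units):
    """Total cost of deploying `units` parts: one-time cost plus per-unit cost."""
    return upfront + unit_cost * units


def break_even_units(asic_upfront, asic_unit, cpu_unit):
    """Smallest unit count at which the ASIC's total cost is no higher than the CPU's.

    Assumes the CPU has no up-front cost. Returns None when the ASIC is not
    cheaper per unit, because it then never recovers its up-front cost.
    """
    if asic_unit >= cpu_unit:
        return None
    return math.ceil(asic_upfront / (cpu_unit - asic_unit))


# Hypothetical figures, for illustration only.
n = break_even_units(asic_upfront=1_000_000, asic_unit=50, cpu_unit=300)
print(n)  # 4000
```

Below that volume the general-purpose part wins on cost; above it the fixed-function part does, as long as the workload doesn't change enough to force a respin.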
Re: (Score:2)
An ASIC takes a bunch of up front money to design and do a manufacturing run, but is very small and efficient, however it can't be reconfigured to do anything else and needs a full respin.
Per TFA, the chips they designed are flexible enough to apply to new machine learning models. I think the point is that this was a space ripe for customized architecture, like graphics cards were 15-20 years ago.
Re: (Score:2)
There is a reason that all of the Bitcoin miners are ASIC based now.
Don't expect those machines to be able to do anything else, though, if Bitcoin dies off.
Re: (Score:2)
Purpose built ASICs are extremely fast and low power for what they accomplish
And they have very specific algorithms. Certainly nothing traditionally resembling "machine learning".
Say what...
Re: (Score:3)
Performance bottleneck (Score:1)
if all its users hit its voice recognition services for three minutes a day, the company would need to double the number of data centers
The performance bottleneck in machine learning is training the system and the amount of training data, not the number of users running the model. Not sure I understand how usage is so directly proportional to computing costs.
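For what it's worth, one rough way to frame the summary's claim is to assume that inference cost scales linearly with the amount of audio processed, while training is a one-time cost that doesn't depend on user count. That is an assumption, not something the article states, and every number and name below is made up for illustration.

```python
def daily_inference_ops(users, seconds_per_user, ops_per_audio_second):
    """Operations per day to serve every user's audio, assuming linear scaling."""
    return users * seconds_per_user * ops_per_audio_second


# Hypothetical numbers, for illustration only.
# Training cost is deliberately left out: under this model it is paid once
# and does not grow with the number of users.
base = daily_inference_ops(
    users=1_000_000, seconds_per_user=180, ops_per_audio_second=1_000_000_000
)
doubled = daily_inference_ops(
    users=2_000_000, seconds_per_user=180, ops_per_audio_second=1_000_000_000
)
print(doubled / base)  # 2.0
```

Under that model, doubling usage doubles the serving load no matter how expensive training was.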
15-30x the speed
But 1000x as expensive?
Re: (Score:2)
Energy cost is lower, and those will be dominant over longer term.
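A back-of-the-envelope sketch of that argument: how long a pricier but more efficient part has to run before the energy savings cover the extra purchase price. It assumes continuous operation and a flat electricity price, and ignores cooling and the fact that a faster chip replaces several slower ones. All names and figures are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def yearly_energy_cost(watts, price_per_kwh):
    """Electricity cost of running one part continuously for a year."""
    return watts / 1000 * HOURS_PER_YEAR * price_per_kwh


def break_even_years(price_a, watts_a, price_b, watts_b, price_per_kwh):
    """Years until option A (pricier, more efficient) matches option B in total cost.

    Returns 0.0 if A is not pricier to begin with, and None if A saves no
    energy and therefore never catches up.
    """
    extra_upfront = price_a - price_b
    if extra_upfront <= 0:
        return 0.0
    saving_per_year = (
        yearly_energy_cost(watts_b, price_per_kwh)
        - yearly_energy_cost(watts_a, price_per_kwh)
    )
    if saving_per_year <= 0:
        return None
    return extra_upfront / saving_per_year


# Hypothetical figures, for illustration only.
years = break_even_years(
    price_a=2000, watts_a=100, price_b=1000, watts_b=600, price_per_kwh=0.10
)
print(round(years, 1))  # 2.3
```

Whether energy really dominates depends entirely on the actual price gap and power draw, neither of which is given here.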