Google's Custom Machine Learning Chips Are 15-30x Faster Than GPUs and CPUs (pcworld.com) 91

Posted by msmash on Wednesday April 05, 2017 @03:20PM from the pushing-the-boundaries dept.

Four years ago, Google was faced with a conundrum: if all its users hit its voice recognition services for three minutes a day, the company would need to double the number of data centers just to handle all of the requests to the machine learning system powering those services, reads a PCWorld article, which describes how the Tensor Processing Unit (TPU), a chip designed to accelerate the inference stage of deep neural networks, came into being. The article shares an update: Google published a paper on Wednesday laying out the performance gains the company saw over comparable CPUs and GPUs, both in terms of raw power and the performance per watt of power consumed. A TPU was on average 15 to 30 times faster at the machine learning inference tasks tested than a comparable server-class Intel Haswell CPU or Nvidia K80 GPU. Importantly, the performance per watt of the TPU was 25 to 80 times better than what Google found with the CPU and GPU.

This discussion has been archived. No new comments can be posted.

I for one, (Score:4, Funny)

by Bodhammer ( 559311 ) writes: on Wednesday April 05, 2017 @03:23PM (#54180597)

Welcome our new Google overlords. (or whatever...)

- Re:I for one, (Score:4, Funny)
  
  by Hognoxious ( 631665 ) writes: on Wednesday April 05, 2017 @04:14PM (#54181081) Homepage Journal
  
  No point saying it, dude. They already know.
  
- Re: (Score:2)
  
  by R3d M3rcury ( 871886 ) writes:
  
  I wonder where they got the idea for a custom machine learning chip... [sideshowcollectors.com]
  Oh. Shit.
A purpose built chip (Score:5, Insightful)

by Anonymous Coward writes: on Wednesday April 05, 2017 @03:28PM (#54180649)

outperforms general purpose chips?
Wow.

- Re: A purpose built chip (Score:1)
  
  by Anonymous Coward writes:
  
  My thoughts exactly. How is it even the least bit surprising that custom silicon is better than general purpose
  - Re: (Score:2)
    
    by ArmoredDragon ( 3450605 ) writes:
    
    Not sure why they didn't just call it what it is: ASIC.
    - Re: (Score:2)
      
      by Jayfar ( 630313 ) writes:
      
      Not sure why they didn't just call it what it is: ASIC.
      Well, TFA kinda did that: "TPUs are what’s known in chip lingo as an application-specific integrated circuit (ASIC)."
  - Re: (Score:2, Insightful)
    
    by Tough Love ( 215404 ) writes:
    
    Actually, what is really surprising is that Google considered the project worth doing to get only 15-30% advantage vs GPU, if those numbers are accurate. In the best case, this buys roughly an 18 month advantage before GPUs get faster and the engineering has to be done all over again, or the project will just go the way of other Google abandonware. And in that brief window, do saved operating costs justify the sunk engineering and fabrication cost? I doubt it.
    Now, on second look, this smells like a vanity
    - Re: (Score:2)
      
      by religionofpeas ( 4511805 ) writes:
      
      They are 15-30 times faster, not 15-30%. That's a huge difference. And this is only the first version, so it is likely that the TPU can be improved faster than GPUs that have been on the market for years.
      - Re: A purpose built chip (Score:4, Informative)
        
        by Tough Love ( 215404 ) writes: on Thursday April 06, 2017 @08:54AM (#54184545)
        
        They are 15-30 times faster, not 15-30%.
        Every little order of magnitude really helps :)
        
    - Re: (Score:3)
      
      by s_p_oneil ( 795792 ) writes:
      
      I'm sure you've already seen the "it's 15-30 times, not 15-30 percent" replies. There's also the "performance per watt of the TPU was 25 to 80 times better". Can you imagine how much money this can save Google in electricity costs? It's 1-2 orders of magnitude better (10-100 times), with the possibility that they will continue to find dramatic improvements.
      If we equate your assessment with a "bunt", what Google really did is knocked the ball out of the park.
- Re: (Score:2)
  
  by MightyYar ( 622222 ) writes:
  
  While there is some truth to that, this "purpose built" chip's purpose is to run an open-source AI language. So this is more interesting than a typical custom ASIC.
  - Re: (Score:2)
    
    by NatasRevol ( 731260 ) writes:
    
    How so?
    Custom chip built for how the bits are handled is .... typical of an ASIC.
    - Re: A purpose built chip (Score:1)
      
      by Anonymous Coward writes:
      
      I suppose the application is *slightly* more broad than a typical ASIC but yea, looks like some marketing article.
    - Re:A purpose built chip (Score:5, Informative)
      
      by ShanghaiBill ( 739463 ) writes: on Wednesday April 05, 2017 @04:58PM (#54181437)
      
      How so?
      The TPU is a "purpose built" chip, but that purpose is very broad. It is optimized for massively parallel low-precision matrix operations, which is useful not only for neural nets, but also simulation of physical processes like CFD, weather prediction, climate models, computational chemistry, etc. It can do everything a GPU can do except the rasterization and texture mapping, but it can do it faster and with much less power.
      
      - Re:A purpose built chip (Score:5, Informative)
        
        by Baloroth ( 2370816 ) writes: on Wednesday April 05, 2017 @06:12PM (#54181905)
        
        It is optimized for massively parallel low-precision matrix operations, which is useful not only for neural nets, but also simulation of physical processes like CFD, weather prediction, climate models, computational chemistry, etc.
        Maybe, but I doubt it. It's far too low precision, for one thing: 8 bits doesn't get you very far in any of those fields (you typically want at least 32-bit floating point for those, and quite often 64-bit precision is required, as numerical errors accumulate exponentially in a chaotic system), and they're really not even big matrices (just 256x256). Really the only place this kind of thing would excel is signal processing, which is basically what they're using them for.
        
    - Re: (Score:2)
      
      by MightyYar ( 622222 ) writes:
      
      By your broad definition, a GPU is also a "typical ASIC".
      - Re: (Score:2)
        
        by David_Hart ( 1184661 ) writes:
        
        By your broad definition, a GPU is also a "typical ASIC".
        Yep... A GPU is basically an ASIC as well. It's programmed to do one thing well. That it can be used for other things that use similar calculations was pure coincidence (i.e. bitcoin mining).
        
        - Re: (Score:2)
          
          by MightyYar ( 622222 ) writes:
          
          Agreed that a GPU is an ASIC. I don't think they are "typical" in a few senses, but mainly the crowd around here go crazy over them.
      - Re: (Score:2)
        
        by NatasRevol ( 731260 ) writes:
        
        Well, since it's not a CPU, yes it's an ASIC.
        
        - Re: (Score:2)
          
          by MightyYar ( 622222 ) writes:
          
          Not sure where I said otherwise.
- Re: (Score:2)
  
  by NatasRevol ( 731260 ) writes:
  
  ASICs everywhere are embarrassed by how slow the TPUs are.
- Re: A purpose built chip (Score:1)
  
  by Christopher Skinner ( 4818559 ) writes:
  
  GPU chips are a bunch of multipliers. Floating point, which is, of course, integer, with an exponent. The press release talks about "inference" as if this is a math function. My guess is weighted multipliers. ...If Google have in fact made more efficient multipliers, then you gamers can turn off those noisy fans. Or are they faster? It seemed to be saying they were more efficient.
- Re: (Score:2)
  
  by Tough Love ( 215404 ) writes:
  
  It takes considerable organizational effort to push an ASIC all the way through the pipe from design to production. Even budgeting and staffing are nontrivial. The technology might not be earth shattering, but the engineering process is respectable. And who knows, the technology might be earth shattering. But probably not. It uses numerical methods; analog would be faster and more interesting.
- Re: (Score:2)
  
  by Pete Smoot ( 4289807 ) writes:
  
  I can't count how many times someone thought they could build an ASIC to optimize some computation, only to get crushed by Moore's Law operating on general purpose CPUs.
  Never fight a land war during a Russian winter. Never bet against Ethernet. Never bet against Moore's Law.
  - Re: (Score:2)
    
    by religionofpeas ( 4511805 ) writes:
    
    It will take a while for Moore's law to catch up with 15-30 times speed improvement, and even better power improvement.
    And Moore's law also helps this chip.
    - Re: (Score:2)
      
      by Pete Smoot ( 4289807 ) writes:
      
      I know, that's what makes this a remarkable achievement. Many times in the past people have tried to do this but it took much longer than they anticipated. In the meantime, Intel, AMD, nVidia, ATI, or whoever managed to catch up and surpass the ASIC. It turned out the performance win from ASICs had a shorter shelf life than people realized.
      It seems times have changed. Google has a very specific workload which appears to be different from what the mainstream processors have optimized for. The easy (easier?)
- Re: (Score:2)
  
  by religionofpeas ( 4511805 ) writes:
  
  It's not that obvious when you're talking about floating point calculations in combination with external memory. A GPU is highly optimized for both of those requirements, and it's not all that simple to make an ASIC that does this better. The main reason Google got such an improvement is because they require much less precision in their results.
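The precision argument in this thread (8-bit arithmetic versus 32- or 64-bit floats) can be illustrated with a small sketch. It assumes a simple symmetric linear quantization scheme, which is one common approach and not necessarily what the TPU does; the function names and sample values are made up for illustration.

```python
def quantize(values, bits=8):
    """Map floats to signed integers using one shared scale (symmetric quantization)."""
    qmax = 2 ** (bits - 1) - 1            # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax
    if scale == 0.0:
        scale = 1.0                       # all-zero input: any scale works
    return [round(v / scale) for v in values], scale


def dot_float(a, b):
    """Reference dot product in ordinary floating point."""
    return sum(x * y for x, y in zip(a, b))


def dot_quantized(a, b, bits=8):
    """Dot product done on quantized integers, rescaled to a float at the end."""
    qa, scale_a = quantize(a, bits)
    qb, scale_b = quantize(b, bits)
    acc = sum(x * y for x, y in zip(qa, qb))   # integer multiply-accumulate
    return acc * scale_a * scale_b


# Illustrative values only (not taken from any real model).
weights = [0.12, -0.5, 0.33, 0.9, -0.07]
inputs = [1.0, 0.25, -0.75, 0.4, 0.6]

exact = dot_float(weights, inputs)
approx = dot_quantized(weights, inputs)
print(exact, approx, abs(exact - approx))
```

The 8-bit result lands close to the float result but not on it. A small error like this is usually tolerable for a single inference pass, while a simulation that feeds each result into the next step would compound it, which is the objection raised above.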
Wait you mean an ASIC is fast? Why I never! (Score:5, Informative)

by Sycraft-fu ( 314770 ) writes: on Wednesday April 05, 2017 @03:39PM (#54180723)

Man is this a "duh" moment. Purpose built ASICs are extremely fast and low power for what they accomplish. That's why we use them. Look at a small desktop network switch: Little tiny processor that can pass 16 Gb/sec of traffic around. Try to put 8 NICs in a computer and have it switch traffic and you'll be amazed at how much power you need. The reason the switch is small is it is purpose built: Its ASIC does nothing but switch Ethernet packets.
Same deal with some things on a CPU. You find that decoding an AVC video stream takes next to no CPU power on modern CPUs, yet decoding an MPEG-2 video takes some. Why? Because they have a small bit of dedicated logic for AVC decoding (usually some other formats too). It is low power because it is dedicated.
Always the question in designing a system is flexibility and unit cost vs fixed function and up front cost. A CPU is great because it can do anything, and you can just buy them straight out, tons of companies have them available for purchase right now. However they take a lot of silicon and power to perform a given task. An ASIC takes a bunch of up front money to design and do a manufacturing run, but is very small and efficient, however it can't be reconfigured to do anything else and needs a full respin. In the middle there is something like an FPGA. Which one is right for an application just depends on the balance of a lot of factors.

- Re:Wait you mean an ASIC is fast? Why I never! (Score:4, Insightful)
  
  by chispito ( 1870390 ) writes: on Wednesday April 05, 2017 @03:54PM (#54180875)
  
  An ASIC takes a bunch of up front money to design and do a manufacturing run, but is very small and efficient, however it can't be reconfigured to do anything else and needs a full respin.
  Per TFA, the chips they designed are flexible enough to apply to new machine learning models. I think the point is that this was a space ripe for customized architecture, like graphics cards were 15-20 years ago.
  
- Re: (Score:3)
  
  by Nukenbar ( 215420 ) writes:
  
  There is a reason that all of the Bitcoin miners are ASIC based now.
  Don't expect those machines to be able to do anything else though if bitcoin dies off.
- Re: (Score:2)
  
  by thegarbz ( 1787294 ) writes:
  
  Purpose built ASICs are extremely fast and low power for what they accomplish
  And they have very specific algorithms. Certainly nothing traditionally resembling "machine learning".
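The trade-off described in this thread (up-front design cost for an ASIC versus higher per-unit cost for general-purpose parts) comes down to a break-even calculation. The sketch below assumes simple linear cost models; the function name and every figure in it are hypothetical and chosen only to show the shape of the arithmetic.

```python
import math


def break_even_units(asic_upfront, asic_unit_cost, general_unit_cost):
    """Smallest unit count at which the ASIC route is no more expensive.

    Assumed linear cost models:
      ASIC:            asic_upfront + asic_unit_cost * n
      general purpose: general_unit_cost * n
    Returns None if the ASIC never pays off.
    """
    saving_per_unit = general_unit_cost - asic_unit_cost
    if saving_per_unit <= 0:
        return None
    return math.ceil(asic_upfront / saving_per_unit)


# Hypothetical figures; "unit cost" could also fold in lifetime energy cost.
n = break_even_units(asic_upfront=10_000_000,
                     asic_unit_cost=500,
                     general_unit_cost=3_000)
print(n)
```

Below the break-even volume the general-purpose part wins; above it, the ASIC does. An operator deploying at data-center scale sits far toward the high-volume end, which is one way to read why a custom chip could make sense there.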
Say what... (Score:2)

by __aaclcg7560 ( 824291 ) writes:

I thought the TPU was for hard drive encryption. Or is it doing double duty?
- Re:Say what... (Score:5, Informative)
  
  by itsownreward ( 688406 ) writes: on Wednesday April 05, 2017 @03:50PM (#54180833)
  
  You're thinking of a TPM [wikipedia.org]. This is a TPU [wikipedia.org].
  
Performance bottleneck (Score:1)

by speedplane ( 552872 ) writes:

if all its users hit its voice recognition services for three minutes a day, the company would need to double the number of data centers
The performance bottleneck in machine learning is training the system and the amount of training data, not the number of users running the model. Not sure I understand how usage is so directly proportional to computing costs.
- - - Re: Performance bottleneck (Score:2)
      
      by aussie_a ( 778472 ) writes:
      
      It's not about training the neural network. It's about data mining and monetizing the customers. That's why everything Google phones home.
- No, they don't (Score:2)
  
  by fyngyrz ( 762201 ) writes:
  
  Algorithms and processor sets are not artificial intelligence and neural networks
  That's like saying a software defined radio is not a radio.
  It's right -- but it's also completely wrong.
  And the important part in the context here... yeah, the completely wrong part.
  You can create a perfectly fine neural network with a general purpose von Neumann or Harvard architecture CPU. Speed and efficiency are issues, that's all, and that's what the TPU is designed to address.
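As a rough illustration of that last point, here is a minimal sketch of neural network inference in plain Python on an ordinary CPU. The two-layer network and its hand-picked weights (which happen to compute XOR) are assumptions made up for this example, not anything from the article or the TPU paper.

```python
def step(z):
    """Threshold activation: fire (1.0) if the weighted sum is positive."""
    return 1.0 if z > 0 else 0.0


def dense(inputs, weights, biases):
    """One fully connected layer: multiply-accumulate per neuron, then activation."""
    return [step(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-picked illustrative weights: hidden neurons act as OR and AND,
# and the output neuron computes "OR and not AND", i.e. XOR.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [-0.5, -1.5]
OUT_W = [[1.0, -1.0]]
OUT_B = [-0.5]


def predict(x1, x2):
    """Run one forward (inference) pass through the two layers."""
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B)
    return dense(hidden, OUT_W, OUT_B)[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, predict(a, b))
```

The inner `sum(w * x ...)` is a multiply-accumulate loop. A CPU runs it one step at a time, whereas dedicated hardware can run many such multiply-accumulates in parallel, so the difference is in speed and power rather than in what can be computed.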
15-30x the speed (Score:3)

by fred6666 ( 4718031 ) writes: on Wednesday April 05, 2017 @04:05PM (#54180987)

But 1000x as expensive?

- Re: (Score:2)
  
  by religionofpeas ( 4511805 ) writes:
  
  Energy cost is lower, and those will be dominant over longer term.
  - Inherent stratification (Score:2)
    
    by fyngyrz ( 762201 ) writes:
    
    Energy cost is lower, and those will be dominant over longer term.
    This is likely another demonstration of "those who have the money, make more money."
    Solar panels: You can save all kinds of money. If you can afford to install the system in the first place.
    Investments: You can make all kinds of interest. If you have money to invest.
    Toilet paper: You can save lots of money. If you buy it in bunches on sale. But if you can't spare the funds... your TP costs more than the person with a few bucks to spare who bu
- Re: (Score:2)
  
  by bugs2squash ( 1132591 ) writes:
  
  maybe it's a task that is not well suited to the GPU, so it performs little better than general purpose hardware.
Prime directive! Bah! (Score:2)

by kimgkimg ( 957949 ) writes:

Oh good, so our dystopian future can be realized just that much faster then...
amazing! (Score:2)

by tommeke100 ( 755660 ) writes:

but how many fps does it get running the new Mass Effect? Oh it can't?
Machine language example? (Score:1)

by Tablizer ( 95088 ) writes:

What does the machine language for these things look like? Does anybody know of a bare-bones example to illustrate how it does a simple sample neural net? Is it only for the offset shifting kind of NN's common for language AI, or other kinds also?
There, I fixed it for you. (Score:2)

by JustNiz ( 692889 ) writes:

>> Google's Custom Machine Learning Chips Are 15-30x Faster Than GPUs and CPUs AT MACHINE LEARNING
There, I fixed it for you.
- Re: (Score:2)
  
  by religionofpeas ( 4511805 ) writes:
  
  Thanks for fixing, but it was obvious for everyone else.
Will the chip be available to non-Googlers? (Score:2)

by wisebabo ( 638845 ) writes:

(Disclaimer, not an AI or machine learning expert but interested in learning!)
So will this chip (or board) be available outside of Google? I've heard they've released (some of) their AI/machine learning code; it would be good if, once you made a working application, you could buy one of these things and speed it up. Would be especially useful for applications where access to the cloud was unavailable or intermittent at best (think self driving cars, drones, spacecraft).
I guess a PCI card that would go in a ser
- Re: (Score:2)
  
  by religionofpeas ( 4511805 ) writes:
  
  Check out Tensorflow.
  - Re: (Score:2)
    
    by wisebabo ( 638845 ) writes:
    
    So they've open sourced the software? That's good, but no chip will be available?
- Re: (Score:2)
  
  by trevc ( 1471197 ) writes:
  
  not much
