Google AI Hardware

Google's Custom Machine Learning Chips Are 15-30x Faster Than GPUs and CPUs (pcworld.com) 91

Four years ago, Google faced a conundrum: if all its users hit its voice recognition services for three minutes a day, the company would need to double the number of data centers just to handle the requests to the machine learning system powering those services, reads a PCWorld article, which describes how the Tensor Processing Unit (TPU), a chip designed to accelerate the inference stage of deep neural networks, came into being. The article shares an update: Google published a paper on Wednesday laying out the performance gains the company saw over comparable CPUs and GPUs, both in terms of raw power and the performance per watt of power consumed. A TPU was on average 15 to 30 times faster at the machine learning inference tasks tested than a comparable server-class Intel Haswell CPU or Nvidia K80 GPU. Importantly, the performance per watt of the TPU was 25 to 80 times better than what Google found with the CPU and GPU.

  • I for one, (Score:4, Funny)

    by Bodhammer ( 559311 ) on Wednesday April 05, 2017 @02:23PM (#54180597)
    Welcome our new Google overlords. (or whatever...)
  • by Anonymous Coward on Wednesday April 05, 2017 @02:28PM (#54180649)

    outperforms general purpose chips?

    Wow.

    • by Anonymous Coward

      My thoughts exactly. How is it even the least bit surprising that custom silicon is better than general purpose?

      • Not sure why they didn't just call it what it is: ASIC.

        • by Jayfar ( 630313 )

          Not sure why they didn't just call it what it is: ASIC.

          Well TFU kinda did that: "TPUs are what’s known in chip lingo as an application-specific integrated circuit (ASIC)."

      • Re: (Score:2, Insightful)

        by Tough Love ( 215404 )

        Actually, what is really surprising is that Google considered the project worth doing to get only 15-30% advantage vs GPU, if those numbers are accurate. In the best case, this buys roughly an 18 month advantage before GPUs get faster and the engineering has to be done all over again, or the project will just go the way of other Google abandonware. And in that brief window, do saved operating costs justify the sunk engineering and fabrication cost? I doubt it.

        Now, on second look, this smells like a vanity

        • They are 15-30 times faster, not 15-30%. That's a huge difference. And this is only the first version, so it is likely that the TPU can be improved faster than GPUs that have been on the market for years.

        • I'm sure you've already seen the "it's 15-30 times, not 15-30 percent" replies. There's also the "performance per watt of the TPU was 25 to 80 times better". Can you imagine how much money this can save Google in electricity costs? It's 1-2 orders of magnitude better (10-100 times), with the possibility that they will continue to find dramatic improvements.

          If we equate your assessment with a "bunt", what Google really did was knock the ball out of the park.

    • While there is some truth to that, this "purpose built" chip's purpose is to run an open-source AI language. So this is more interesting than a typical custom ASIC.

      • How so?

        Custom chip built for how the bits are handled is .... typical of an ASIC.

        • by Anonymous Coward

          I suppose the application is *slightly* more broad than a typical ASIC but yea, looks like some marketing article.

        • by ShanghaiBill ( 739463 ) on Wednesday April 05, 2017 @03:58PM (#54181437)

          How so?

          The TPU is a "purpose built" chip, but that purpose is very broad. It is optimized for massively parallel low-precision matrix operations, which is useful not only for neural nets, but also simulation of physical processes like CFD, weather prediction, climate models, computational chemistry, etc. It can do everything a GPU can do except the rasterization and texture mapping, but it can do it faster and with much less power.

          • by Baloroth ( 2370816 ) on Wednesday April 05, 2017 @05:12PM (#54181905)

            It is optimized for massively parallel low-precision matrix operations, which is useful not only for neural nets, but also simulation of physical processes like CFD, weather prediction, climate models, computational chemistry, etc.

            Maybe, but I doubt it. It's far too low precision, for one thing: 8-bits doesn't get you very far in any of those fields (you typically want at least 32-bit FLOPS for those, and quite often 64-bit precision is required, as numerical errors accumulate exponentially in a chaotic system), and they're really not even big matrices (just 256x256). Really the only place this kind of thing would excel is signal processing, which is basically what they're using them for.
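
To make the precision point above concrete, here is a minimal sketch of what rounding operands to 8 bits does to a single 256-element dot product. It assumes a simple symmetric quantization scheme with one shared scale per vector; that scheme, and the helper names `quantize` and `dot_int8`, are illustrative assumptions and not a description of how the TPU actually quantizes.

```python
import random

def quantize(values, bits=8):
    """Map floats to signed integers using one shared scale (symmetric quantization)."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    peak = max(abs(v) for v in values) or 1.0  # avoid a zero scale for all-zero input
    scale = peak / qmax
    return [round(v / scale) for v in values], scale

def dot_float(a, b):
    """Reference dot product in ordinary double precision."""
    return sum(x * y for x, y in zip(a, b))

def dot_int8(a, b):
    """Dot product on quantized operands, rescaled to a float once at the end."""
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = sum(x * y for x, y in zip(qa, qb))   # integer accumulation, no rounding here
    return acc * sa * sb

random.seed(0)
n = 256  # same width as the 256x256 matrices mentioned above
a = [random.uniform(-1.0, 1.0) for _ in range(n)]
b = [random.uniform(-1.0, 1.0) for _ in range(n)]

exact = dot_float(a, b)
approx = dot_int8(a, b)
print(f"float: {exact:.6f}  int8: {approx:.6f}  abs error: {abs(exact - approx):.6f}")
```

A one-off error of this size is usually tolerable when the result only feeds a classifier's next layer. In an iterated simulation the same per-step error is fed back into the next step, which is the accumulation problem described above.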

        • By your broad definition, a GPU is also a "typical ASIC".

          • By your broad definition, a GPU is also a "typical ASIC".

            Yep... A GPU is basically an ASIC as well. It's programmed to do one thing well. That it can be used for other things that use similar calculations was pure coincidence (e.g. bitcoin mining).

            • Agreed that a GPU is an ASIC. I don't think they are "typical" in a few senses, but mainly the crowd around here go crazy over them.

          • Well, since it's not a CPU, yes it's an ASIC.

    • ASICs everywhere are embarrassed by how slow the TPUs are.

    • GPU chips are a bunch of multipliers. Floating point, which is, of course, integer with an exponent. The press release talks about "inference" as if this is a math function. My guess is weighted multipliers. ...If Google has in fact made more efficient multipliers, then you gamers can turn off those noisy fans. Or are they faster? It seemed to be saying they were more efficient.
    • It takes considerable organizational effort to push an ASIC all the way through the pipe from design to production. Even budgeting and staffing are nontrivial. The technology might not be earth shattering, but the engineering process is respectable. And who knows, the technology might be earth shattering. But probably not. It uses numerical methods; analog would be faster and more interesting.

    • I can't count how many times someone thought they could build an ASIC to optimize some computation, only to get crushed by Moore's Law operating on general purpose CPUs.

      Never fight a land war during a Russian winter. Never bet against Ethernet. Never bet against Moore's Law.

      • It will take a while for Moore's law to catch up with 15-30 times speed improvement, and even better power improvement.

        And Moore's law also helps this chip.
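
A rough way to put a number on "a while" is to ask how many doublings a general-purpose part needs to close a given gap. The sketch below assumes performance doubles every 18 months, which is the popular reading of Moore's law and not a figure from the paper, and it assumes the TPU itself stands still over that time.

```python
import math

def years_to_catch_up(speedup, doubling_period_years=1.5):
    """Years of steady doubling needed to close a `speedup`-fold gap.

    doubling_period_years=1.5 is an assumed 18-month doubling period.
    """
    return math.log2(speedup) * doubling_period_years

for lead in (15, 30):
    print(f"{lead}x lead: about {years_to_catch_up(lead):.1f} years")
# prints:
# 15x lead: about 5.9 years
# 30x lead: about 7.4 years
```

Under those assumptions the head start is six to seven years, not 18 months, and any process improvement that helps the CPU or GPU helps the TPU as well.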

        • I know, that's what makes this a remarkable achievement. Many times in the past people have tried to do this but it took much longer than they anticipated. In the meantime, Intel, AMD, nVidia, ATI, or whoever managed to catch up and surpass the ASIC. It turned out the performance win from ASICs had a shorter shelf life than people realized.

          It seems times have changed. Google has a very specific workload which appears to be different from what the mainstream processors have optimized for. The easy (easier?)

    • It's not that obvious when you're talking about floating point calculations in combination with external memory. A GPU is highly optimized for both of those requirements, and it's not all that simple to make an ASIC that does this better. The main reason Google got such an improvement is because they require much less precision in their results.

  • by Sycraft-fu ( 314770 ) on Wednesday April 05, 2017 @02:39PM (#54180723)

    Man is this a "duh" moment. Purpose built ASICs are extremely fast and low power for what they accomplish. That's why we use them. Look at a small desktop network switch: Little tiny processor that can pass 16 Gb/sec of traffic around. Try to put 8 NICs in a computer and have it switch traffic and you'll be amazed at how much power you need. The reason the switch is small is it is purpose built: Its ASIC does nothing but switch Ethernet packets.

    Same deal with some things on a CPU. You find that decoding an AVC video stream takes next to no CPU power on modern CPUs, yet decoding an MPEG-2 video takes some. Why? Because they have a small bit of dedicated logic for AVC decoding (usually some other formats too). It is low power because it is dedicated.

    Always the question in designing a system is flexibility and unit cost vs fixed function and up front cost. A CPU is great because it can do anything, and you can just buy them straight out, tons of companies have them available for purchase right now. However they take a lot of silicon and power to perform a given task. An ASIC takes a bunch of up front money to design and do a manufacturing run, but is very small and efficient, however it can't be reconfigured to do anything else and needs a full respin. In the middle there is something like an FPGA. Which one is right for an application just depends on the balance of a lot of factors.
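
The up-front-cost versus unit-cost tradeoff above reduces to a break-even count. The sketch below is only an illustration: `break_even_units` is a hypothetical helper, and every dollar figure in the example is made up, not taken from Google or the article. Per-unit cost here is meant to include lifetime power.

```python
import math

def break_even_units(asic_upfront, asic_unit_cost, general_unit_cost):
    """Smallest deployment size at which the ASIC's total cost is no higher.

    Returns None when the ASIC saves nothing per unit, so the up-front
    design cost can never be recovered.
    """
    saving_per_unit = general_unit_cost - asic_unit_cost
    if saving_per_unit <= 0:
        return None
    return math.ceil(asic_upfront / saving_per_unit)

# Hypothetical numbers: $10M design and mask cost, $500 per ASIC board
# versus $2,500 per general-purpose board doing the same work.
print(break_even_units(10_000_000, 500, 2_500))  # 5000
```

Below the break-even count the off-the-shelf part wins. Above it the ASIC does, which is why the answer depends so heavily on how many units the buyer needs.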

    • by chispito ( 1870390 ) on Wednesday April 05, 2017 @02:54PM (#54180875)

      An ASIC takes a bunch of up front money to design and do a manufacturing run, but is very small and efficient, however it can't be reconfigured to do anything else and needs a full respin.

      Per TFA, the chips they designed are flexible enough to apply to new machine learning models. I think the point is that this was a space ripe for customized architecture, like graphics cards were 15-20 years ago.

    • There is a reason that all of the Bitcoin miners are ASIC based now.

      Don't expect those machines to be able to do anything else though if bitcoin dies off.

    • Purpose built ASICs are extremely fast and low power for what they accomplish

      And they have very specific algorithms. Certainly nothing traditionally resembling "machine learning".

  • I thought the TPU was for hard drive encryption. Or is it doing double duty?
  • if all its users hit its voice recognition services for three minutes a day, the company would need to double the number of data centers

    The performance bottleneck in machine learning is training the system and the amount of training data, not the number of users running the model. Not sure I understand how usage is so directly proportional to computing costs.

  • by fred6666 ( 4718031 ) on Wednesday April 05, 2017 @03:05PM (#54180987)

    But 1000x as expensive?

    • Energy cost is lower, and those will be dominant over longer term.

      • Energy cost is lower, and those will be dominant over longer term.

        This is likely another demonstration of "those who have the money, make more money."

        Solar panels: You can save all kinds of money. If you can afford to install the system in the first place.
        Investments: You can make all kinds of interest. If you have money to invest.
        Toilet paper: You can save lots of money. If you buy it in bunches on sale. But if you can't spare the funds... your TP costs more than the person with a few bucks to spare who bu

  • Oh good, so our dystopian future can be realized just that much faster then...
  • but how many fps does it get running the new Mass Effect? Oh it can't?
  • What does the machine language for these things look like? Does anybody know of a bare-bones example to illustrate how it does a simple sample neural net? Is it only for the offset shifting kind of NN's common for language AI, or other kinds also?

  • >> Google's Custom Machine Learning Chips Are 15-30x Faster Than GPUs and CPUs AT MACHINE LEARNING

    There, I fixed it for you.

  • (Disclaimer, not an AI or machine learning expert but interested in learning!)

    So will this chip (or board) be available outside of Google? I've heard they've released (some of) their AI/machine learning code; it would be good if, once you made a working application, you could buy one of these things and speed it up. Would be especially useful for applications where access to the cloud was unavailable or intermittent at best (think self driving cars, drones, spacecraft).

    I guess a PCI card that would go in a ser
