MIT Develops Algorithm To Accelerate Neural Networks By 200x (extremetech.com)

An anonymous reader quotes a report from ExtremeTech: MIT researchers have reportedly developed an algorithm that can accelerate [neural networks] by up to 200x. The NAS (Neural Architecture Search, in this context) algorithm they developed "can directly learn specialized convolutional neural networks (CNNs) for target hardware platforms -- when run on a massive image dataset -- in only 200 GPU hours," MIT News reports. This is a massive improvement over the 48,000 GPU hours Google reported taking to develop a state-of-the-art NAS algorithm for image classification. The goal of the researchers is to democratize AI by allowing researchers to experiment with various aspects of CNN design without needing enormous GPU arrays to do the front-end work. If finding state-of-the-art approaches requires 48,000 GPU hours, precious few people, even at large institutions, will ever have the opportunity to try.

CNNs produced by the new NAS ran, on average, 1.8x faster on a mobile device than conventionally designed CNNs of similar accuracy. The new algorithm leveraged techniques like path-level binarization, which stores just one path at a time to reduce memory consumption by an order of magnitude. MIT doesn't actually link out to specific research reports, but from a bit of Google sleuthing, the referenced articles appear to be here and here -- two different research reports from an overlapping group of researchers. The teams focused on pruning entire potential paths for CNNs to use, evaluating each in turn. Lower-probability paths are successively pruned away, leaving the final, best-case path. The new model incorporated other improvements as well. Architectures were checked against hardware platforms for latency when evaluated. In some cases, their model predicted superior performance from design choices that had been dismissed as inefficient. For example, 7x7 filters for image classification are typically not used because they're quite computationally expensive -- but the research team found that these actually worked well on GPUs.
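Neither report is linked directly, but "path-level binarization" in the NAS literature amounts to sampling a single candidate path per search step, so that only one path's activations ever occupy memory. Below is a minimal Python sketch of that idea under stated assumptions: the candidate operations, the reward signal, and the update rule are invented stand-ins for illustration, not the MIT team's actual code.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy "mixed block": candidate operations competing for one slot in
    # the network. Real candidates would be conv variants (3x3, 5x5,
    # 7x7); element-wise functions keep the sketch self-contained.
    CANDIDATE_OPS = {
        "identity": lambda x: x,
        "relu":     lambda x: np.maximum(x, 0.0),
        "scale2x":  lambda x: 2.0 * x,
    }
    OP_NAMES = list(CANDIDATE_OPS)

    # One architecture parameter (a logit) per candidate path.
    arch_logits = np.zeros(len(OP_NAMES))

    def path_probs(logits):
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def sample_one_path(logits):
        # Path-level binarization: exactly one path is active at a time,
        # so only that path's activations are held in memory.
        probs = path_probs(logits)
        return rng.choice(len(probs), p=probs), probs

    x = rng.normal(size=8)
    target = np.maximum(x, 0.0)  # pretend "relu" is the ideal op here

    for step in range(200):
        idx, probs = sample_one_path(arch_logits)
        out = CANDIDATE_OPS[OP_NAMES[idx]](x)
        reward = -np.mean((out - target) ** 2)  # stand-in quality signal
        grad = -probs                           # REINFORCE-style update
        grad[idx] += 1.0
        arch_logits += 0.1 * reward * grad

    # Low-probability paths are effectively pruned away; the final
    # architecture keeps only the most probable path.
    print("selected path:", OP_NAMES[int(np.argmax(arch_logits))])

The order-of-magnitude memory saving mentioned above falls out of the same trick: a block with N candidate paths holds one path's activations instead of N.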

  • Even the summary says that the 200x improvement is in the learning cycle. The actual execution speed is less than 2x faster.

    • The learning cycle is the important part...

    • Utter bullshit (Score:4, Insightful)

      by goombah99 ( 560566 ) on Friday March 22, 2019 @05:07PM (#58317568)

      Apparently no one has heard of Wolpert's No-Free-Lunch theorem for search. It says that when averaged over all use cases, no search algorithm outperforms another (provided resources are not an issue). So one can have more resource-efficient searches, and one can have search algorithms that do better on some problems than others. It's great when you find a class of problems your search method is optimal for. But in general, no. Can't be done.
      To get a 200x speedup on the test set they must have a 200x slowdown on average elsewhere.
      That said, this could be really useful for a large class of practical problems. So it's the hyperbole that is the bullshit, not the research.
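      For reference, the Wolpert and Macready result being invoked is usually stated along the following lines (a paraphrase of the 1997 formulation, not a quote, where d^y_m is the sequence of cost values observed after m function evaluations and a_1, a_2 are any two search algorithms):

          \sum_{f} P\left(d^{y}_{m} \mid f, m, a_{1}\right) = \sum_{f} P\left(d^{y}_{m} \mid f, m, a_{2}\right)

      Summed over all objective functions f, both algorithms see the same distribution of outcomes, which is the formal version of "a speedup here implies a slowdown somewhere else." Note the theorem counts function evaluations, not wall-clock resources, which is where the parenthetical caveat above does real work.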

      • To get a 200x speedup on the test set they must have a 200x slowdown on average elsewhere.

        Possibly, yes. But typical images form only a tiny subset of all possible images. If we can speed up recognition of real-life images at the cost of losing speed on fields of noisy random pixels, it's still a useful improvement.

      • by Anonymous Coward

        >To get a 200x speedup on the test set they must have a 200x slowdown on average elsewhere.

        You're making the unfounded assumption that both optimization algorithms are performing optimally.

        I can trivially disprove you by taking an existing algorithm A, and making a deliberately shittier version (A') that performs worse in all aspects.
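        The construction the AC describes can be written down directly. A minimal sketch, assuming any optimizer can be wrapped this way (make_worse is an invented name, not a real library function):

            import time

            def make_worse(optimizer, slowdown_sec=1.0):
                # Wrap optimizer A into A': identical answers on every
                # input, strictly slower. A therefore dominates A'
                # everywhere, with no compensating slowdown elsewhere.
                def worse(*args, **kwargs):
                    time.sleep(slowdown_sec)           # deliberate waste
                    return optimizer(*args, **kwargs)  # same result as A
                return worse

            slow_min = make_worse(min)
            print(slow_min([3, 1, 2]))  # prints 1, one second later

        This doesn't contradict NFL itself, which counts function evaluations rather than time; it contradicts the claim as stated, which was about speed.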

    • Executing a trained neural network is not very compute-intensive; inference can happen almost immediately. Training is the part that takes a long time.
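      Rough, invented numbers show the scale of the gap between one inference pass and a full training run (every figure below is an assumption for illustration, not a measurement):

          flops_forward = 600e6  # ~one forward pass of a small mobile CNN (assumed)
          backward_mult = 2      # backward pass costs ~2x the forward (rule of thumb)
          images = 1.2e6         # ImageNet-scale training set (assumed)
          epochs = 90            # a common training schedule (assumed)

          inference = flops_forward
          training = flops_forward * (1 + backward_mult) * images * epochs

          print(f"inference: {inference:.1e} FLOPs")        # ~6.0e8
          print(f"training:  {training:.1e} FLOPs")         # ~1.9e17
          print(f"ratio:     {training / inference:.1e}x")  # ~3.2e8x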
    • It seems the 'x' was a '%', actually.
  • by Anonymous Coward

    It's not hard to accelerate a process if you are allowed to leave out details without caring about their importance. The hard part is not losing quality in the process.

    TFA doesn't seem to say anything about that.

    • by ceoyoyo ( 59147 )

      Yes... they tested with one task on a single dataset. For a technique that (as far as I can gather from the terrible article) evaluates and discards options, there's a high risk of it working well only in particular circumstances.

      • Worth mentioning that often the difficulty with neural networks isn't the training time, it's collecting the dataset in the first place.
        • by ceoyoyo ( 59147 )

          Very true. I'm not sure where the tens of thousands of GPU hours figure comes from either. Google may have trained for that long because they've got the hardware. You can train a decent ImageNet clone on a commodity GPU on your own computer in a day or so.

  • I thought it was saying the algorithm would work before 2010.

    • by Misagon ( 1135 )

      Besides, the multiplication symbol is not the letter 'x', but ×

      ... oh wait, this is Slashdot. We're still stuck in ASCII.

  • That's the gist of it.

  • The teams focused on pruning entire potential paths for CNNs to use, evaluating each in turn. Lower-probability paths are successively pruned away, leaving the final, best-case path.

    Now, I don't know that much about AI, so in this scenario it may be totally different.

    With chess engines, early pruning is a recipe for disaster. You might fool beginner players, but you won't fool GMs. Pruning in general is already questionable, as you are deleting possibilities based on assumptions that are in turn based on a limited subset of your data. Early pruning leads to big performance gains but may also easily overlook possibilities because certain paths are assumed wrong. Just because sometimes some

    • You might fool beginner players, but you won't fool GMs. Pruning in general is already questionable, as you are deleting possibilities based on assumptions that are in turn based on a limited subset of your data. Early pruning leads to big performance gains but may also easily overlook possibilities because certain paths are assumed wrong

      If you get big performance gains, that means it plays better overall, so it will be more likely to fool GMs as well.

      Keep in mind that every human player also uses early pruning. A GM looking at a board typically prunes 25 of the 30 possible moves right away.
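      The failure mode the grandparent warns about fits in a toy example. A minimal sketch, with every position and score invented (nothing below comes from a real engine or from TFA):

          # Two-ply game tree: {root move: {opponent reply: leaf score}},
          # scores from the root player's point of view.
          TREE = {
              "a": {"a1": 3, "a2": 8},  # opponent picks a1 -> value 3
              "b": {"b1": 9, "b2": 1},  # opponent picks b2 -> value 1
              "c": {"c1": 4, "c2": 6},  # opponent picks c1 -> value 4 (true best)
          }

          def heuristic(move):
              # Cheap pre-pruning estimate: the best-looking immediate reply.
              return max(TREE[move].values())

          def value(move):
              # Real two-ply value: the opponent minimizes.
              return min(TREE[move].values())

          best_full = max(TREE, key=value)  # considers everything -> "c"

          # Early pruning keeps only the top-2 moves by the shallow
          # heuristic: "b" looks great (a 9 is visible) and "c" is cut.
          kept = sorted(TREE, key=heuristic, reverse=True)[:2]
          best_pruned = max(kept, key=value)  # -> "a" (value 3), missing "c"

          print("full search picks:  ", best_full)
          print("pruned search picks:", best_pruned)

      Whether the successive path pruning in TFA hits the same trap depends on how reliable its intermediate probability estimates are, which is exactly the concern raised above.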
