MIT Develops Algorithm To Accelerate Neural Networks By 200x (extremetech.com)
An anonymous reader quotes a report from ExtremeTech: MIT researchers have reportedly developed an algorithm that can accelerate [neural networks] by up to 200x. The NAS (Neural Architecture Search, in this context) algorithm they developed "can directly learn specialized convolutional neural networks (CNNs) for target hardware platforms -- when run on a massive image dataset -- in only 200 GPU hours," MIT News reports. This is a massive improvement over the 48,000 GPU hours Google reported taking to develop a state-of-the-art NAS algorithm for image classification. The goal of the researchers is to democratize AI by allowing researchers to experiment with various aspects of CNN design without needing enormous GPU arrays to do the front-end work. If finding state-of-the-art approaches requires 48,000 GPU hours, precious few people, even at large institutions, will ever have the opportunity to try.
CNNs produced by the new NAS ran, on average, 1.8x faster on a mobile device than comparable models of similar accuracy. The new algorithm leveraged techniques like path-level binarization, which stores just one path at a time to reduce memory consumption by an order of magnitude. MIT doesn't actually link out to specific research reports, but from a bit of Google sleuthing, the referenced articles appear to be here and here -- two different research reports from an overlapping group of researchers. The teams focused on pruning entire potential paths for CNNs to use, evaluating each in turn. Lower probability paths are successively pruned away, leaving the final, best-case path. The new model incorporated other improvements as well. Architectures were checked for latency on the target hardware platforms when evaluated. In some cases, the model predicted superior performance for approaches that had been dismissed as inefficient. For example, 7x7 filters for image classification are typically not used because they're quite computationally expensive -- but the research team found that these actually worked well on GPUs.
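Roughly, the two tricks described above look like the minimal sketch below, assuming PyTorch and invented names (MixedOp, candidate_ops, path_logits); it is not the researchers' code, and it omits the gradient estimation and latency terms the actual papers rely on.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One layer of a NAS super-network holding several candidate convolutions."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Candidate paths: a few kernel sizes, standing in for a real search space.
        self.candidate_ops = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (3, 5, 7)]
        )
        # One learnable architecture logit per candidate path.
        self.path_logits = nn.Parameter(torch.zeros(len(self.candidate_ops)))

    def forward(self, x):
        # "Binarize" the path choice: sample one active path and run only it,
        # so only a single candidate's activations are held in memory.
        probs = F.softmax(self.path_logits, dim=0)
        active = int(torch.multinomial(probs, 1))
        return self.candidate_ops[active](x)

    def prune_weakest(self):
        # Progressive pruning: drop the lowest-probability path, if more than one remains.
        if len(self.candidate_ops) > 1:
            probs = F.softmax(self.path_logits, dim=0)
            worst = int(torch.argmin(probs))
            del self.candidate_ops[worst]
            keep = [i for i in range(self.path_logits.numel()) if i != worst]
            self.path_logits = nn.Parameter(self.path_logits[keep].detach())

layer = MixedOp(16, 32)
y = layer(torch.randn(1, 16, 8, 8))   # one sampled path runs
layer.prune_weakest()                  # discard the least likely path
```

Because only the sampled path runs in each forward pass, only one candidate's activations sit in memory at a time, which is where the order-of-magnitude memory saving comes from; calling prune_weakest() repeatedly collapses each layer down to its single highest-probability operation.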
Re: (Score:2)
Trying to read the summary was bad enough.
Accelerate [development of] neural networks (Score:3)
Even the summary says that the 200x improvement is in the learning cycle. The actual execution speed is less than 2x faster.
Re: Accelerate [development of] neural networks (Score:1)
The learning cycle is the important part...
Utter bullshit (Score:4, Insightful)
Apparently no one has heard of Wolpert's No Free Lunch theorem for search. It says that when averaged over all use cases, no search algorithm outperforms another (provided resources are not an issue). So one can have more resource-efficient searches, and one can have search algorithms that do better on some problems than others. It's great when you find a class of problems your search method is optimal for. But in general, no. Can't be done.
To get a 200x speed-up on the test set, they must have a 200x slow-down on average elsewhere.
That said, this could be really useful for a large class of practical problems. So it's the hyperbole that is the bullshit, not the research.
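The averaging claim is easy to check by brute force on a toy problem. The sketch below is only an illustration of the flavor of the theorem (it covers fixed, non-repeating search orders, not the full adaptive case): it enumerates every objective function on a four-point domain and shows that any such order needs the same average number of evaluations to find a global maximum.

```python
from itertools import product

DOMAIN = [0, 1, 2, 3]          # a toy search space
ORDERS = ([0, 1, 2, 3],        # "clever" ascending search
          [3, 1, 0, 2])        # an arbitrary other order

def evals_to_find_max(f, order):
    # Number of function evaluations until a global maximum of f is seen.
    best = max(f.values())
    for i, x in enumerate(order, start=1):
        if f[x] == best:
            return i

for order in ORDERS:
    # Average over ALL functions f: DOMAIN -> {0, 1}.
    costs = [evals_to_find_max(dict(zip(DOMAIN, values)), order)
             for values in product([0, 1], repeat=len(DOMAIN))]
    print(order, sum(costs) / len(costs))   # identical average for every order
```

Both printed averages come out identical: whatever advantage one order has on some functions is paid back exactly on others.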
Re: (Score:2)
While your points are well intentioned, you haven't read the No Free Lunch theorem. It addresses all your points, and it's still true.
Re: (Score:2)
No, it is you who misunderstand the theorem. It most certainly does not say that "no search algorithm out performs another." And it is mathematically false that "when averaged over all use cases no search algorithm out performs another." That is easily mathematically proven to be false. For any search algorithm, I can create an algorithm that is 200x slower and provides no benefit. Using a real-world example, selection sort's best case is n^2 but merge sort's worst-case is n*log(n). Therefore "when aver
Re: (Score:2)
Good god man, you are completely wrong. The NFL excludes deliberate stupidity and inefficiency, of course. But in the extreme, random guessing or uphill climbing finds the minimum just as fast, on average, as steepest descent or genetic optimization. Yes, that is proven. The catch is that on problems where steepest descent works well, it outperforms the crazy idea of finding a minimum by going uphill. And that's likely most problems you will encounter in real life. You might even wonder how going uphill
Re: (Score:2)
Okay, between your reply and Wikipedia I am swayed. That's pretty amazing. Still, be careful generalizing this to mean "No improvements can ever be made to AI learning algorithms."
Re: (Score:2)
Suppose the search algorithm is "sort then do a binary search." Goombah99's arguments are more worth replying to than.
Re: (Score:2)
To get a 200x speed-up on the test set, they must have a 200x slow-down on average elsewhere.
Possibly, yes. But typical images only form an absolutely tiny subset of all possible images. If we can speed up recognition of real-life images, at the cost of losing speed in fields of noisy random pixels, it's still a useful improvement.
Re: (Score:1)
>To get a 200x speed-up on the test set, they must have a 200x slow-down on average elsewhere.
You're making the unfounded assumption that both optimization algorithms are performing optimally.
I can trivially disprove you by taking an existing algorithm A, and making a deliberately shittier version (A') that performs worse in all aspects.
Re: (Score:2)
Re: (Score:2)
Loss in quality? (Score:1)
It's not hard to accelerate a process if you are allowed to leave out details without caring how important they are. The hard part is not losing quality in the process.
TFA doesn't seem to say anything about that.
Re: (Score:2)
Yes... they tested with one task on a single dataset. For a technique that (as far as I can gather from the terrible article) evaluates and discards options, that has a high risk of working well only in particular circumstances.
Re: (Score:2)
Re: (Score:2)
Very true. I'm not sure where the tens of thousands of GPU hours figure comes from either. Google may have trained for that long because they've got the hardware. You can train a decent ImageNet clone on a commodity GPU on your own computer in a day or so.
Confusing title (Score:2)
I thought it was saying the algorithm would work before 2010.
Re: (Score:2)
Besides, the multiplication symbol is not the letter 'x', but ×.
We're all gonna die (Score:2)
That's the gist of it.
Possible recipe for disaster (Score:2)
The teams focused on pruning entire potential paths for CNNs to use, evaluating each in turn. Lower probability paths are successively pruned away, leaving the final, best-case path.
Now, I don't know that much about AI, so in this scenario it may be totally different.
With chess engines, early pruning is a recipe for disaster. You might fool beginner players, but you won't fool GMs. Pruning in general is already questionable, as you are deleting possibilities based on assumptions that are in turn based on a limited subset of your data. Early pruning leads to big performance gains but may also easily overlook possibilities because certain paths are assumed wrong. Just because sometimes some
Re: (Score:2)
You might fool beginner players, but you won't fool GMs. Pruning in general is already questionable, as you are deleting possibilities based on assumptions that are in turn based on a limited subset of your data. Early pruning leads to big performance gains but may also easily overlook possibilities because certain paths are assumed wrong
If you get big performance gains, that means it plays better overall, so it will be more likely to fool GMs as well.
Keep in mind that every human player also uses early pruning. A GM looking at a board typically prunes 25 of the 30 possible moves right away.
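As a toy illustration of that trade-off (my own example, not from the article or any real engine): forward pruning judges each move by a cheap glance and discards the rest before searching deeper, so a move that merely looks bad at a glance can never be rescued by the deeper search.

```python
def best_outcome(moves, beam):
    # moves: for each candidate move, the leaf scores (from our point of view)
    # the opponent can choose between on the next ply.
    # Early pruning: judge each move by a cheap glance (its first leaf only)
    # and keep just the top `beam` moves.
    kept = sorted(moves, key=lambda leaves: leaves[0], reverse=True)[:beam]
    # Full evaluation of the survivors: the opponent picks the worst leaf for us.
    return max(min(leaves) for leaves in kept)

moves = [[5, -9],     # looks great at a glance, but loses to the best reply
         [1, 2, 3]]   # looks modest, but is actually safe
print(best_outcome(moves, beam=1))   # -9: the safe move was pruned too early
print(best_outcome(moves, beam=2))   #  1: unpruned search keeps it
```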