Microsoft CTO Says AI Progress Not Slowing Down, It's Just Warming Up (arstechnica.com)
An anonymous reader shares a report: During an interview with Sequoia Capital's Training Data podcast published last Tuesday, Microsoft CTO Kevin Scott doubled down on his belief that so-called large language model (LLM) "scaling laws" will continue to drive AI progress, despite some skepticism in the field that progress has leveled out. Scott played a key role in forging a $13 billion technology-sharing deal between Microsoft and OpenAI. "Despite what other people think, we're not at diminishing marginal returns on scale-up," Scott said. "And I try to help people understand there is an exponential here, and the unfortunate thing is you only get to sample it every couple of years because it just takes a while to build supercomputers and then train models on top of them."
LLM scaling laws refer to patterns explored by OpenAI researchers in 2020 showing that the performance of language models tends to improve predictably as the models get larger (more parameters), are trained on more data, and have access to more computational power (compute). The laws suggest that simply scaling up model size and training data can lead to significant improvements in AI capabilities without necessarily requiring fundamental algorithmic breakthroughs. Since then, other researchers have challenged the idea of persisting scaling laws over time, but the concept is still a cornerstone of OpenAI's AI development philosophy.
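In the 2020 paper (Kaplan et al., "Scaling Laws for Neural Language Models"), the parameter-count pattern takes the form of a simple power law. Here is a minimal sketch of that form, using the paper's approximate fitted constants; treat the numbers as illustrative, not as a prediction tool:

```python
# Sketch of the parameter-count scaling law from Kaplan et al. (2020):
# test loss L(N) ~ (N_c / N)^alpha_N, with data and compute not the bottleneck.
# The constants are the paper's approximate fitted values (illustrative only).
ALPHA_N = 0.076   # fitted exponent for parameter scaling
N_C = 8.8e13      # fitted constant, in parameters

def predicted_loss(n_params: float) -> float:
    """Approximate cross-entropy loss (nats/token) at n_params parameters."""
    return (N_C / n_params) ** ALPHA_N

for n in (1e8, 1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> loss ~ {predicted_loss(n):.2f}")
```

Because loss falls as a power of N, each fixed reduction in loss requires multiplying the parameter count rather than adding to it, which is exactly the diminishing-returns question argued over in the comments below.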
The masterplan (Score:3)
Re: (Score:2)
Success in step 3 doesn't mean profit. What is the profit in AI? Sure, you can lay off marginal workers, but that only goes so far. A chatbot isn't really profitable unless you stuff it chock full of ads. A better search in Windows might be nice, but you don't need AI for that; you just need to stop being incompetent at it.
Re: (Score:2)
Indeed. You can automate low-level bureaucracy to a degree. I think this may still cause a massive employment crisis and may eventually be counter-productive, but yeah, that is it.
A better search in Windows might be nice, but you don't need AI for that; you just need to stop being incompetent at it.
That nicely sums up basically everything Microsoft does.
He's right, in a way (Score:2)
"Despite what other people think, we're not at diminishing marginal returns on scale-up," Scott said
He is absolutely right: we are not at the point of diminishing returns. We are well past it, and have been for a while now.
"True believers" shouldn't work in this field.
Re: He's right, in a way (Score:2)
Or they're chasing the wrong goal, chasing what's easy rather than what's useful. The world doesn't need a better chat generator. It could sure use a cure for cancer, though, or a good-enough-to-be-useful housekeeping bot.
Re: (Score:2)
"LLM scaling laws refer to patterns explored by OpenAI researchers in 2020 showing that the performance of language models tends to improve predictably as the models get larger (more parameters), are trained on more data, and have access to more computational power (compute)."
If true, this would be the first time in engineering history that scaling effects helped rather than hindered in the continued improvement of complex systems.
Re: (Score:2)
Yep, pretty much. And that means we will see some limited gradual improvements for a while and then things will hit a wall.
Re:He's right, in a way (Score:4, Informative)
It's not true. The gains from increasing the size of the model diminish logarithmically: scaling up yields rapid improvements that quickly level off. The situation is more complicated for increasing the amount of training data, but the end result is the same: things improve for a while, rapidly get worse, then start improving again before leveling off.
Intuitively, imagine graphing the output of a neural network with one input and one output over the range of inputs. How complex that graph can be is determined by the number of nodes in the hidden layer. No matter how much data you throw at it, the model's size limits how closely you can approximate your ideal function. Increase the model size with too little data and you'll find that your model can perfectly capture your training data, including the noise! If you keep adding training data, the noise will average out and your model will start improving again, limited by the size of your model. The gains from increasing model size and training data drop off rapidly because you're tending towards some ideal function: the coarse improvements will be larger than the later, finer improvements. This is easy to reproduce numerically, as the sketch below shows.
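A minimal sketch of that capacity-versus-data story, using a polynomial least-squares fit as a stand-in for a small one-hidden-layer network and sin(x) as the "ideal function" (the degrees, sample counts, and noise level here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def test_error(degree, n_train, noise=0.3):
    """Fit a degree-`degree` polynomial (a stand-in for model capacity)
    to noisy samples of sin(x); return mean squared error against the
    true, noise-free function on a held-out grid."""
    x_tr = rng.uniform(-3, 3, n_train)
    y_tr = np.sin(x_tr) + rng.normal(0, noise, n_train)  # noisy training data
    coef = np.polyfit(x_tr, y_tr, degree)                # capacity ~ degree
    x_te = np.linspace(-3, 3, 500)
    return np.mean((np.polyval(coef, x_te) - np.sin(x_te)) ** 2)

# small model, little data / big model, little data / big model, lots of data
for deg, n in [(3, 30), (15, 30), (15, 3000)]:
    print(f"degree={deg:2d}  n_train={n:4d}  test MSE={test_error(deg, n):.4f}")
```

With too little data, the high-capacity fit chases the noise and held-out error gets worse; add enough data and the noise averages out, at which point the degree (the "model size") is what limits how close you can get to sin(x).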
Also, throwing hardware at the problem just means we can more practically work with larger models; it doesn't change anything about what the models can actually do.
Re: (Score:2)
Perplexity.
Re: (Score:3)
"LLM scaling laws refer to patterns explored by OpenAI researchers in 2020 showing that the performance of language models tends to improve predictably as the models get larger (more parameters), are trained on more data, and have access to more computational power (compute)."
If true, this would be the first time in engineering history that scaling effects helped rather than hindered in the continued improvement of complex systems.
Uhh, no, and obviously no to anyone with even a smattering of knowledge in this area. The acceptance of DNNs introduced a monumental improvement in recognition and other uses that has impacted many areas of technology over the last decade. The ability to practically scale NNs deeply using GPUs was the key advance.
Re: (Score:3)
"True believers" shouldn't work in this field.
I imagine this is less a question of him being a "true believer", and more a case of "we've dumped many billions into this already, both on the technical and PR side; so our leadership needs to stay on message or we'll never see a return on that investment".
Re: (Score:2)
They will never see a return on that investment anyhow. But I guess this guy gets to keep his job a little while longer if he can keep up the pretense.
Re: (Score:2)
'But I guess this guy gets to keep his job a little while longer if he can keep up the pretense.'
And keep his job even longer if they can attract enough gullible venture capital to repay their initial investment. That's what statements like this latest one are designed for: keep hyping it up so the bubble doesn't burst. Swallow that venture capital and pretend you are using it for something useful rather than just pocketing it. But hey, at least some engineers get paid high salaries from it for a while.
Re: (Score:2, Insightful)
Exactly. The only reason LLMs can do anything at all is a massive piracy campaign to get the training data. And the results are still pathetic.
No idea why everybody thinks this is a young tech with tons of low-hanging fruit. It is not. It is the pitiful end result of 70 years of intense research. All the easy wins were made 30-40 years ago.
Models will scale for a while but... (Score:2, Interesting)
They're still just probability databases that hallucinate and are confidently wrong, just like the squishy wet neural nets that created them. You'll never get beyond that if you don't add in the extra features like nonverbal neural biasing to stand in for pleasure, pain, and situational emotional reactions, an effective interface with rule-based systems so hallucinations can be reduced to acceptable levels, continuous real-time state monitoring to stand in for consciousness, and invariant motivations to stand in
ChatGPT (Score:1)
"Hey ChatGPT, what is the difference between you and Let Me Google That For You?"
ChatGPT: Searching Let Me Google That For You...
"Hey ChatGPT, how do I fix my irony meter if it jumps up its own ass?"
Honestly I don't get it (Score:1)
What is "more parameters" going to do other than make it plagiarize better while using more power?
Re: (Score:2)
That is just bullshit-speak for "We can make it better! Honest!".
MUST...KEEP...HYPE...GOING! (Score:2)
That is all this is. The current AI hype will fizzle out in the not-too-distant future, like they all have before. Sure, we may see massive job loss from specialized LLM versions, but that will not be because LLMs are "intelligent" or "can reason" or are a breakthrough, but because LLMs allow cheap automation when high accuracy or actual insight (no matter how minimal) is not required. A lot of low-level desk jobs have major parts in this class. The reason these parts were not automated before is that
Translation into business practice (Score:2)
For big organizations, increasing resources in one area means cutting back in others. So things like product maintenance, product enhancement, bug fixing, customer support, and new software unconnected to AI/LLMs will all take major hits.
If you think Microsoft sucked before, just wait.
What about "garbage in = garbage out" ? (Score:2)
Quality of training data matters too. You can't just point at the internet and call it good/done.
And what about customized data sets? Like applying a pre-trained model to a training process where the client owns the resulting AI model. I'm not the target market, but I haven't heard much about that... Probably hard with the hardware requirements (can't just move them to each customer's premises easily/cheaply), and the security/privacy concerns of just sharing 'everything' with a random other company that mig
ARS said the quiet part out loud (Score:2)