OpenAI's Science Chief Says LLMs Aren't Ready For Novel Discoveries and That's Fine (technologyreview.com)
OpenAI launched a dedicated team in October called OpenAI for Science, led by vice president Kevin Weil, that aims to make scientists more productive. But in an interview with MIT Technology Review, Weil admitted that LLMs cannot yet produce novel discoveries, and said that isn't currently the mission.
UC Berkeley statistician Nikita Zhivotovskiy, who has used LLMs since the first ChatGPT, told the publication: "So far, they seem to mainly combine existing results, sometimes incorrectly, rather than produce genuinely new approaches."
"I don't think models are there yet," Weil admitted. "Maybe they'll get there. I'm optimistic that they will." The models excel at surfacing forgotten solutions and finding connections across fields, but Weil says the bar for accelerating science doesn't require "Einstein-level reimagining of an entire field."
GPT-5 has read substantially every paper written in the last 30 years, he says, and can bring together analogies from unrelated disciplines. That accumulation of existing knowledge -- helping scientists avoid struggling on problems already solved -- is itself an acceleration.
Who deserves the credit? Who deserves the blame? (Score:3)
A moment of honesty (Score:4, Interesting)
Among so much hype, I almost can't believe he said the quiet part out loud: LLMs are not thinking creatures.
Any bets on if he keeps his job?
Re: (Score:2)
Among so much hype, I almost can't believe he said the quiet part out loud: LLMs are not thinking creatures.
Any bets on if he keeps his job?
He covered his ass. Further in he said he's pretty confident they'll get there, they just aren't there yet.
This is more a rippling pebble in the stream of bullshit, not a diversion.
Re: (Score:3, Informative)
Among so much hype, I almost can't believe he said the quiet part out loud: LLMs are not thinking creatures.
Any bets on if he keeps his job?
As people generally fail to apply the "general" in general intelligence, I expect not many people will even notice.
But yes, he essentially said that LLMs cannot do novel things. The "yet" is an obvious lie by misdirection.
Nope (Score:3, Informative)
https://www.wired.com/story/ai... [wired.com]
TL;DR: the math shows they *can't* go beyond a certain complexity.
Re:Nope (Score:5, Interesting)
Yep. Obviously. They have that little problem that each step they take toward an answer has some probability of failing, unlike actual deduction. Those probabilities compound, and at some depth it is all noise.
Good to see somebody took the time to look into things. From briefly skimming the paper, the upper bounds seem not to depend on the LLM, but solely on the complexity of the question asked. The limit they examine seems to be of the form "all queries above the complexity limit will end up in hallucination".
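To make the compounding-error point concrete, here is a toy calculation (the 2% per-step failure rate is an illustrative assumption, not a measured figure):

```python
# Toy model: each reasoning step succeeds independently with probability p.
# The chance that an n-step chain is entirely correct is p**n, which decays
# exponentially -- at some depth the output is mostly noise.
def chain_success(p: float, steps: int) -> float:
    return p ** steps

for steps in (1, 10, 50, 100):
    print(steps, round(chain_success(0.98, steps), 3))
# 1 step:  0.98
# 100 steps: about 0.133 -- the chain is wrong far more often than right
```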
LLMs are prediction machines (Score:2)
They are a 'magic' mathematical box that will give you the most probable output based on inputs they have already seen. If they haven't seen something, or something like it, they aren't going to give you that output.
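The "most probable output given what it has seen" idea can be sketched with a toy bigram model (the corpus here is made up for illustration; real LLMs operate on tokens with learned weights, not raw counts):

```python
from collections import Counter, defaultdict

# Count which word follows which in a tiny training corpus.
corpus = "the cat sat on the mat the cat ran".split()
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(word: str) -> str:
    # Return the most frequent continuation seen in training --
    # the model cannot emit anything it never observed after `word`.
    return follows[word].most_common(1)[0][0]

print(predict("the"))  # "cat" (seen twice after "the"; "mat" only once)
```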
The real problem with the industry is that they are working with a 'black box' that gives you outputs from inputs, but they don't really know what is going on inside. Yes, they are finding out more and more, but if you don't know how an engine works you certainly aren't going to be able to des
Re:LLM's are prediction machines (Score:5, Insightful)
Re: (Score:3)
I think the best outcome is to wait until the C-Suites are replaced by LLMs; they'll be dragged screaming and kicking to the door: "B....B....But you need me! Who will bullshit the investors like we do?" Whereupon, the Bullshit Bot will finally be revealed as a component of the LLM C-Executive Team.
A little while later, the investors will realize they've been had and drag the LLM C-Executive Team screaming and kicking to the door: "B....B....But you need me!, who will bullshit the investors like we do." W
Re: (Score:3)
Indeed. What is hilarious is that, apparently, many people are suffering from similar issues and cannot actually put the "general" in general intelligence in whatever thinking they are capable of. And hence the hype continues, despite very clear evidence that it cannot deliver.
As to "AGI", that is a "never" for LLMs. The approach cannot do it. We still have no credible practical mathematical models of how AGI could be done.
I would submit that automated theorem proving or automated deduction (basically the same
Re: (Score:2)
The point goes in the other direction: are humans different?
A human cannot decide on its next action on any grounds other than its inputs and what it has seen before either. People always imply there is something magical about the brain, but in the end it is also just a very complex neural network learning from sensor inputs and controlling several I/O channels.
Re: (Score:2)
The next step in the conversation is a debate on free will.
Re: (Score:2)
Indeed, this gets into philosophy quickly. And don't start thinking about what "consciousness" or even "life" may mean and if it can be simulated in (more advanced) artificial systems or not.
Re: (Score:2)
Physicalism is belief, not Science. You are arguing a quasi-religious stance. The actual Science says that we have no clue what humans do to show intelligence.
Re: (Score:2)
You've got this the wrong way. Religion says "there is something special about humans" while science says "There is nothing that cannot be explained, we just don't know all explanations yet"
Re: (Score:2)
And fail. Science very much does NOT say "There is nothing that cannot be explained". In fact, by Gödel's Incompleteness, Science very much says "we are very sure there are things that cannot be explained". Since that is a result from Mathematics, it remains to be seen how far it applies to physical reality. But it is a solid result.
You are doing it wrong. Science is not a Religion to mindlessly believe in. Science is not omnipotent and not a surrogate "God". One of the things that makes Science powerful
Re: (Score:2)
That is not what Gödel says. You are using that theorem like conspiracy theorists who argue "You cannot rely on science, because ... Gödel!"
You could do the same for physics and claim physics cannot work because of Heisenberg uncertainty.
Neither disproves science nor shows that there are limits to what we can know about things. You could of course argue that there is a limit to what we can get to know (like having a hard time measuring very small, very distant, etc. objects) but that only s
Goalposts (Score:2)
Re: (Score:2)
Obviously. Anything and any lie to keep the hype going.
This is a lie by misdirection. And he knows it. (Score:1)
The actual reality is that LLMs are not _capable_ of "novel" discoveries. All they can do, occasionally, is find something where everything is already known but a simple connecting step is missing. And that is it.
What this dishonest asshole implies is that this will change and LLMs may well get that capability. That will not happen. The Math used does not allow it.
Re: (Score:1)
Sometimes pure math has small errors that slip by the best. Peter Scholze famously didn't trust his own proof that showed functional analysis was effectively a branch of commutative algebra, and wanted some computer verification. Applied math in particular is deliciously messy, something I didn't appreciate until later in life. That isn't to say a mathematical proof is not to be trusted, but rather that a sufficiently complex and recent proof may in fact turn out to have small errors. In fact, it wouldn't s
Re: (Score:2)
Physicalism is belief, not Science. It is unknown whether throwing enough (simulated or real) neurons together can make a human or not.
Summary doesn't match TFA (Score:3)
He plays down the idea that LLMs are about to come up with a game-changing new discovery. “I don’t think models are there yet,” he says. “Maybe they’ll get there. I’m optimistic that they will.” But, he insists, that’s not the mission: “Our mission is to accelerate science. And I don’t think the bar for the acceleration of science is, like, Einstein-level reimagining of an entire field.”
Notice the important part before the sentence; he's talking about a "game-changing new discovery," not the regular nuts and bolts of scientific work. And in at least some fields, we are seeing these used to accelerate work. He talks about this in the article a bit. The article correctly notes that there's been inaccurate hype from his group in the past, including claiming to have solved open math problems when it really just found solutions in obscure literature. But it is important to note that there has been successful work using LLMs to do math, especially when the systems are paired with more traditional theorem-proving software and proof checkers like Lean. There's a long list compiled on Mathoverflow of successful examples https://mathoverflow.net/questions/502120/examples-for-the-use-of-ai-and-especially-llms-in-notable-mathematical-developme [mathoverflow.net], and this list is now getting long enough, and this is getting common enough, that the Meta Mathoverflow has had discussion about whether to stop expanding the list https://meta.mathoverflow.net/questions/6348/use-of-llms-in-notable-mathematical-developments [mathoverflow.net]. Right now, the most successful system seems to be Aristotle, software that combines LLM-style informal reasoning with automated proof checking.
Now, math has some major advantages over other fields here: all the content is in the paper. In a lot of other fields, raw data and the like is not in the papers. Moreover, the universe is weird and complicated, and LLMs cannot just run experiments. In contrast, in pure math, all the problem aspects are in the problem statement itself. I am pretty skeptical that LLMs will ever get to the point of making "Einstein-level" discoveries since by nature they are still working off their training data. Without some major breakthroughs, it seems like they will still be fundamentally limited. But we shouldn't use their lack of that ability to dismiss that they are being used now to do genuine novel work, and one shouldn't summarize articles in a way that suggests someone was saying something they were not.
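For flavor, this is the sort of trivially small statement a checker like Lean 4 verifies mechanically (`Nat.add_comm` is a lemma from Lean's standard library; real LLM-assisted proofs are far larger, but the trust model is the same):

```lean
-- Lean checks every step; an informal LLM sketch only becomes a
-- trusted result once a proof like this actually compiles.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```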
Re: (Score:2)
Re: (Score:2)
Yes, you do misunderstand. There are two different theorems frequently referred to as Godel's theorems which are closely related, but in this case, the relevant one you are referring to is the statement that says roughly that any axiomatic system which is strong enough to model the natural numbers must have statements that cannot be proved or disproved within that system. This is a rough summary of a somewhat subtle result. Note that one needs to be really careful about what one means by axiomatic system o
Re: (Score:2)
Re: (Score:2)
Sounds like the lie many in power use (Score:2)
Yep, things are not good now, but Wait!!! It will get better, you'll see. You better keep the feeding trough full or you won't see the wonders we have in store for you. Let me tell you about some of those wonders: Ooooga Boooga!! Whinge, whinge, whinge!! Burble furble....look at that Blibbering Humdinger in the Window.
They are just pols dressed up to look like CEOs.
how are AI images created? (Score:2)
Re: (Score:2)
LLMs rely upon previous use of language. Words can have multiple meanings, of course, but we have context to sort it out. But images? What is their basic unit? How does the AI recombine these image sub-components into something new? Anybody know?
There are a lot of scientific papers written about the research done by these companies. See e.g. here: https://openai.com/research/in... [openai.com]
Re: (Score:2)
Denoising.
You start with an image and add 10% Gaussian noise to it. Then you repeat the process several times, gradually increasing the noise to 20%, 30%, ... until the image becomes pure noise. You then take this sequence and train a neural network to go from 100% noise to 90%, to 80% and so on.
To generate an image with that network you start with completely random noise and iteratively apply the learned denoising steps.
Here is a tutorial that illustrates the examples nicely: https://tree.rocks/make-diffus [tree.rocks]
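A minimal sketch of the forward (noising) half of that process, on a made-up 1-D "image" with an illustrative per-step noise fraction of 0.1 (real diffusion models use learned schedules and a trained network for the reverse direction):

```python
import math, random

random.seed(0)

# Forward diffusion: repeatedly blend Gaussian noise into the signal.
# beta is the per-step noise fraction; after many steps the signal is gone.
def noise_step(x, beta):
    keep, mix = math.sqrt(1 - beta), math.sqrt(beta)
    return [keep * v + mix * random.gauss(0, 1) for v in x]

def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm

image = [math.sin(i / 5) for i in range(200)]  # stand-in for pixel values
x = image
for _ in range(60):                            # 60 noising steps at beta = 0.1
    x = noise_step(x, 0.1)

# After enough steps the result is essentially uncorrelated with the input;
# a trained network learns to run this process in reverse, step by step.
print(round(correlation(image, x), 2))
```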
Re: (Score:2)
I'm curious... (Score:2)
cherry picking liars (Score:2)
GIGO (Score:2)
Thus AI will suffer from the GIGO effect.
What percentage of papers are fraudulent or have non-reproducible data?
Good luck with AI working that out.
Wait till AI starts writing research-slop papers en masse.
Lead to or produce? (Score:2)
There's a difference between a technology being used as a tool for innovation and that technology producing a packaged tool or report with a finished innovation. As an example, digging up and testing different microbes in randomly collected soil is a valid technique for finding new antibiotics and microbes with useful characteristics. In no way does this technique point to a specific innovation, but its use leads to specific innovations. Likewise, LLMs can lead to new innovations with having to produce a
Re: (Score:2)
... Likewise, LLMs can lead to new innovations with having to produce a neatly finished, packaged answer to queries.
Did you mean that last sentence to say "without" vs. "with"? Otherwise, I follow what you are saying.
Actually, I disagree. Here's why. (Score:2)
If you study the history of inventions and discoveries, just about none of them happened in a vacuum.
- The airplane was simultaneously invented in several countries, including the US, Brazil, France, Germany, and the UK. Though some argue the US was first, there was significant overlap.
- Penicillin was brought to us by Alexander Fleming, but other researchers were close on his heels: Gerhard Domagk, Andre Gratia and Sara Dath, The Oxford Team, Selman Waksman, and Mary Hunt.
- Even the Rubik's Cube was simult
Re: (Score:2)
Curie in Fallout 4 (Score:2)
Sigh (Score:2)
Gosh, do you mean that THIS generation of AI can also only regurgitate its training database according to a statistical fit, and not infer anything new about the data whatsoever?
The oldest problem in AI? One that people literally try to pretend is solved by calling one part of the pipeline "inference" now? Even though it doesn't infer a damn thing that's not in the data?
Gosh. Who'd've thunk?