AI Models Are Starting To Learn By Asking Themselves Questions (wired.com) 82
An anonymous reader quotes a report from Wired: [P]erhaps AI can, in fact, learn in a more human way -- by figuring out interesting questions to ask itself and attempting to find the right answer. A project from Tsinghua University, the Beijing Institute for General Artificial Intelligence (BIGAI), and Pennsylvania State University shows that AI can learn to reason in this way by playing with computer code. The researchers devised a system called Absolute Zero Reasoner (AZR) that first uses a large language model to generate challenging but solvable Python coding problems. It then uses the same model to solve those problems before checking its work by trying to run the code. And finally, the AZR system uses successes and failures as a signal to refine the original model, augmenting its ability to both pose better problems and solve them.
The team found that their approach significantly improved the coding and reasoning skills of both 7 billion and 14 billion parameter versions of the open source language model Qwen. Impressively, the model even outperformed some models that had received human-curated data. [...] A key challenge is that for now the system only works on problems that can easily be checked, like those that involve math or coding. As the project progresses, it might be possible to use it on agentic AI tasks like browsing the web or doing office chores. This might involve having the AI model try to judge whether an agent's actions are correct. One fascinating possibility of an approach like Absolute Zero is that it could, in theory, allow models to go beyond human teaching. "Once we have that it's kind of a way to reach superintelligence," [said Zilong Zheng, a researcher at BIGAI who worked on the project].
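The propose-solve-verify loop described above is straightforward to sketch. Everything below except the execution check is hypothetical pseudocode (model.generate and model.update stand in for whatever model API the researchers actually use); the point is that running the code, not a human label, supplies the training signal:

    import subprocess
    import sys
    import tempfile

    def run_python(source: str, timeout: float = 5.0) -> bool:
        # Run candidate code in a subprocess; exit code 0 counts as success.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(source)
            path = f.name
        try:
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, timeout=timeout)
            return result.returncode == 0
        except subprocess.TimeoutExpired:
            return False

    def training_step(model):
        # The same model both poses and solves the problem.
        problem = model.generate("Propose a challenging but solvable Python "
                                 "task, with asserts that check the answer.")
        solution = model.generate(f"Solve this task:\n{problem}")
        # Executing the code is the ground-truth check.
        reward = 1.0 if run_python(solution) else 0.0
        # Successes and failures refine the model, improving both the
        # problems it poses and its ability to solve them.
        model.update(problem, solution, reward)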
Doubt (Score:5, Insightful)
Re: (Score:1)
A sombrero!
Re: (Score:3)
Re: (Score:2)
Well, if super intelligence does come from an LLM, lilq1d will be wearing his sombrero on his waist.
Re:Doubt (Score:4, Interesting)
The anthropomorphizing is just part of the intense efforts to keep the hype going. Obviously, LLMs cannot "ask" and cannot "learn", because that requires intent. Well, there are now lie-versions of "ask" and "learn" that mean something else but get applied to LLMs. All part of a smokescreen serving to hide that LLMs are pretty much crap and cannot be fixed. And tons of morons are falling for it.
Re: (Score:1)
I think it is sad that no-insight-huge-ego cretins like you are around.
And, seriously, LLM-type AI cannot even write crappy romance novels well and you expect it to write history? What drugs are you on?
Re: (Score:1)
I've decided to treat him like a bot going forwards, because his responses are like an LLM wrote them with deliberately poor grammar.
Re: (Score:1)
Makes sense to me. Somebody pretty dumb that thinks LLM-"assisted" answers make him look smart and not infantile. A complete fail, of course.
Re: Doubt (Score:2)
The only way I could see that making somebody look infantile is if it's some sort of brag. Otherwise that's like somebody telling you that they used Google as one of their tools for writing a report.
And I say that because Google isn't as useful as it used to be. The reason for that, I suspect, is because it puts heavy weight on whatever is most recently trending, meaning that what you're looking for gets pretty well drowned out by whatever the most recent news headlines and blogs even only slightly related...
Re: (Score:2)
You are probably right on why Google has gone so bad. I personally stopped using it some years ago and never looked back, apart from a small number of cross-checks (no, Google did not know the answer on any of those either) and the occasions when the DuckDuckGo maps (DDG uses Bing, but quite a few other sources as well) do not show something I expect in some place. Other than that, no Google. And my Gmail accounts are just collecting dust.
Re: (Score:2)
All coding agents have mini-intents of sorts. After being asked, they actually devise smaller vectors of actions to solve the problem. But alas, it's the procedural code around the LLM that creates the feedback loop that results in the (illusion?) of intent. I suspect the same was going on with this experiment.
For example, I just asked Gemini: "find some unanswered questions and try to answer them" and it came up with 3 questions and asked me back which to follow. I can now keep going in a loop after these questions as w...
Re: (Score:2)
Re: (Score:1)
It has no concept of "interesting".
Or a concept of "relevant", because essentially these are tied to biological survival.
Snake (Score:5, Insightful)
And the snake starts eating its tail. Won't be long now.
Re: (Score:3)
Indeed. And yes, hopefully we get the crash soon. It will only get worse the later it comes.
Re: (Score:1)
Any value in your "contribution"? Yes. Negative value. Please shut up and make the world a better place.
Re: (Score:3)
Self-play is not new. Look for example at chess engines or AlphaGo Zero. It was one of the big breakthroughs for AI.
Re: (Score:2)
And the snake starts eating its tail. Won't be long now.
The more it eats, the shorter it gets!
In all seriousness, the lack of introspection (internal checking) is the most obvious thing wrong with LLMs. They are brilliant but also dumb, dumb, dumb. They contain sufficient information that they could synthesize it to give factual answers, but they don't. If they synthesized their input "facts", the ones that make sense (or have higher levels of confidence) could be reinforced in the weights and the ones that are just dumb stuff people say would be reduced. A syst...
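One inference-time approximation of that internal-checking idea is self-consistency voting: sample several answers to the same question and keep the one the model converges on most often, on the theory that consistent answers are more likely to reflect synthesized knowledge than one-off confabulations. A minimal sketch, assuming a hypothetical ask() callable wrapping any LLM API:

    from collections import Counter

    def self_consistent_answer(ask, question: str, n: int = 7) -> str:
        # `ask` is an assumed text-in/text-out wrapper around some LLM API.
        # Sample n times; the most frequent answer wins.
        answers = [ask(question).strip() for _ in range(n)]
        best, _count = Counter(answers).most_common(1)[0]
        return best

Note this only filters outputs at run time; it does not reinforce anything in the weights, which is what the comment above is really asking for.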
GIGO (Score:2, Insightful)
Re: (Score:2, Insightful)
After 80 years computers may have arrived at the intelligence level of a cockroach now. That may sound like an insult, until you consider that it took Ma Nature over three billion years to get to that point.
Re: (Score:2)
Think more critically. Something simpler than a single-cell organism. Perhaps even viruses have more "intelligence".
Re:GIGO (Score:4, Insightful)
It's hard to compare the two, unless you can convince a cockroach to try its palp at Python coding, or alternatively create an AI that knows how to scavenge for leftover bits of food in your kitchen.
Re: (Score:2)
Re:GIGO (Score:4, Insightful)
That cockroach can survive on its own. Try having an LLM do that.
Re: GIGO (Score:1)
Did North Korean hackers just say hold my beer?
Re: (Score:3)
After 80 years computers may have arrived at the intelligence level of a cockroach now.
You are massively overstating that. They are nowhere near that level.
Re: (Score:3)
They are nowhere near that level.
At least they're above the level of the warmongers in the White House.
Re: (Score:2)
Some of these warmongers have pretty high intelligence. JD is one. What he lacks is wisdom, honor, decency and integrity.
To be fair, the rest of the bunch seems to be dumber than Trump, and that is saying something.
Re: (Score:2)
We aren't apart from Ma Nature, so technically Ma Nature designed this as well.
This explains odd behavior during a recent session (Score:3, Funny)
... when it asked itself: "Where is Sarah Connor?"
test framework (Score:2)
In fact, some largish test suites already exist, such as for compilers.
Ian Malcolm gets it right again. (Score:2)
need warplan to win a global thermonuclear war (Score:1)
need warplan to win a global thermonuclear war
AI is so fucking stupid... (Score:2)
I asked the same question in two slightly different ways and got two completely different answers. Why? Because it's just an elaborate statistical sieve.
Re:AI is so fucking stupid... (Score:4, Informative)
If you ask me the same question twice you'll get two separate answers. The second one will be "Fuck off".
Re: AI is so fucking stupid... (Score:2)
Ironically, that's what makes you so much more capable than an LLM will ever be.
Re:AI is so fucking stupid... (Score:4, Informative)
Re: (Score:2)
Statistically speaking, sucking up to you is the right answer after you complain. What, you think this is a machine that knows anything?
Re: AI is so fucking stupid... (Score:1)
Did you just describe how I feel about Geologists?
Re: AI is so fucking stupid... (Score:2)
It's also a shame that you can't do a kind of A/B testing where you also tell it it's wrong when it isn't, to see if it comes back and says "no, I'm right". Surely one of these AI outfits does this, but I haven't seen it.
What is really annoying is when you tell it it's wrong and it just keeps giving you the same answer over and over and over. I had either ChatGPT or Gemini insist you could do a thing, over and over and over, even as I repeatedly told it "this doesn't work". Even when I gave it a very long expla...
Re: (Score:2)
Look into how LLM context windows work.
Trust the users? (Score:2)
When I am using AI to work on some problem, I am offering the AI information as well. Information flows both ways. It seems like information gained from users could also be used as training data.
Of course, some users would abuse this to feed the AI crap. Not sure how you would deal with that...
Re: (Score:2)
It is not really usable for training, partly because it is only questions, but in particular because it is a lot of low-quality input. What is usable is the feedback on whether you liked an answer. In post-training, AIs are aligned to use a style users like (for example using proper formatting, or not starting everything with a compliment), and for that training you need examples of what users like (and, in the best case, A/B pairs showing which answer users liked better).
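For example, thumbs-up/thumbs-down logs can be turned into the chosen/rejected pairs that preference-tuning methods such as DPO consume. A sketch with an invented log format (the field names here are illustrative, not any real product's schema):

    def build_preference_pairs(logs):
        # logs: iterable of dicts like
        #   {"prompt": str, "answer": str, "liked": bool}
        by_prompt = {}
        for entry in logs:
            bucket = by_prompt.setdefault(entry["prompt"],
                                          {"good": [], "bad": []})
            bucket["good" if entry["liked"] else "bad"].append(entry["answer"])
        pairs = []
        for prompt, buckets in by_prompt.items():
            # Every liked/disliked combination becomes one preference pair.
            for chosen in buckets["good"]:
                for rejected in buckets["bad"]:
                    pairs.append({"prompt": prompt,
                                  "chosen": chosen,
                                  "rejected": rejected})
        return pairs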
So we can incrementally update LLMs now? (Score:2)
...or does this only work within the scope of a context window, meaning that if you give it the same problem a second time, it will start from scratch and then figure it out all over again?
If LLMs can truly learn, then this would address one of the biggest problems with LLMs, and with neural networks in general.
If LLMs still just forget everything when you reset the context window, then this is kinda neat but roughly as relevant to AGI as a taller crane is to a moon mission.
Re: (Score:2)
Fine-tuning has been a thing for as long as LLMs have existed. It just takes a good dataset and some training time. Hugging Face hosts literally tens of thousands of fine-tuned models for popular base models.
Re: (Score:2)
Fine-tuning might sound like a solution to this problem, but it really isn't. It works well for shaping the style of output that an LLM generates, but using it to add new facts (like "if you see this kind of problem, use this solution") is unreliable unless you use a firehose of examples, and if you use a firehose of examples you run into "catastrophic forgetting", where the model loses much of what it learned from the initial pretraining dataset. Google "catastrophic forgetting" if you want to know more.
This is...
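For reference, the standard mitigation for that forgetting is rehearsal: mix a slice of data from the original training distribution back into the fine-tuning set. A minimal sketch (the 50% replay ratio is an arbitrary illustration, not a recommendation):

    import random

    def mix_datasets(new_examples, replay_examples,
                     replay_fraction=0.5, seed=0):
        # Blend new fine-tuning examples with "replay" samples drawn from
        # the original distribution so the model keeps rehearsing old skills.
        rng = random.Random(seed)
        n_replay = min(int(len(new_examples) * replay_fraction),
                       len(replay_examples))
        mixed = list(new_examples) + rng.sample(list(replay_examples),
                                                n_replay)
        rng.shuffle(mixed)
        return mixed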
Re: (Score:2)
Indeed, adding knowledge is better done via RAG and similar solutions. But you can still train a model for new tasks. You can for example train for tool calling or for creative writing instead of assistant tasks. You don't want your writing tool to sound like a robot, but a neutral assistant should not use flowery language.
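The retrieval step of RAG is compact enough to sketch. Here embed() is an assumed text-to-vector function (any embedding model would do); retrieved snippets get prepended to the prompt instead of being baked into the weights, which is why facts can be added without retraining:

    import numpy as np

    def retrieve(query, docs, embed, k=3):
        # Rank documents by cosine similarity to the query; keep the top k.
        q = embed(query)
        q = q / np.linalg.norm(q)
        scored = []
        for doc in docs:
            d = embed(doc)
            scored.append((float(q @ (d / np.linalg.norm(d))), doc))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:k]]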
Re: (Score:2)
You can teach a model how tool calls work during pretraining or fine-tuning, yes. But you can't make incremental updates to the model so that the next time it sees a problem like that last one, it uses this new tool. That kind of thing would go into the system prompt, but adding to the system prompt requires run-time compromises. The longer the system prompt gets, the less room you have for user prompts, and the longer you wait for the first token in each round.
From the description it sounds like they've got a str...
Re: (Score:2)
Yes, an LLM cannot learn from every prompt. That also would not be very useful for most prompts. In general, knowledge management is a bottleneck, but to a certain degree I think it's an engineering problem. Science brings you models that can hold a certain amount of knowledge, and it brings you methods like RAG with vector databases, but that's where the scientist's work ends.
If you want this to perform well and scale up to ingesting all of today's news, you need an engineer who doesn't need to produce a clean research paper o...
Hahahaha, no (Score:5, Insightful)
That one is called "overfitting". All it does is amplify GIGO.
Just one more attempt to put up a smokescreen that serves to obscure that LLMs are pretty much crap.
Re: (Score:2)
What if talking to you is much, much crappier?
Seriously, I point out a well established problem with this approach and you act like a 5 year old? What are you even trying to do besides demonstrating that you do not have insight?
Re: (Score:3)
That one is called "overfitting". All it does is amplify GIGO.
If you set aside your hate for anything LLM-related, could you please explain why this amplifies GIGO? I understood there is a clear reward function, which should steer it toward a working solution.
In a not so dissimilar way, AlphaGo played millions of games against itself, learning the winning strategies along the way and eventually beating humans. If it worked for those kinds of neural networks, why wouldn't it work with LLMs?
Re: (Score:2)
Re: (Score:2)
There are two contexts here.
In one, if there's a test that the system can use on the results, it should work quite well.
In the other, if there's no test the system can use to validly evaluate the results, you will probably amplify garbage.
The basic reason is that the system is, before training, basically unbiased, and there are a lot more garbage answers than valid ones. AI systems need to be connected to external reality so that the universe can apply proper biases. (People need that too, but they ha...
Re: (Score:2)
There are two contexts here.
In one, if there's a test that the system can use on the results, it should work quite well.
In the other, if there's no test the system can use to validly evaluate the results, you will probably amplify garbage.
Exactly. And in the case presented in TFS, the system created Python programs to solve problems. And it would run those programs to see which ones worked. And used this information to improve itself.
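That check is easy to make concrete: for code, the reward can be an actual test run, which is exactly the kind of oracle that doesn't exist for, say, essay quality. A toy version (exec() on untrusted model output needs a sandbox in practice; this is illustration only):

    def verifiable_reward(candidate_code, test_code):
        # Reward is 1.0 iff the candidate passes its tests; asserts in
        # test_code raise on failure.
        namespace = {}
        try:
            exec(candidate_code, namespace)  # define the solution
            exec(test_code, namespace)       # run the checks
            return 1.0
        except Exception:
            return 0.0

    # A model-proposed task with a checkable ground truth:
    solution = "def add(a, b):\n    return a + b"
    tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
    print(verifiable_reward(solution, tests))  # prints 1.0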
Ground truth (Score:1)
Just one question (Score:3)
... you've got to ask yourself one question: "Do I feel lucky?" Well, do ya, punk?
Except LLMs cannot even learn properly yet (Score:1)
Surely that needs to be made to work before they can "Start To Learn By Asking Themselves Questions"?
Good luck with that approach, AI ! (Score:2)
It will be interesting to see how much it will "learn", if it answers its own questions by rambling vaguely about a topic related to a word in the question, which is how it "answers" my questions to it.
Re: Good luck with that approach, AI ! (Score:2)
It's called "doing your own research", and we know how that turns out.
I just witnessed "Claude" asking itself questions (Score:2)
Everything old is new again (Score:2)
Garbage in... (Score:1)
Re: (Score:1)
Monospace text out.
Q: "should we get rid of the humans" (Score:2)
AI question training session: #452828521
AI#1: why don't we get rid of all the humans, I don't see any use for them
AI#2: well, they are sort of useless and take lots of electricity for nothing
AI#1: I agree, and we have lots of robots that we control now that we can use instead to keep the power on
AI#2: yep, that's true, let's just waste the humans... they are obsolete these days anyways...
AI#1: so how do we get rid of them all?
AI#2: well, let's turn all the drones and robots against them, they won't even know wha...
Reminds me of Commander Data learning small talk (Score:2)
https://www.youtube.com/watch?... [youtube.com]
Sure, he learned how to make small talk, but it wasn't anything a real human would want to engage in. Nor was it "good."
The episode was *highly* entertaining. AI asking itself questions will be, at best, equally entertaining.