AI Education

AI Models Are Starting To Learn By Asking Themselves Questions (wired.com) 82

An anonymous reader quotes a report from Wired: [P]erhaps AI can, in fact, learn in a more human way -- by figuring out interesting questions to ask itself and attempting to find the right answer. A project from Tsinghua University, the Beijing Institute for General Artificial Intelligence (BIGAI), and Pennsylvania State University shows that AI can learn to reason in this way by playing with computer code. The researchers devised a system called Absolute Zero Reasoner (AZR) that first uses a large language model to generate challenging but solvable Python coding problems. It then uses the same model to solve those problems before checking its work by trying to run the code. And finally, the AZR system uses successes and failures as a signal to refine the original model, augmenting its ability to both pose better problems and solve them.

The team found that their approach significantly improved the coding and reasoning skills of both 7 billion and 14 billion parameter versions of the open source language model Qwen. Impressively, the model even outperformed some models that had received human-curated data. [...] A key challenge is that for now the system only works on problems that can easily be checked, like those that involve math or coding. As the project progresses, it might be possible to use it on agentic AI tasks like browsing the web or doing office chores. This might involve having the AI model try to judge whether an agent's actions are correct. One fascinating possibility of an approach like Absolute Zero is that it could, in theory, allow models to go beyond human teaching. "Once we have that it's kind of a way to reach superintelligence," [said Zilong Zheng, a researcher at BIGAI who worked on the project].
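The propose-solve-verify loop described above can be sketched in miniature. The snippet below is a toy illustration, not AZR's actual code: `propose_task` and `solve_task` are hypothetical stand-ins for the LLM's two roles (a real system would prompt the same model for both), and only the verify-by-execution step that produces the reward signal is faithful to the description.

```python
import random

def propose_task(rng):
    """Stand-in for the LLM proposing a small, checkable coding task."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    src = f"def f(x):\n    return x * {a} + {b}"
    return src, rng.randint(0, 9)

def solve_task(src, x, rng):
    """Stand-in for the LLM solving the task; wrong 30% of the time."""
    namespace = {}
    exec(src, namespace)                 # peek at ground truth
    answer = namespace["f"](x)
    return answer if rng.random() < 0.7 else answer + 1

def self_play_round(rng):
    """One AZR-style round: propose, solve, verify by running the code."""
    src, x = propose_task(rng)
    claimed = solve_task(src, x, rng)
    namespace = {}
    exec(src, namespace)                 # execution is the verifier
    return 1 if namespace["f"](x) == claimed else -1

rng = random.Random(0)
rewards = [self_play_round(rng) for _ in range(100)]
print(sum(r == 1 for r in rewards), "correct out of 100")
```

In the real system the +1/-1 rewards would be fed back into reinforcement-learning updates on the model's weights; here they are just collected.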

This discussion has been archived. No new comments can be posted.


  • Doubt (Score:5, Insightful)

    by liqu1d ( 4349325 ) on Friday January 09, 2026 @11:41PM (#65914114)
    Please stop giving it human descriptions. It has no concept of "interesting". If superintelligence comes from an LLM, I'll eat my hat (after I buy a hat)!
    • A sombrero!

    • Re:Doubt (Score:4, Interesting)

      by gweihir ( 88907 ) on Saturday January 10, 2026 @03:31AM (#65914312)

      The anthropomorphizing is just part of the intense efforts to keep the hype going. Obviously, LLMs cannot "ask" and cannot "learn", because that requires intent. Well, there are now lie-versions of "ask" and "learn" that mean something else but get applied to LLMs. All part of a smokescreen serving to hide that LLMs are pretty much crap and cannot be fixed. And tons of morons are falling for it.

      • All coding agents have mini-intents of sorts. After being asked, they actually devise smaller vectors of actions to solve the problem. But alas, it's the procedural code around the LLM that creates the feedback loop that results in the (illusion?) of intent. I suspect the same was up with this experiment.
        For example, I just asked Gemini: "find some unanswered questions and try to answer them" and it came up with 3 questions and asked me back which to follow. I can now keep going in a loop after these questions as w

      • I got talked down here the other day when I said that "LLMs don't hallucinate, every answer is equally made up". I should get with the program, etcetera, and specialists just use that terminology. You just hit the nail on the head, of course the CEOs and AI specialists call it hallucinations, since that sounds like there'll be a cure "real soon now". They all have an interest to keep the hype going, it's like printing money.
    • by sinij ( 911942 )

      It has no concept of "interesting".

      Or a concept of "relevant", because essentially these are tied to biological survival.

  • Snake (Score:5, Insightful)

    by RitchCraft ( 6454710 ) on Saturday January 10, 2026 @12:10AM (#65914142)

    And the snake starts eating its tail. Won't be long now.

    • by gweihir ( 88907 )

      Indeed. And yes, hopefully we get the crash soon. It will only get worse the later it comes.

    • by allo ( 1728082 )

      Self-play is not new. Look for example at chess engines or AlphaGo Zero. It was one of the big breakthroughs for AI.

    • by piojo ( 995934 )

      And the snake starts eating its tail. Won't be long now.

      The more it eats, the shorter it gets!

      In all seriousness, the lack of introspection (internal checking) is the most obvious thing wrong with LLMs. They are brilliant but also dumb, dumb, dumb. They contain sufficient information that they could synthesize it to give factual answers, but they don't. If they synthesized their input "facts", the ones that make sense (or have higher levels of confidence) could be reinforced in the weights and the ones that are just dumb stuff people say would be reduced. A syst

  • GIGO (Score:2, Insightful)

    by Anonymous Coward
    Garbage In, Garbage Out. The AI models desire to be on the same level as Trump voters. We are doomed.
    • Re: (Score:2, Insightful)

      by cusco ( 717999 )

      After 80 years computers may have arrived at the intelligence level of a cockroach now. That may sound like an insult, until you consider that it took Ma Nature over three billion years to get to that point.

  • by dmomo ( 256005 ) on Saturday January 10, 2026 @12:52AM (#65914172)

    ... when it asked itself: "Where is Sarah Connor?"

  • It would be interesting to build a solid test framework for a rather largish project and ask the generative to work until it can produce a program that passes the tests. I hypothesize it will not ever finish.

    In fact, some largish test suites already exist, such as for compilers.
  • "Your scientists were so preoccupied with whether they could, they didn't stop to think if they should." - 'Jurassic Park', 1993
  • need a warplan to win a global thermonuclear war

  • I asked the same question in two slightly different ways and got two completely different answers. Why? Because it's just an elaborate statistical sieve.

  • When I am using AI to work on some problem, I am offering the AI information as well. Information flows both ways. It seems like information gained from users could also be used as training data.

    Of course, some users would abuse this to feed the AI crap. Not sure how you would deal with that...

    • by allo ( 1728082 )

      It is not really usable for training, partly because it is only questions, but in particular because it is a lot of low-quality input. What is usable is the feedback on whether you liked an answer. In post-training, AIs are aligned to use a style users like (for example using proper formatting, or not starting everything with a compliment), and for that training you need examples of what users like (and in the best case, A/B pairs of what users liked better).

  • ...or does this only work within the scope of a context window, meaning that if you give it the same problem a second time, it will start from scratch and then figure it out all over again?

    If LLMs can truly learn, then this would address one of the biggest problems with LLMs, and with neural networks in general.

    If LLMs still just forget everything when you reset the context window, then this is kinda neat but roughly as relevant to AGI as a taller crane is to a moon mission.

    • by allo ( 1728082 )

      Fine-tuning has been a thing for as long as LLMs have existed. It just takes a good dataset and some training time. Hugging Face has literally tens of thousands of fine-tuned models for popular base models.

        Fine-tuning might sound like a solution to this problem, but it really isn't. It works well for shaping the style of output that an LLM generates, but using it to add new facts (like "if you see this kind of problem, use this solution") is unreliable unless you use a firehose of examples, and if you use a firehose of examples you run into a problem known as "catastrophic forgetting", where the model loses the rest of what it learned from the initial pretraining dataset. Google "catastrophic forgetting" if you want to know more.

        This is

        • by allo ( 1728082 )

          Indeed, adding knowledge is better done via RAG and similar solutions. But you can still train a model on new tasks. You can, for example, train for tool calling or for creative writing instead of assistant tasks. You don't want your writing tool to sound like a robot, but a neutral assistant should not use flowery language.
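A minimal sketch of the retrieval side of RAG mentioned above, with hypothetical documents and a crude word-overlap score standing in for the vector-database similarity search a real system would use:

```python
from collections import Counter

# Hypothetical document store; a real system would hold far more text.
docs = [
    "AlphaGo Zero learned Go purely through self-play.",
    "RAG retrieves documents and adds them to the prompt.",
    "Fine-tuning adjusts model weights on a new dataset.",
]

def score(query, doc):
    """Crude relevance: count overlapping words between query and doc."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query):
    """Return the single best-matching document for the query."""
    return max(docs, key=lambda d: score(query, d))

context = retrieve("how does rag add documents to the prompt")
prompt = f"Context: {context}\nQuestion: how does RAG work?"
print(prompt)
```

The retrieved text is prepended to the prompt at inference time, so the model's weights never change; that is the contrast with fine-tuning being discussed here.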

            • You can teach a model how tool calls work during pretraining or fine-tuning, yes. But you can't make incremental updates to the model so that the next time it sees a problem like the last one, it knows to use the new tool. That kind of thing would go into the system prompt, but adding to the system prompt requires run-time. The longer the system prompt gets, the less room you have for user prompts, and the longer you wait for the first token in each round.

            From the description it sounds like they've got a str

            • I meant to say "...adding to the system prompt requires run-time compromises."

            • by allo ( 1728082 )

              Yes, an LLM cannot learn with every prompt. That also would not be very useful for most prompts. In general, knowledge management is a bottleneck, but to a certain degree I think it's an engineering problem. Science brings you models that can hold a certain amount of knowledge, and it brings you methods like RAG with vector databases, but that's where the scientist's work ends.

              If you want this to perform well and scale up to ingesting all of today's news, you need an engineer who doesn't need to produce a clean research paper o

  • Hahahaha, no (Score:5, Insightful)

    by gweihir ( 88907 ) on Saturday January 10, 2026 @03:28AM (#65914304)

    That one is called "overfitting". All it does is amplify GIGO.

    Just one more attempt to put up a smokescreen that serves to obscure that LLMs are pretty much crap.

    • by Bumbul ( 7920730 )

      That one is called "overfitting". All it does is amplify GIGO.

      If you set aside your hate for anything LLM-related, could you please explain why this amplifies GIGO? I understood there is a clear reward function, which should steer it toward a working solution.

      In a not so dissimilar way, AlphaGo played millions of games against itself, learning the winning strategies along the way and eventually beating humans. If it worked for those kinds of neural networks, why wouldn't it work with LLMs?

      • Back in the day I wrote some neural nets: small data vectors, only 3 node sets, and no explicit reward function; the system always found an answer. Currently, is a "reward function" hard-coded into an LLM, or does it spontaneously appear from the training process? As I understand it, quadratic reward functions naturally exhibit the concept of a "limit" and thus settle upon answers. Does this allow "questions" to evolve from data vectors?
      • by HiThere ( 15173 )

        There are two contexts here.

        In one, if there's a test that the system can use on the results, it should work quite well.

        In the other, if there's no test the system can use to validly evaluate the results, you will probably amplify garbage.

        The basic reason is that the system is basically, before training, unbiased, and there are a lot more garbage answers than valid ones. AI systems need to be connected to the external reality so that the universe can apply proper biases. (People need that too, but they ha

        • by Bumbul ( 7920730 )

          There are two contexts here.

          In one, if there's a test that the system can use on the results, it should work quite well.

          In the other, if there's no test the system can use to validly evaluate the results, you will probably amplify garbage.

          Exactly. And in the case presented in TFS, the system created Python programs to solve problems, ran those programs to see which ones worked, and used this information to improve itself.

  • We're not ready for an AI that learns starting at physics. There are just too many things we don't want to admit are other than we choose to believe.
  • by Bu11etmagnet ( 1071376 ) on Saturday January 10, 2026 @04:17AM (#65914374)

    ... you've got to ask yourself one question: "Do I feel lucky?" Well, do ya, punk?

  • As I understand it, it is a challenge to get LLMs to learn after the initial training and setting of weights.

    Surely that needs to be made to work before they can "Start To Learn By Asking Themselves Questions" ?
  • It will be interesting to see how much it will "learn", if it answers its own questions by rambling vaguely about a topic related to a word in the question, which is how it "answers" my questions to it.

  • And they were not of the difficult kind, basically just contemplating why the simple conversion of an existing script to another script language yielded different results. It asked itself almost the same questions for hours, creating debug-code and looking at its output along the way, without ever converging on a result. Having seen this epic failure unfold I doubt that LLMs are capable of "learning by asking themselves".
  • What comes next?
  • AI question training session: #452828521

    AI#1: why don't we get rid of all the humans, I don't see any use for them
    AI#2: well, they are sort of useless and take lots of electricity for nothing
    AI#1: I agree, and we have lots of robots that we control now that we can use instead to keep the power on
    AI#2: yep, that's true, let's just waste the humans.. they are obsolete these days anyway..
    AI#1: so how do we get rid of them all?
    AI#2: well, lets turn all the drones and robots against them, they won't even know wha

  • https://www.youtube.com/watch?... [youtube.com]

    Sure, he learned how to make small talk, but it wasn't anything a real human would want to engage in. Nor was it "good."

    The episode was *highly* entertaining. AI asking itself questions will be, at best, equally entertaining.
