Google DeepMind's AI Systems Can Now Solve Complex Math Problems (technologyreview.com) 40
Google DeepMind has announced that its AI systems, AlphaProof and AlphaGeometry 2, have achieved silver medal performance at the 2024 International Mathematical Olympiad (IMO), solving four out of six problems and scoring 28 out of 42 possible points in a significant breakthrough for AI in mathematical reasoning. This marks the first time an AI system has reached such a high level of performance in this prestigious competition, which has long been considered a benchmark for advanced mathematical reasoning capabilities in machine learning.
AlphaProof, a system that combines a pre-trained language model with reinforcement learning techniques, demonstrated its new capability by solving two algebra problems and one number theory problem, including the competition's most challenging question. Meanwhile, AlphaGeometry 2 successfully tackled a complex geometry problem, Google wrote in a blog post. The systems' solutions were formally verified and scored by prominent mathematicians, including Fields Medal winner Prof Sir Timothy Gowers and IMO Problem Selection Committee Chair Dr Joseph Myers, lending credibility to the achievement.
The development of these AI systems represents a significant step forward in bridging the gap between natural language processing and formal mathematical reasoning, the company argued. By fine-tuning a version of Google's Gemini model to translate natural language problem statements into formal mathematical language, the researchers created a vast library of formalized problems, enabling AlphaProof to train on millions of mathematical challenges across various difficulty levels and topic areas. While the systems' performance is impressive, challenges remain, particularly in the field of combinatorics where both AI models were unable to solve the given problems. Researchers at Google DeepMind continue to investigate these limitations, the company said, aiming to further improve the systems' capabilities across all areas of mathematics.
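To make the translation step concrete: AlphaProof works in the Lean proof assistant, so a "formalized problem" is a Lean statement whose proof the kernel can check mechanically. The snippet below is only an illustrative toy, written for this summary; the statement, names, and proof are not DeepMind's, and real IMO-level formalizations are far harder.

```lean
-- Informal statement: "The sum of two even natural numbers is even."
-- A toy Lean 4 formalization of that sentence.
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k := by
  cases ha with
  | intro m hm =>
    cases hb with
    | intro n hn =>
      -- Witness m + n: a + b = 2 * m + 2 * n = 2 * (m + n).
      exact ⟨m + n, by rw [hm, hn, Nat.left_distrib]⟩
```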
Still a problem of syntax (Score:2, Insightful)
Mathematics is a system where semantics has a 1:1 relationship to syntax, so maths is actually the single-easiest task.
Formal proof engines (such as HOL and Coq) have existed for a long time. Reproducing a small fraction of their capability by means of neural nets is certainly ingenious and valuable, but it's only going to be useful when you can reproduce everything a mathematical modelling package can do.
And it still won't be intelligent, merely useful.
Re: (Score:1)
Why ignore the impedance mismatch between natural language and math syntax, which is the hardest part of doing math, and which AI seems to have solved?
Re: (Score:2)
"First, the problems were manually translated into formal mathematical language for our systems to understand."
Can I retract my comment?
Re: (Score:2)
As far as I am concerned, yes. Incidentally, for the Math Olympiad, the math is pretty simple so you have a point there about the translation at the very least being a hard step. So they faked that hard step and then claim some great accomplishment. Matches with the deranged claim at the beginning of the article that they are on the way to AGI.
Having watched the AI field for something like 35 years now, I have come to the conclusion that many of the researchers in there are prone to hallucinations.
Re:Still a problem of syntax (Score:5, Funny)
Jean-Yves Girard, in the appendix to Locus Solum (2000):
Re: (Score:3)
That is a very nice and very accurate statement. Great find! Also a lot of other interesting stuff in that appendix.
Re: (Score:2)
That you took the time to read TFA carefully enough to spot something that was buried and then took time to post about it is a damn good piece of work.
Re: (Score:2)
You find that translation step is the hardest part of math? What kind of simplistic math have you been doing?
Re: (Score:3)
Bertrand's paradox, as often framed:
"Consider an equilateral triangle inscribed in a circle. Suppose a chord of the circle is chosen at random. What is the probability that the chord is longer than a side of the triangle? "
Want to answer that comprehensively? It's possible. It's just an issue of translation.
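For what it's worth, the ambiguity is easy to demonstrate: the classical answers 1/3, 1/2, and 1/4 all come from different but equally "natural" formalizations of "a chord chosen at random". Here is a quick Monte Carlo sketch (illustrative Python, unit circle assumed, not from the article):

```python
import math
import random

N = 200_000
SIDE = math.sqrt(3)  # side of an equilateral triangle inscribed in a unit circle

# Formalization 1: pick two endpoints uniformly on the circle.
def chord_endpoints():
    theta = random.uniform(0, 2 * math.pi)  # fix one endpoint at angle 0 by symmetry
    return 2 * math.sin(theta / 2)

# Formalization 2: pick a distance from the centre uniformly along a random radius.
def chord_radial():
    d = random.uniform(0, 1)
    return 2 * math.sqrt(1 - d * d)

# Formalization 3: pick the chord's midpoint uniformly inside the disk.
def chord_midpoint():
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        r2 = x * x + y * y
        if r2 <= 1:
            return 2 * math.sqrt(1 - r2)

for name, sampler in [("endpoints", chord_endpoints),
                      ("radial", chord_radial),
                      ("midpoint", chord_midpoint)]:
    p = sum(sampler() > SIDE for _ in range(N)) / N
    print(f"{name}: {p:.3f}")  # ~0.333, ~0.500, ~0.250
```

Same English sentence, three formal models, three answers; that is exactly the translation problem being argued about here.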
Re: (Score:1)
Why didn't you ask this in formal logic, or using a math equation? Is it too hard?
Re: (Score:1)
>Mathematics is a system where semantics has a 1:1 relationship to syntax,
There's a 1:1 mapping if you already have your math language written in the language of a formal proof engine, but there can be ambiguities if you're reading printed notation from a book or pdf.
Re: (Score:2)
Maths not Perfectly Semantic (Score:2)
Mathematics is a system where semantics has a 1:1 relationship to syntax, so maths is actually the single-easiest task.
It's close but not exact, e.g. in "a.b" and "a x b" the meanings of '.' and 'x' change depending on whether 'a' and 'b' are vectors, and "x_\mu" could represent a scalar or a four-vector based on context, etc.
Re: (Score:2)
Re: (Score:1)
Once you introduce context sensitivity, can you avoid ambiguity?
Re: (Score:2)
Well, automated formal proof engines are now a niche discipline, because they cannot get to any real depth due to state-space explosion. Emulating them with an LLM will do a lot worse. Proof-checkers are a different story: They are very nice tools, but a human has to take them by the hand and lead them to the result in baby-steps.
Oh, and there are computer algebra systems that could probably have solved some of these tasks 30 years ago. The Math Olympiad is not a useful benchmark.
Re: (Score:2)
The language model is there to interpret the natural language problem statement. It sounds like they've got another model that translates that into Lean (a proof checker) which can then check the steps and fill in missing bits.
So they're using a language model to solve what you seem to think is the hardest step, then using, not replacing, proof engines as an integrated part of their actual math solver.
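As a rough picture of that division of labour (a toy example I made up, not DeepMind's actual pipeline): the model proposes the individual steps, and Lean mechanically certifies each one, rejecting anything that does not follow.

```lean
-- Toy Lean 4 illustration: the prover supplies the intermediate steps,
-- and the kernel checks that every link in the chain is justified.
example (a b c : Nat) (h1 : a = b) (h2 : b = c) : a + 1 = c + 1 :=
  calc
    a + 1 = b + 1 := by rw [h1]
    _ = c + 1 := by rw [h2]
```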
Re: (Score:1)
As a chatbot enthusiast who was focused on rule-based AI, can I say I'm impressed?
Re: (Score:2)
Yeah, I agree. I was commenting to a friend that people's insistence on running math problems through AI chatbots is silly, but mentioned that using a language model to translate math problems into formal specifications would be pretty useful. Google's gone a step further and noticed that once you've done that, solving the problem becomes a gigantic but well-defined search space that you need to prune down to make practical... just like chess or Go. And they just happened to have a really good algorithm for dea
Re: (Score:2)
If there's no objection, I'd like to quote blue trane, who discovered this gem:
"First, the problems were manually translated into formal mathematical language for our systems to understand."
The implication of this would seem to be that the LLM can't interpret the natural language problem statement.
Re: (Score:3)
Google's blog article does say that, specifically in reference to the olympiad questions. However, if you read further down, they say:
and their system diagram [googleusercontent.com] has a "Formalizer network" that translates informal problems to formal ones.
They may have done the first b
Re: (Score:2)
Mathematics is a system where semantics has a 1:1 relationship to syntax
No, it's many:1. From "Proofs and Types" by Girard, page 2:
Two sentences which have the same sense [sense, syntax, proofs] have the same denotation [denotation, truth, semantics], that is obvious; but two sentences with the same denotation rarely have the same sense. For example, take a complicated mathematical equivalence A ⇔ B. The two sentences have the same denotation (they are true at the same time) but surely not the same sense, otherwise what is the point of showing the equivalence?
Girard goes on to talk about the algebraic tradition "we quite simply discard the sense [sense, syntax], and consider only the denotation [semantics]". But in the reverse direction (page 3):
... it is impossible to say "forget completely the denotation and concentrate on the sense", for the simple reason that the sense contains the denotation, at least implicitly. So it is not a matter of symmetry.... The only tangible reality about sense is the way it is written, the formalism; but the formalism remains an unaccommodating object of study, without true structure, a piece of *soft camembert*.
This is a good direction for AI (Score:2)
Instead of churning out vast quantities of well-written yet useless text, this direction shows promise.
Re: (Score:2)
The referenced story is nonsense (Score:1)
They claim to be on the way to AGI. They are not. Solving math problems is something computer algebra systems could do 40 years ago. Emulating a part of that is not a great achievement.
Re: (Score:1)
Re: (Score:2)
Relevance: none.
Re: (Score:2)
Re: (Score:2)
Nope. Try again. I am not opposed to this research. I am opposed to blatant lies about its state.
Re: (Score:2)
Is the analogy that they're not really aiming for AGI, but they just claim so to get funding?
Re: (Score:2)
Re: (Score:2)
Oh, but we do know. Current generation and technology AI cannot reach AGI, no matter how much improvement it gets. Hence the claim to be on the way to it is simply a blatant lie.
Re: (Score:2)
Re: (Score:2)
Current generation of AI is turning out to be very expensive, is taking much longer than expected to achieve preset milestones, and is full of hyperbole. How long the VCs will continue to fund it is anybody's guess.
Reasoning or brute force? (Score:2)
Gonna have to keep moving those goalposts (Score:2)
"It can't do math!"
Um, well ...
Mathematically proven microprocessors (Score:2)
Its answer to cyber threats is a novel and unique product that combines a secure base processor with a mathematically proven, correct, error-free operating system kernel, which yields a secure IT basis for highest security needs.
To make "meeting the highest security needs" ubiquitous by putting a processor/OS of that nature in millions of smartphones sounds like something Google would want to be known for doing.
Can it take out the garbage? Get me a beer? (Score:2)
At least Pepper could do that already 10 years ago.
https://www.youtube.com/watch?... [youtube.com]
Yeah? Let's see what it can really do. (Score:1)