Google DeepMind's AI Systems Can Now Solve Complex Math Problems (technologyreview.com) 40

Posted by msmash on Thursday July 25, 2024 @11:54AM from the pushing-the-limits dept.

Google DeepMind has announced that its AI systems, AlphaProof and AlphaGeometry 2, have achieved silver medal performance at the 2024 International Mathematical Olympiad (IMO), solving four out of six problems and scoring 28 out of 42 possible points in a significant breakthrough for AI in mathematical reasoning. This marks the first time an AI system has reached such a high level of performance in this prestigious competition, which has long been considered a benchmark for advanced mathematical reasoning capabilities in machine learning.

AlphaProof, a system that combines a pre-trained language model with reinforcement learning techniques, demonstrated its new capability by solving two algebra problems and one number theory problem, including the competition's most challenging question. Meanwhile, AlphaGeometry 2 successfully tackled a complex geometry problem, Google wrote in a blog post. The systems' solutions were formally verified and scored by prominent mathematicians, including Fields Medal winner Prof Sir Timothy Gowers and IMO Problem Selection Committee Chair Dr Joseph Myers, lending credibility to the achievement.

The development of these AI systems represents a significant step forward in bridging the gap between natural language processing and formal mathematical reasoning, the company argued. By fine-tuning a version of Google's Gemini model to translate natural language problem statements into formal mathematical language, the researchers created a vast library of formalized problems, enabling AlphaProof to train on millions of mathematical challenges across various difficulty levels and topic areas. While the systems' performance is impressive, challenges remain, particularly in the field of combinatorics where both AI models were unable to solve the given problems. Researchers at Google DeepMind continue to investigate these limitations, the company said, aiming to further improve the systems' capabilities across all areas of mathematics.

Google DeepMind's AI Systems Can Now Solve Complex Math Problems

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 40 Comments Log In/Create an Account

Comments Filter:

Still a problem of syntax (Score:2, Insightful)

by jd ( 1658 ) writes:

Mathematics is a system where semantics has a 1:1 relationship to syntax, so maths is actually the single-easiest task.
Formal proof engines (such as HOL and COQ) have existed for a long time, reproducing a small fraction of their capability by means of neural nets is certainly ingenius and valuable, but it's only going to be useful when you can reproduce everything a mathematical modelling package can do.
And it still won't be intelligent, merely useful.
- Re: (Score:1)
  
  by blue trane ( 110704 ) writes:
  
  Why ignore the impedance mismatch between natural language and math syntax, which is the hardest part of doing math, and which AI seems to have solved?
  - Re: (Score:2)
    
    by blue trane ( 110704 ) writes:
    
    "First, the problems were manually translated into formal mathematical language for our systems to understand."
    Can I retract my comment?
    - Re: (Score:2)
      
      by gweihir ( 88907 ) writes:
      
      As far as I am concerned, yes. Incidentally, for the Math Olympiad, the math is pretty simple so you have a point there about the translation at the very least being a hard step. So they faked that hard step and then claim some great accomplishment. Matches with the deranged claim at the beginning of the article that they are on the way to AGI.
      Watching the AI field for now something like 35 years, I have come to the conclusion that many of the researchers in there are prone to hallucinations.
      - Re:Still a problem of syntax (Score:5, Funny)
        
        by pjt33 ( 739471 ) writes: on Thursday July 25, 2024 @12:53PM (#64655126)
        
        Jean-Yves Girard, in the appendix to Locus Solum (2000):
        To the best of my knowledge, this is the only scientific area with an intrinsic conflict of
        interest: unlike medical researchers who are usually in good health, these AI guys badly
        need the stuff they are after.
        
        
        Re: (Score:3)
        
        by gweihir ( 88907 ) writes:
        
        That is a very nice and very accurate statement. Great find! Also a lot of other interesting stuff in that appendix.
    - Re: (Score:2)
      
      by jd ( 1658 ) writes:
      
      That you took the time to read TFA carefully enough to spot something that was buried and then took time to post about it is a damn good piece of work.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    You find that translation step is the hardest part of math? What kind of simplistic math have you been doing?
    - Re: (Score:3)
      
      by gardyloo ( 512791 ) writes:
      
      Bertrand's paradox, as often framed:
      "Consider an equilateral triangle inscribed in a circle. Suppose a chord of the circle is chosen at random. What is the probability that the chord is longer than a side of the triangle? "
      Want to answer that comprehensively? It's possible. It's just an issue of translation.
    - Re: (Score:1)
      
      by blue trane ( 110704 ) writes:
      
      Why didn't you ask this in formal logic, or using a math equation? Is it too hard?
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  >Mathematics is a system where semantics has a 1:1 relationship to syntax,
  There's a 1:1 mapping if you already have your math language written in the language of a formal proof engine, but there can be ambiguities if you're reading printed notation from a book or pdf.
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- Maths not Perfectly Semantic (Score:2)
  
  by Roger W Moore ( 538166 ) writes:
  
  Mathematics is a system where semantics has a 1:1 relationship to syntax, so maths is actually the single-easiest task.
  It's close but not exact e.g. "a.b" and "a x b" have a meaning of '.' and 'x' that change based on whether 'a' and 'b' are vectors or not. "x_\mu" could represent a scalar or a four-vector based on context etc.
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
    - Re: (Score:1)
      
      by blue trane ( 110704 ) writes:
      
      Once you introduce context sensitivity, can you avoid ambiguity?
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Well, automated formal proof engines are now a niche discipline, because they cannot get to any real depth due to state-space explosion. Emulating them with an LLM will do a lot worse. Proof-checkers are a different story: They are very nice tools, but a human has to take them by the hand and lead them to the result in baby-steps.
  Oh, and there are computer algebra systems that could probably have solved some of these tasks 30 years ago. The Math Olympiad is not an useful benchmark.
- Re: (Score:2)
  
  by ceoyoyo ( 59147 ) writes:
  
  The language model is there to interpret the natural language problem statement. It sounds like they've got another model that translates that into Lean (a proof checker) which can then check the steps and fill in missing bits.
  So they're using a language model to solve what you seem to think is the hardest step, then using, not replacing, proof engines as an integrated part of their actual math solver.
  - Re: (Score:1)
    
    by blue trane ( 110704 ) writes:
    
    As a chatbot enthusiast who was focused on rule-based AI, can I say I'm impressed?
    - Re: (Score:2)
      
      by ceoyoyo ( 59147 ) writes:
      
      Yeah, I agree. I was commenting to a friend that the insistence of people running math problems through AI chatbots is silly, but mentioned that using a language model to translate math problems into formal specifications would be pretty useful. Google's gone a step further and noticed that once you've done that, solving the problem is a gigantic but well defined search space and you need to prune down to make practical... just like chess or Go. And they just happened to have a really good algorithm for dea
  - Re: (Score:2)
    
    by jd ( 1658 ) writes:
    
    If there's no objection, I'd like to quote blue trane, who discovered this gem:
    "First, the problems were manually translated into formal mathematical language for our systems to understand."
    The implication of this would seem to be that the LLM can't interpret the natural nanguage problem statement.
    - Re: (Score:3)
      
      by ceoyoyo ( 59147 ) writes:
      
      Google's blog article does say that, specifically in reference to the olympiad questions. However, if you read further down, they say:
      We established a bridge between these two complementary spheres by fine-tuning a Gemini model to automatically translate natural language problem statements into formal statements, creating a large library of formal problems of varying difficulty.
      and their system diagram [googleusercontent.com] has a "Formalizer network" that translates informal problems to formal ones.
      They may have done the first b
- Re: (Score:2)
  
  by ljw1004 ( 764174 ) writes:
  
  Mathematics is a system where semantics has a 1:1 relationship to syntax
  No, it's many:1. From "Proofs and Types" by Girard, page 2:
  Two sentences which have the same sense [sense, syntax, proofs] have the same denotation [denotation, truth, semantics], that is obvious; but two sentences with the same denotation rarely have the same sense. For example, take a complicated mathematical equivalence A B. The two sentences have the same denotation (they are true at the same time) but surely not the same sense, otherwise what is the point of showing the equivalence?
  Girard goes on to talk about the algebraic tradition "we quite simply discard the sense [sense, syntax], and consider only the denotation [semantics]". But in the reverse direction (page 3):
  
  ... it is impossible to say "forget completely the denotation and concentrate on the sense", for the simple reason that the sense contains the denotation, at least implicitly. So it is not a matter of symmetry.... The only tangible reality about sense is the way it is written, the formalism; but the formalism remains an unaccommodating object of study, without true structure, a piece of *soft camembert*.
This is a good direction for AI (Score:2)

by MpVpRb ( 1423381 ) writes:

Instead of churning out vast quantities of well written, yet useless text, this direction shows promise
- Re: (Score:2)
  
  by Seven Spirals ( 4924941 ) writes:
  
  Well, this is Gemini. Apparently, when it's not too busy being an anti-white racist [vox.com], it can do something helpful. However, it'd be much more impressive if it solved some materials science problems like "What are the best materials for a solid state battery?" or "How can we make carbon fiber cheaper?" etc.... So, this still seems like baby steps.
The referenced story is nonsense (Score:1)

by gweihir ( 88907 ) writes:

They claim to be on the way to AGI. They are not. Solving math problem is something computer algebra systems could do 40 years ago. Emulating a part of that is not a great achievement.
- Re: (Score:1)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    Relevance: none.
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Nope. Try again. I am not opposed to this research. I am opposed to blatant lies about its state.
  - Re: (Score:2)
    
    by pjt33 ( 739471 ) writes:
    
    Is the analogy that they're not really aiming for AGI, but they just claim so to get funding?
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Oh, but we do know. Current generation and technology AI cannot reach AGI, no matter how much improvement it gets. Hence the claim to be on the way to it is simply a blatant lie.
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
      - Re: (Score:2)
        
        by martin-boundary ( 547041 ) writes:
        
        That's not how research works. In research, there are people who estimate the importance of the work, the chance of success, the length of the project and the funding required. Then a decision gets made.
        Current generation of AI is turning out to be very expensive, is taking much longer than expected to achieve preset milestones, and is full of hyperbole. How long the VCs will continue to fund it is anybody's guess.
Reasoning or brute force? (Score:2)

by leptons ( 891340 ) writes:

I doubt there is much actual "reasoning" going on. If it were capable of reasoning then the scores likely wouldn't be so low. Brute-force is much more likely, being the cornerstone that the current AI craze is built on.
Gonna have to keep moving those goalposts (Score:2)

by cascadingstylesheet ( 140919 ) writes:

"It can't do math!"
Um, well ...
Mathematically proven microprocessors (Score:2)

by John.Banister ( 1291556 ) * writes:

I remember stories a few years back about people trying to set up microprocessors with a mathematically proven instruction set so that there could be no surprise zero day flaws encountered later on. When I search for that now, I find Hensoldt [hensoldt.net], who says
Its answer to cyber threats is a novel and unique product that combines a secure base processor with a mathematically proven, correct, error-free operating system kernel, which yields a secure IT basis for highest security needs.
. To make "meeting the highest security needs" ubiquitous by putting a processor/OS of that nature in millions of smartphones sounds like something Google would want to be known for doing.
Can it take out the garbage? Get me a beer? (Score:2)

by thesjaakspoiler ( 4782965 ) writes:

At least Pepper could do that already 10 years ago.
https://www.youtube.com/watch?... [youtube.com]
Yeah? Let's see what it can really do. (Score:1)

by jonadab ( 583620 ) writes:

Ask it about the non-trivial zeroes of the zeta function. Let's see what it comes up with. Maybe the output will at least be amusing.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Still a problem of syntax (Score:2, Insightful)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re:Still a problem of syntax (Score:5, Funny)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:1)

Re: (Score:1)

Re: (Score:2)

Maths not Perfectly Semantic (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

This is a good direction for AI (Score:2)

Re: (Score:2)

The referenced story is nonsense (Score:1)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Reasoning or brute force? (Score:2)

Gonna have to keep moving those goalposts (Score:2)

Mathematically proven microprocessors (Score:2)

Can it take out the garbage? Get me a beer? (Score:2)

Yeah? Let's see what it can really do. (Score:1)

Related Links Top of the: day, week, month.

Slashdot Top Deals