AI Math

OpenAI's 'Embarrassing' Math (techcrunch.com) 41

An anonymous reader writes: "Hoisted by their own GPTards." That's how Meta's Chief AI Scientist Yann LeCun described the blowback after OpenAI researchers did a victory lap over GPT-5's supposed math breakthroughs. Google DeepMind CEO Demis Hassabis added, "this is embarrassing." The Decoder reports that in a since-deleted tweet, OpenAI VP Kevin Weil declared that "GPT-5 found solutions to 10 (!) previously unsolved Erdos problems and made progress on 11 others." ("Erdos problems" are famous conjectures posed by mathematician Paul Erdos.)

However, mathematician Thomas Bloom, who maintains the Erdos Problems website, said Weil's post was "a dramatic misrepresentation" -- while these problems were indeed listed as "open" on Bloom's website, he said that only means, "I personally am unaware of a paper which solves it." In other words, it's not accurate to claim GPT-5 was able to solve previously unsolved problems. Instead, Bloom wrote, "GPT-5 found references, which solved these problems, that I personally was unaware of."

This discussion has been archived. No new comments can be posted.
  • by test321 ( 8891681 ) on Monday October 20, 2025 @02:57PM (#65738862)

    As a language model, GPT-5 finding references that solved Erdos problems, references that the professional mathematician maintaining the Erdos Problems website was unaware of, demonstrates an actually useful skill for an LLM.

    • by drjzzz ( 150299 )
      Agreed. It's like saying "I could've known that." Yeah, but you didn't. Even if the results are not at the level of the new Erdos proofs they might have seemed to claim, GPT's claims are not "embarrassing", except perhaps to competitors.
      • by grantham ( 49250 ) on Monday October 20, 2025 @03:20PM (#65738908) Homepage
        It's embarrassing that they claimed these problems were "previously unsolved".
      • by jsonn ( 792303 ) on Monday October 20, 2025 @03:23PM (#65738910)
        OpenAI's tweet was worded to suggest that GPT-5 found something new, when it "just" did a literature search (or a Google search, for Gen Z). If you pulled that kind of stunt in a PhD thesis, it would be revoked post haste.
        • OpenAI's tweet was worded to suggest that GPT-5 found something new, when it "just" did a literature search (or a Google search, for Gen Z). If you pulled that kind of stunt in a PhD thesis, it would be revoked post haste.

          Is the criticism that OpenAI is guilty of plagiarism, that they are claiming that they found solutions on their own? It seems more like the issue is that OpenAI thought they were clear in claiming to have found already existing solutions that were not widely acknowledged but others are interpreting the claims as having actually found new solutions.

          Both OpenAI and Meta/Google have their own motivations for claiming what they are claiming, but it seems like this is not necessarily a clear slam dunk for either side.

      • by dfghjk ( 711126 ) on Monday October 20, 2025 @03:35PM (#65738938)

        "... GPT claims are not "embarrassing", except perhaps to competitors."

        Yes they are, the claims were utterly false.

        • by HiThere ( 15173 )

          They weren't "utterly false", only "substantially false".
          I.e., the problems had been solved. Just not by ChatGPT.

      • by tomkost ( 944194 )
        Reminds me of a time I was at the Houston Art Museum. We were looking at a Jackson Pollock painting (abstract paint splatters). My friend remarked, "That's awful, that's so easy, anyone can do that, I could do that!!" Then the lady just behind us said to him, "Yes, you COULD have done it, but you DIDN'T, and that's the whole point here." We laughed so hard...
        • splat (Score:4, Funny)

          by hawk ( 1151 ) <hawk@eyry.org> on Monday October 20, 2025 @04:28PM (#65739054) Journal

          I've been there.

          I was so frustrated that I taped my banana to the wall!

        • by HiThere ( 15173 )

          Actually, Jackson Pollock paintings had (have?) hidden meaning. I don't think anyone knows what that meaning is, but multiple of his (authentic) paintings have been analyzed, and there are strong patterns in the way he used colors and the angles at which the lines cross. Presumably there are other patterns that weren't checked for. It's like a coded message that nobody has the key for. But using this you can easily detect fake Pollock paintings.

          So, no, you couldn't do an equivalent painting. Just one that

    • by Anonymous Coward

      One side is asserting that GPT-5 can do theoretical math at a level above practicing mathematicians. The other is saying that if you spend a hundred billion dollars digesting all of human knowledge, you can build a better search engine.

      A better search engine is not going to do any of the things AI zealots are promising. We already know that ChatGPT is pretty good as a search engine. But that won't replace human labor; it will just help highly skilled workers who are able to precisely formulate the questions they

    • by jsonn ( 792303 ) on Monday October 20, 2025 @03:33PM (#65738930)
      It's useful, but not groundbreaking. The equivalent of a minimum spanning tree in directed graphs is called an optimal branching. The algorithm for it was originally known as Edmonds' algorithm in most of the world, after its discoverer. It was later discovered that Chu and Liu had already published essentially the same idea two years earlier in a Chinese journal. Even now, it is still often referenced only under Edmonds' name. This happens often enough.
      • by dfghjk ( 711126 ) on Monday October 20, 2025 @03:43PM (#65738960)

        I invented the Bresenham line algorithm. A while later, I learned I was not the first. I'm sure that's been replayed a million times.

        Gallo discovered HTLV-3, the virus that causes AIDS. He discovered it by stealing the virus from French researchers who had isolated it first and made it available in good faith. So, you know, the details are important. Here, promoters of AI made claims that are utterly untrue; it would be interesting to know whether they made these errors out of deceit or ignorance.

        • by jsonn ( 792303 ) on Monday October 20, 2025 @03:48PM (#65738974)
          I absolutely agree. There's a major difference between finding a previously unknown publication somewhere and discovering something independently. It's useful to be able to do the former, but vastly different from being able to do the latter. Not verifying what the computer did is the real embarrassment for OpenAI.
          • by HiThere ( 15173 )

            I suspect that the people who had the AI "solve" the problem were aware of what it had done, but that the PR guys got (or understood) a simplified story. No actual lies were necessarily involved, as there wasn't necessarily any intent to deceive.

            OTOH, the statement was substantially false.

            When corporations tell self-promoting falsehoods, it's not necessarily with any intent to deceive. But you still shouldn't trust them when there's not sufficient transparency. And were this a fraud investigation, I think

    • by dfghjk ( 711126 )

      my thoughts exactly, and also proof that GPT-5 may actually be more intelligent than the a-holes who created it and promote it.

    • by gweihir ( 88907 )

      For very limited values of "useful". If these references were actually really useful, the maintainer would have known them.

    • by N1AK ( 864906 )
      While true, that's not the same thing as solving unsolved problems which is what OpenAI claimed it had done. I don't know how to do a lot of things, but that doesn't mean Google is solving an unsolved problem if I use Google search to find the answer.
  • Physics will always win.

  • That he didn't bother to search, or...

    The volume of garbage that AI spews out has made traditional search engines less useful, or...

    The people who actually solved the problems did not bother to contact the guy who tracks them.

    Quite possibly all these "or"s should be replaced with "and".

  • GPTards, I love it! I'll be using that name to refer to all ChatGPT users going forward.

  • ... does sound kinda useful?
  • by JoshuaZ ( 1134087 ) on Monday October 20, 2025 @05:11PM (#65739152) Homepage
    What's really unfortunate here is that OpenAI's drastic exaggeration of what happened distracts from the real capabilities on display. Being able to efficiently find sources in the literature is an incredibly useful tool. And even aside from that, there are now multiple examples where professional mathematicians have used GPT-5 in thinking mode to make progress on math problems. Nothing as major as any Erdos problem, but still a clear use. Terry Tao, for example, used GPT-5 in thinking mode to help locate a counterexample to a conjecture here: https://mathoverflow.net/questions/501066/is-the-least-common-multiple-sequence-textlcm1-2-dots-n-a-subset-of-t [mathoverflow.net]. Now, he almost certainly could have done this on his own, but it clearly saved time. Similarly, computer scientist Scott Aaronson used it to get a specific, useful suggestion for a function with particular properties he needed, which he was then able to put to use: https://scottaaronson.blog/?p=9183 [scottaaronson.blog]. In neither of these cases did the LLM do anything deeply groundbreaking. But it clearly helped and likely saved many hours of work. And these systems continue to improve.
  • by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Monday October 20, 2025 @05:30PM (#65739198) Homepage Journal

    It's hallucinating solutions

  • If these artificial cretins could perform (outside of very limited circumstances), the worlds would already look a lot different. Instead it has all been smoke and mirrors and lies by misdirection. Sure, a lot of people fall for that but that does not say anything about LLMs, it says some not very good things about people.

  • There is an excellent timeline and summary on Reddit/math https://www.reddit.com/r/math/... [reddit.com]
  • RTFM - Per the article, the OpenAI Tweet actually said, "GPT-5 found solutions to 10 (!) previously unsolved Erdos problems and made progress on 11 others."

    The article doesn't speak to the "made progress on 11 others" part, but by all accounts (in the article) GPT-5 did in fact "find" solutions to 10 problems. The Tweet didn't claim that OpenAI solved those 10 problems.

    Words matter.

    It seems to me like the OpenAI engineer was apologizing for the wording of the Tweet since it could be interpreted
