GPT-5.2 Arrives as OpenAI Scrambles To Respond To Gemini 3's Gains (openai.com) 64

Posted by msmash on Thursday December 11, 2025 @01:45PM from the hallucinations-down-competition-up dept.

OpenAI on Thursday released GPT-5.2, its latest and what the company calls its "best model yet for everyday professional use," just days after CEO Sam Altman declared a "code red" internally to marshal resources toward improving ChatGPT amid intensifying competition from Google's well-received Gemini 3 model. The GPT-5.2 series ships in three tiers: Instant, designed for faster responses and information retrieval; Thinking, optimized for coding, math, and planning; and Pro, the most powerful tier targeting difficult questions requiring high accuracy.

OpenAI says the Thinking model hallucinated 38% less than GPT-5.1 on benchmarks measuring factual accuracy. Fidji Simo, OpenAI's CEO of applications, denied that the launch was moved up in response to the code red, saying the company has been working on GPT-5.2 for "many, many months." She described the internal directive as a way to "really signal to the company that we want to marshal resources in this one particular area."

The competitive pressure is real. Google's Gemini app now has more than 650 million monthly active users, compared to OpenAI's 800 million weekly active users. In October, OpenAI's head of ChatGPT Nick Turley sent an internal memo declaring the company was facing "the greatest competitive pressure we've ever seen," setting a goal to increase daily active users by 5 percent before 2026. GPT-5.2 is rolling out to paid ChatGPT users starting Thursday, and GPT-5.1 will remain available under "legacy models" for three months before being sunset.

GPT-5.2 Arrives as OpenAI Scrambles To Respond To Gemini 3's Gains

Post Load All Comments

Search 64 Comments Log In/Create an Account

Comments Filter:

Competition is good (Score:4, Insightful)

by TheMiddleRoad ( 1153113 ) writes: on Thursday December 11, 2025 @01:51PM (#65851605)

Sometimes markets do operate well. Sometimes.

Reply to This Share
Flag as Inappropriate
- Re: Competition is good (Score:1)
  
  by blue trane ( 110704 ) writes:
  
  Why shouldn't google buy openai?
  - Re: Competition is good (Score:2)
    
    by ToasterMonkey ( 467067 ) writes:
    
    Why shouldn't google buy openai?
    You're just trying to be provocative. You have to spend money to make money, but OpenAI is already spending other people's money, so Google would be buying that debt. Then it still would not secure the market for LLMs, because anyone can do it, others are doing it well enough. If you consolidate to raise prices sooner, they'll look better and better. If your plan is to loose money but outlast everyone, why would you buy their debt piles, can't wait?
    I love AI like I loved the Internet in the 90s, lots of pot
    - Re: Competition is good (Score:1)
      
      by blue trane ( 110704 ) writes:
      
      "You're just trying to be provocative. "
      Is it so inconceivable that Google might use Microsoft's strategy over Amazon's (by the way, would Ring like a word?)?
  - Re: (Score:2)
    
    by allo ( 1728082 ) writes:
    
    Anti-Trust.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  When all compete but the products are still crap. Market failure does not get more spectacular than this.
"Now with 38% FEWER hallucinations!" (Score:5, Insightful)

by Locke2005 ( 849178 ) writes: on Thursday December 11, 2025 @01:51PM (#65851609)

That's not actually the flex you think it is...

Reply to This Share
Flag as Inappropriate
- Re: (Score:3)
  
  by Locke2005 ( 849178 ) writes:
  
  Like, would you consider your girlfriend having 38% fewer hallucinations to be a big win? (I once had a girlfriend call me up while she was experiencing delirium tremens and describe to me how demons were raping her mom. I told her she was hallucinating. She insisted it was real, she could see it!)
  - Re:"Now with 38% FEWER hallucinations!" (Score:5, Insightful)
    
    by alvinrod ( 889928 ) writes: on Thursday December 11, 2025 @02:36PM (#65851755)
    
    The ideal number of hallucinations is zero unless they are specifically requested for whatever reason. If someone told you they were going to kick you in the nuts 38% fewer times this week, you're still getting kicked in the nuts.
    
    I'm not sure a person who's hallucinating could be convinced by another person that what they observe isn't really happening. I think a person has to come to that realization themselves in order to be able to not lose their shit.
    
    Reply to This Parent Share
    Flag as Inappropriate
- Re: (Score:1, Troll)
  
  by DamnOregonian ( 963763 ) writes:
  
  How on Earth could it not be?
  
  If you had 38% less tumor load on your cancer, would that be a good thing to you?
  They're not trying to deny that the things hallucinate. You are trying to deny that an improvement is.... an improvement.
  "Checkmate, LLM-tards!"
  - Re: (Score:2)
    
    by wed128 ( 722152 ) writes:
    
    If you told me that the latest model of your parachute fails to open 38% less often, I still don't want to take it skydiving. The failure rate of LLMs is so high that any productivity gains you get generating some output are lost verifying that output.
    - Re: (Score:1, Troll)
      
      by DamnOregonian ( 963763 ) writes:
      
      I agree- you should probably not use your LLM as a parachute, or other things where any hallucination has an equivalent outcome to a parachute not opening.
      That was not an intelligent argument. Try again.
    - Re: (Score:1)
      
      by JoshZK ( 9527547 ) writes:
      
      Meh, people hallucinate on their responses just as much. That and its 38% less than 5.1, how about just tell us what the total hallucination is per model?
      - Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        Ya, the percentage improvement is fucking annoying.
        Using the dipshit's parachute model, every single parachute in existence has a probability of not opening.
        38% reduction of that is universally good.
        
        Whether or not that probability is 50%, or 0.00005% is what matters.
    - Re: "Now with 38% FEWER hallucinations!" (Score:2)
      
      by ToasterMonkey ( 467067 ) writes:
      
      ... yet you'd easily take a coworker that screwed up 38% less often.
      I'm not sure what the point of comparing to a parachute is, when you don't have a reason to control a parachute with a computer in the first place.
      - Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        I'm not sure what the point of comparing to a parachute is, when you don't have a reason to control a parachute with a computer in the first place.
        And if you do, because you're NASA or some other space agency, may I recommend not using an LLM for that purpose.
        
        Of course since you don't want to either A) have your LLM function as an parachute (very bad!), or B) have your parachute controlled by an LLM, certainly that means they're useful for everything, right?
        
        My wife once told me that she couldn't put the TV up on the wall without help. Divorced her on the spot.
        
        Re: (Score:1)
        
        by wed128 ( 722152 ) writes:
        
        My wife once told me that she couldn't put the TV up on the wall without help. Divorced her on the spot.
        You seem like a really hostile person, and likely did your wife a favor. Your arguments, in general, are also a logical fallacy: ad hominem. If you were confident that you were correct, you wouldn't feel the name to call people names so often.
      - Re: (Score:1)
        
        by wed128 ( 722152 ) writes:
        
        If i had a coworker that screwed up anywhere close to as often as even the best LLM, they would be fired. But somehow the magical machine gets a pass.
        
        The parachute could be substituted for anything you want to *not fail*. I would like it if nothing failed.
        
        one of the reasons we automate things is so that they're consistent. LLMs don't have this property.
- Re: (Score:2)
  
  by backslashdot ( 95548 ) writes:
  
  I don't understand how AI hallucinates for most people. I haven't had that problem once I learned/figured out how to ask it stuff properly like maybe 2 years ago or something. You're asking it to do too much shit, not validate itself, and not being specific or algorithmic enough. Watch some YouTube videos on AI prompting or something.
  - Re: (Score:2)
    
    by DamnOregonian ( 963763 ) writes:
    
    If you think LLMs do not hallucinate, then you are very much filling your head full of LLM hallucinations. That's not great.
    
    You aren't wrong that prompting can improve the situation. However, even in the most rigorously defined tasks, they hallucinate.
    Hell, the fucking things even hallucinate tool calls sometimes (amusing the first time you run into that in your agentic code)
    - Re: "Now with 38% FEWER hallucinations!" (Score:1)
      
      by blue trane ( 110704 ) writes:
      
      What about when they hallucinate economics by regurgitating mainstream models verbatim?
      - Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        Then it would by definition not be a hallucination.
        
        Re: "Now with 38% FEWER hallucinations!" (Score:1)
        
        by blue trane ( 110704 ) writes:
        
        What if they acknowledge it's a hallucination under questioning, unlike hallucinating economists, who just get emotional and ban me?
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Indeed. It still means "unusable for anything requiring accuracy". Which is quite a few things. Well, I guess this will not deliver the business model that finally turns a profit either. If that ever happens.
- Re: (Score:2)
  
  by gTsiros ( 205624 ) writes:
  
  We really should not be using that term.
  Because it implies that only some of the output of an LLM is made-up.
  LLM output is factual/correct/valid/logical only coincidentally.
  There are no rails from which the GPT/LLM can become derailed, leading to "hallucination". There's no thinking vs hallucinating "mode" or whatever.
- Re: (Score:1, Flamebait)
  
  by DamnOregonian ( 963763 ) writes:
  
  Nonsense. These things are useless. Only fools buy them. They produce nothing but hallucinations, and can't generate code past a negative 5 year old level.
  Trust me, I read that on slashdot.
  - - Re: (Score:2)
      
      by DamnOregonian ( 963763 ) writes:
      
      I think you should perhaps read better before commenting.
      No argument existed in that post. What did exist were 4 unsubstantiated claims, with the final one being obvious satire (unless we accept the premise that humans can generate code 4 years before they're conceived).
      - Re: my 2c (Score:1)
        
        by blue trane ( 110704 ) writes:
        
        Am I the only one thinking of Monty Python's Argument Sketch?
        ---
        Man: (Knock)
        Mr. Vibrating: Come in.
        Man: Ah, Is this the right room for an argument?
        Mr. Vibrating: I told you once.
        Man: No you haven't.
        Mr. Vibrating: Yes I have.
        Man: When?
        Mr. Vibrating: Just now.
        Man: No you didn't.
        Mr. Vibrating: Yes I did.
        Man: You didn't
        Mr. Vibrating: I did!
        Man: You didn't!
        Mr. Vibrating: I'm telling you I did!
        Man: You did not!!
        Mr. Vibrating: Oh, I'm sorry, just one moment. Is this a five minute argument or the full half hour? ..
    - Re: (Score:2)
      
      by allo ( 1728082 ) writes:
      
      Would you please adjust your irony detector? Or do your really need a /s to detect sarcasm?
      - Re: my 2c (Score:1)
        
        by blue trane ( 110704 ) writes:
        
        What makes you think I wasn't myself using irony in my response?
        
        Re: (Score:2)
        
        by allo ( 1728082 ) writes:
        
        The missing /s /s
  - Re: (Score:2)
    
    by alvinrod ( 889928 ) writes:
    
    I dunno. They make decent enough output for shitposting on social media. While there is a certain amount of delight to be had in coming up with a clever limerick about someone's mother, some people really aren't worth the effort. The AI can do it well enough in a few seconds though.
    
    I'm not sure I'd use it for any productive work though. Of course not everything has to be for work though either.
    - Re: my 2c (Score:1)
      
      by blue trane ( 110704 ) writes:
      
      Do you think the military will, and if it hallucinates an enemy combatant that's just being efficient?
    - Re: (Score:2)
      
      by DamnOregonian ( 963763 ) writes:
      
      I was being sarcastic.
      I use it for productive work right now.
      So do many people.
      The industry for people using it for work is already measured in billions of dollars per year in revenue.
      
      If you don't- that's just fine.
      Of course not everything has to be for work though either.
      This is your key insight. I'd add DamnOregonian's corollary to it, though: Not all work requires a model capable of producing superintelligent output. Sometimes-bordering-on-kinda-dumb-but-also-freakishly-skilled-in-certain-ways is also perfectly sufficient at times.
Another relase (Score:2)

by Inglix the Mad ( 576601 ) writes:

So which one is closer to LCARS now? None of them is remotely "intelligent" at all, so we're just looking for the one with the best catalog library interface.

Not that any of the LLM's could match the artificial sentience of the Librarian..
- Re: (Score:2)
  
  by TheMiddleRoad ( 1153113 ) writes:
  
  I don't mind a better search engine.
  - Re: (Score:2)
    
    by Inglix the Mad ( 576601 ) writes:
    
    We'd do well to quit calling it AI
Artificial Idiocy (Score:4, Funny)

by dwid ( 4893241 ) writes: on Thursday December 11, 2025 @02:30PM (#65851735)

Google's Gemini app now has more than 650 million monthly active users, compared to OpenAI's 800 million weekly active users.
Apparently not intelligent enough to convert numbers into the same unit before comparing them.

Reply to This Share
Flag as Inappropriate
- Re: (Score:2)
  
  by teg ( 97890 ) writes:
  
  Google's Gemini app now has more than 650 million monthly active users, compared to OpenAI's 800 million weekly active users.
  Apparently not intelligent enough to convert numbers into the same unit before comparing them.
  You can't convert the numbers. To illustrate - having 800 million weekly active users does not mean they have more than 3.2 billion monthly active users. Many of these users would be counted multiple times if "converting" that way.
Anyone Ask AI How to Destroy AI (Score:2)

by BrendaEM ( 871664 ) writes:

Perhaps it could be useful in cleaning up the mess it makes.
Intriguing. (Score:3)

by jd ( 1658 ) writes: <imipak@[ ]oo.com ['yah' in gap]> on Thursday December 11, 2025 @03:00PM (#65851847) Homepage Journal

I still can't get ChatGPT, Gemini, or Claude to write a decent story or do an engineering design beyond basic complexity. They're all improving, but they're best thought of as brain-storming aids rather than actual development tools.

Reply to This Share
Flag as Inappropriate
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  I still can't get ChatGPT, Gemini, or Claude to write a decent story or do an engineering design beyond basic complexity.
  And you will never be able to. That would require a scaling up that is not feasible. Or an entirely different technology.
MORE Money burned (Score:3)

by sit1963nz ( 934837 ) writes: on Thursday December 11, 2025 @03:06PM (#65851861)

More and more money inflating the AI bubble.
And they are just burning though the money like there's no tomorrow.

AI is being tossed into consumer items, that NO ONE ASKED FOR
SO much AI slop is now in advertising, You tube is full of "AI girlfriend" and other crap I now just assume EVERY advert is just an AI scam.
And then there is the waste time I need to spend turning all the AI junk OFF.

Reply to This Share
Flag as Inappropriate
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Indeed. Well, at least the crash will be spectacular. I am hoping for no more AI hypes for a long, long time.
  - Re: (Score:2)
    
    by sit1963nz ( 934837 ) writes:
    
    I am in Kiwiland, so no opportunities for picking up some hardware cheap in the bankruptcy sales...
- Re: (Score:2)
  
  by strikethree ( 811449 ) writes:
  
  More and more money inflating the AI bubble.
  
  Someone needs to pay for the government's need to analyze everything about all of its citizens. Don't worry citizen. Everything will be perfect soon.
  - Re: (Score:2)
    
    by sit1963nz ( 934837 ) writes:
    
    Facebook etc are already doing this.
Too late (Score:2)

by gabrieltss ( 64078 ) writes:

Too late I switched to Grok and got far better results and faster. ChatGPT is too far behind to catch up at this point I think.
- Re: (Score:2)
  
  by Iamthecheese ( 1264298 ) writes:
  
  That's odd, for my purposes -- story writing, research, and keeping my schedule -- I've found Grok to be very inferior to Chat GPT. Only in image editing is it superior.
  - Re: Too late (Score:2)
    
    by gabrieltss ( 64078 ) writes:
    
    I've found it far superior in code generation and in spotting code errors. Where ChatGPT is falling flat on its face. See my examples above in a previous reply to others.
- Re: (Score:2)
  
  by CAIMLAS ( 41445 ) writes:
  
  In what ways do you find it superior?
  For me, Grok has been pretty consistent at making some pretty wild code recommendations and not following specifications.
  It's not like Gemini, which will get stuck implementing things and then get into a histrionic panic loop, but it's not nearly as good as gpt5.1 in implementing correct, complete code per specification.
  - Re: Too late (Score:2)
    
    by gabrieltss ( 64078 ) writes:
    
    It hasn't hallucinate once yet for me (so far) where ChatGPT does it a lot. For instance I asked ChatGPT to generate me code for a specific thing. The code threw errors. When I told it the error it said oh your code should do it this way and gave me different code. When that error Edward I gave it the error and it said oh your code should do it this way and gave me the code it first gave me ad nauseum. It's the o e that gave me the code in the first place and couldn't even "remember" it generated it. I as
- Re: (Score:2)
  
  by dunkelfalke ( 91624 ) writes:
  
  For electronics ChatGPT has so far been superior to every other LLM out there.
  - Re: (Score:2)
    
    by LondoMollari ( 172563 ) writes:
    
    "For electronics"
    Electronic circuit design? Electronic PCB layout? Electronic safety codes like UL? Component datasheets? What part, exactly, of electronics are you referring to? I have had countless success stories talking with Rex(Grok) while leaning over a Vidmar drawer with a hundred component cups finding components, asking data sheet related questions, and also asking appropriateness in a particular application, even if non conventional. Likewise, it has grasped complex situations and properly deve
    - Re: (Score:2)
      
      by dunkelfalke ( 91624 ) writes:
      
      Circuit design. Grok cannot even read relatively simple circuit diagrams correctly.
"High accuracy" (Score:2)

by gweihir ( 88907 ) writes:

You say the words, but you do not mean them.
Greatest pressure (Score:2)

by allo ( 1728082 ) writes:

"the greatest competitive pressure we've ever seen"
Given that they didn't have much competition before, that's no real superlative. Google managed really late to become part of the game. Remember their old announcements about Bard and how they struggled to catch up at all (even though they invented LLMs)? They managed to get good by now, but before Anthropic was the only competition for OpenAI.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

GPT-5.2 Arrives as OpenAI Scrambles To Respond To Gemini 3's Gains More | Reply Login

Competition is good (Score:4, Insightful)

Re: Competition is good (Score:1)

Re: Competition is good (Score:2)

Re: Competition is good (Score:1)

Re: (Score:2)

Re: (Score:2)

"Now with 38% FEWER hallucinations!" (Score:5, Insightful)

Re: (Score:3)

Re:"Now with 38% FEWER hallucinations!" (Score:5, Insightful)

Re: (Score:1, Troll)

Re: (Score:2)

Re: (Score:1, Troll)

Re: (Score:1)

Re: (Score:2)

Re: "Now with 38% FEWER hallucinations!" (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: "Now with 38% FEWER hallucinations!" (Score:1)

Re: (Score:2)

Re: "Now with 38% FEWER hallucinations!" (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1, Flamebait)

Re: (Score:2)

Re: my 2c (Score:1)

Re: (Score:2)

Re: my 2c (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: my 2c (Score:1)

Re: (Score:2)

Another relase (Score:2)

Re: (Score:2)

Re: (Score:2)

Artificial Idiocy (Score:4, Funny)

Re: (Score:2)

Anyone Ask AI How to Destroy AI (Score:2)

Intriguing. (Score:3)

Re: (Score:2)

MORE Money burned (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Too late (Score:2)

Re: (Score:2)

Re: Too late (Score:2)

Re: (Score:2)

Re: Too late (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

"High accuracy" (Score:2)

Greatest pressure (Score:2)

Related Links Top of the: day, week, month.

Slashdot Top Deals