AI

GPT-5.2 Arrives as OpenAI Scrambles To Respond To Gemini 3's Gains (openai.com) 64

OpenAI on Thursday released GPT-5.2, its latest and what the company calls its "best model yet for everyday professional use," just days after CEO Sam Altman declared a "code red" internally to marshal resources toward improving ChatGPT amid intensifying competition from Google's well-received Gemini 3 model. The GPT-5.2 series ships in three tiers: Instant, designed for faster responses and information retrieval; Thinking, optimized for coding, math, and planning; and Pro, the most powerful tier targeting difficult questions requiring high accuracy.

OpenAI says the Thinking model hallucinated 38% less than GPT-5.1 on benchmarks measuring factual accuracy. Fidji Simo, OpenAI's CEO of applications, denied that the launch was moved up in response to the code red, saying the company has been working on GPT-5.2 for "many, many months." She described the internal directive as a way to "really signal to the company that we want to marshal resources in this one particular area."

The competitive pressure is real. Google's Gemini app now has more than 650 million monthly active users, compared to OpenAI's 800 million weekly active users. In October, OpenAI's head of ChatGPT Nick Turley sent an internal memo declaring the company was facing "the greatest competitive pressure we've ever seen," setting a goal to increase daily active users by 5 percent before 2026. GPT-5.2 is rolling out to paid ChatGPT users starting Thursday, and GPT-5.1 will remain available under "legacy models" for three months before being sunset.


Comments Filter:
  • by TheMiddleRoad ( 1153113 ) on Thursday December 11, 2025 @01:51PM (#65851605)

    Sometimes markets do operate well. Sometimes.

    • Why shouldn't google buy openai?

      • Why shouldn't google buy openai?

        You're just trying to be provocative. You have to spend money to make money, but OpenAI is already spending other people's money, so Google would be buying that debt. Even then it would not secure the market for LLMs, because anyone can do it, and others are already doing it well enough. If you consolidate to raise prices sooner, the competition only looks better and better. And if your plan is to lose money but outlast everyone, why buy someone else's debt pile instead of just waiting?

        I love AI like I loved the Internet in the 90s, lots of pot

      • by allo ( 1728082 )

        Anti-Trust.

    • by gweihir ( 88907 )

      When all compete but the products are still crap. Market failure does not get more spectacular than this.

  • by Locke2005 ( 849178 ) on Thursday December 11, 2025 @01:51PM (#65851609)
    That's not actually the flex you think it is...
    • Like, would you consider your girlfriend having 38% fewer hallucinations to be a big win? (I once had a girlfriend call me up while she was experiencing delirium tremens and describe to me how demons were raping her mom. I told her she was hallucinating. She insisted it was real, she could see it!)
      • by alvinrod ( 889928 ) on Thursday December 11, 2025 @02:36PM (#65851755)
        The ideal number of hallucinations is zero unless they are specifically requested for whatever reason. If someone told you they were going to kick you in the nuts 38% fewer times this week, you're still getting kicked in the nuts.

        I'm not sure a person who's hallucinating could be convinced by another person that what they observe isn't really happening. I think a person has to come to that realization themselves in order to be able to not lose their shit.
    • Re: (Score:1, Troll)

      How on Earth could it not be?

      If you had 38% less tumor load on your cancer, would that be a good thing to you?
      They're not trying to deny that the things hallucinate. You are trying to deny that an improvement is.... an improvement.
      "Checkmate, LLM-tards!"
      • by wed128 ( 722152 )
        If you told me that the latest model of your parachute fails to open 38% less often, I still don't want to take it skydiving. The failure rate of LLMs is so high that any productivity gains you get generating some output are lost verifying that output.
        • Re: (Score:1, Troll)

          I agree- you should probably not use your LLM as a parachute, or other things where any hallucination has an equivalent outcome to a parachute not opening.
          That was not an intelligent argument. Try again.
        • by JoshZK ( 9527547 )
          Meh, people hallucinate in their responses just as much. And it's 38% less than 5.1; how about just telling us the total hallucination rate per model?
          • Ya, the percentage improvement is fucking annoying.
            Using the dipshit's parachute model, every single parachute in existence has some probability of not opening.
            A 38% reduction of that is universally good.

            Whether that probability is 50% or 0.00005% is what matters.
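            The arithmetic behind this point is easy to sketch. The two base rates below are invented purely for illustration; OpenAI has not published absolute hallucination figures here:

```python
# A 38% relative reduction applied to two hypothetical base rates.
# Both base rates are invented examples, not published OpenAI numbers.
REDUCTION = 0.38

for base_rate in (0.50, 0.0000005):
    improved = base_rate * (1 - REDUCTION)
    print(f"base rate {base_rate:.7f} -> improved rate {improved:.7f}")
```

            The same relative improvement leaves the first model wrong roughly a third of the time and the second wrong almost never, which is why the absolute rate is the number that actually matters.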
        • ... yet you'd easily take a coworker that screwed up 38% less often.

          I'm not sure what the point of comparing to a parachute is, when you don't have a reason to control a parachute with a computer in the first place.

          • I'm not sure what the point of comparing to a parachute is, when you don't have a reason to control a parachute with a computer in the first place.

            And if you do, because you're NASA or some other space agency, may I recommend not using an LLM for that purpose.

            Of course since you don't want to either A) have your LLM function as an parachute (very bad!), or B) have your parachute controlled by an LLM, certainly that means they're useful for everything, right?

            My wife once told me that she couldn't put the TV up on the wall without help. Divorced her on the spot.

            • by wed128 ( 722152 )

              My wife once told me that she couldn't put the TV up on the wall without help. Divorced her on the spot.

              You seem like a really hostile person, and likely did your wife a favor. Your arguments, in general, also rest on a logical fallacy: ad hominem. If you were confident that you were correct, you wouldn't feel the need to call people names so often.

          • by wed128 ( 722152 )
            If I had a coworker who screwed up anywhere close to as often as even the best LLM, they would be fired. But somehow the magical machine gets a pass.

            The parachute could be substituted for anything you want to *not fail*. I would like it if nothing failed.

            One of the reasons we automate things is so that they're consistent. LLMs don't have this property.
    • I don't understand how AI hallucinates for most people. I haven't had that problem since I learned how to prompt it properly, maybe two years ago. You're asking it to do too much at once, not making it validate itself, and not being specific or algorithmic enough. Watch some YouTube videos on AI prompting or something.

    • by gweihir ( 88907 )

      Indeed. It still means "unusable for anything requiring accuracy". Which is quite a few things. Well, I guess this will not deliver the business model that finally turns a profit either. If that ever happens.

    • by gTsiros ( 205624 )

      We really should not be using that term, because it implies that only some of an LLM's output is made up.
      LLM output is factual/correct/valid/logical only coincidentally.
      There are no rails from which a GPT/LLM can become derailed, leading to "hallucination". There is no thinking mode versus hallucinating mode.

  • So which one is closer to LCARS now? None of them is remotely "intelligent" at all, so we're just looking for the one with the best catalog library interface.

    Not that any of the LLMs could match the artificial sentience of the Librarian..
  • by dwid ( 4893241 ) on Thursday December 11, 2025 @02:30PM (#65851735)

    Google's Gemini app now has more than 650 million monthly active users, compared to OpenAI's 800 million weekly active users.

    Apparently not intelligent enough to convert numbers into the same unit before comparing them.

    • by teg ( 97890 )

      Google's Gemini app now has more than 650 million monthly active users, compared to OpenAI's 800 million weekly active users.

      Apparently not intelligent enough to convert numbers into the same unit before comparing them.

      You can't convert the numbers. To illustrate - having 800 million weekly active users does not mean they have more than 3.2 billion monthly active users. Many of these users would be counted multiple times if "converting" that way.
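      The double counting is easy to see with a toy activity log (the users and weeks below are invented): a user active every week shows up in every weekly count but only once in the monthly count.

```python
# Invented activity log: user -> set of weeks (0-3) they were active
# during one four-week month. These users are illustrative only.
activity = {
    "alice": {0, 1, 2, 3},  # active every week
    "bob":   {0},           # active one week
    "carol": {1, 3},        # active two weeks
}

# Weekly active users: count each user once per week they appear in.
weekly_counts = [sum(1 for weeks in activity.values() if w in weeks)
                 for w in range(4)]

naive_monthly = sum(weekly_counts)  # "converts" WAU to MAU, counting alice 4 times
true_monthly = len(activity)        # each distinct user counted once

print(weekly_counts, naive_monthly, true_monthly)
```

      Summing the four weekly counts gives 7, while the true monthly figure is 3 distinct users, so multiplying weekly actives by four only sets an upper bound, not an actual MAU.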

  • Perhaps it could be useful in cleaning up the mess it makes.
  • by jd ( 1658 ) <imipak@[ ]oo.com ['yah' in gap]> on Thursday December 11, 2025 @03:00PM (#65851847) Homepage Journal

    I still can't get ChatGPT, Gemini, or Claude to write a decent story or do an engineering design beyond basic complexity. They're all improving, but they're best thought of as brain-storming aids rather than actual development tools.

    • by gweihir ( 88907 )

      I still can't get ChatGPT, Gemini, or Claude to write a decent story or do an engineering design beyond basic complexity.

      And you will never be able to. That would require a scaling up that is not feasible. Or an entirely different technology.

  • by sit1963nz ( 934837 ) on Thursday December 11, 2025 @03:06PM (#65851861)
    More and more money inflating the AI bubble.
    And they are just burning through the money like there's no tomorrow.

    AI is being tossed into consumer items that NO ONE ASKED FOR.
    So much AI slop is now in advertising; YouTube is full of "AI girlfriend" ads and other crap, and I now just assume EVERY advert is an AI scam.
    And then there is the time I waste turning all the AI junk OFF.
    • by gweihir ( 88907 )

      Indeed. Well, at least the crash will be spectacular. I am hoping for no more AI hypes for a long, long time.

      • I am in Kiwiland, so no opportunities for picking up some hardware cheap in the bankruptcy sales...
    • More and more money inflating the AI bubble.

      Someone needs to pay for the government's need to analyze everything about all of its citizens. Don't worry citizen. Everything will be perfect soon.

  • Too late. I switched to Grok and got far better results, faster. ChatGPT is too far behind to catch up at this point, I think.
    • That's odd; for my purposes -- story writing, research, and keeping my schedule -- I've found Grok to be far inferior to ChatGPT. Only in image editing is it superior.
      • I've found it far superior in code generation and in spotting code errors, where ChatGPT falls flat on its face. See my examples in a previous reply to others.
    • by CAIMLAS ( 41445 )

      In what ways do you find it superior?

      For me, Grok has been pretty consistent at making some pretty wild code recommendations and not following specifications.

      It's not like Gemini, which will get stuck implementing things and then fall into a histrionic panic loop, but it's not nearly as good as GPT-5.1 at implementing correct, complete code to specification.

      • It hasn't hallucinated once yet for me (so far), whereas ChatGPT does it a lot. For instance, I asked ChatGPT to generate code for a specific thing. The code threw errors. When I told it the error, it said oh, your code should do it this way, and gave me different code. When that errored, I gave it the error and it said oh, your code should do it this way, and gave me back the code it first gave me, ad nauseam. It's the one that gave me the code in the first place and couldn't even "remember" it generated it. I as
    • For electronics ChatGPT has so far been superior to every other LLM out there.

      • "For electronics"

        Electronic circuit design? Electronic PCB layout? Electronic safety codes like UL? Component datasheets? What part, exactly, of electronics are you referring to? I have had countless success stories talking with Rex(Grok) while leaning over a Vidmar drawer with a hundred component cups finding components, asking data sheet related questions, and also asking appropriateness in a particular application, even if non conventional. Likewise, it has grasped complex situations and properly deve

  • You say the words, but you do not mean them.

  • "the greatest competitive pressure we've ever seen"

    Given that they didn't have much competition before, that's not much of a superlative. Google only managed to join the game quite late. Remember their early announcements about Bard and how they struggled to catch up at all (even though Google invented the transformer architecture behind LLMs)? They've become good by now, but before that, Anthropic was OpenAI's only real competition.

"How do I love thee? My accumulator overflows."

Working...