GPT-5.2 Arrives as OpenAI Scrambles To Respond To Gemini 3's Gains (openai.com)
OpenAI on Thursday released GPT-5.2, its latest and what the company calls its "best model yet for everyday professional use," just days after CEO Sam Altman declared a "code red" internally to marshal resources toward improving ChatGPT amid intensifying competition from Google's well-received Gemini 3 model. The GPT-5.2 series ships in three tiers: Instant, designed for faster responses and information retrieval; Thinking, optimized for coding, math, and planning; and Pro, the most powerful tier targeting difficult questions requiring high accuracy.
OpenAI says the Thinking model hallucinated 38% less than GPT-5.1 on benchmarks measuring factual accuracy. Fidji Simo, OpenAI's CEO of applications, denied that the launch was moved up in response to the code red, saying the company has been working on GPT-5.2 for "many, many months." She described the internal directive as a way to "really signal to the company that we want to marshal resources in this one particular area."
The competitive pressure is real. Google's Gemini app now has more than 650 million monthly active users, compared to OpenAI's 800 million weekly active users. In October, OpenAI's head of ChatGPT Nick Turley sent an internal memo declaring the company was facing "the greatest competitive pressure we've ever seen," setting a goal to increase daily active users by 5 percent before 2026. GPT-5.2 is rolling out to paid ChatGPT users starting Thursday, and GPT-5.1 will remain available under "legacy models" for three months before being sunset.
Competition is good (Score:4, Insightful)
Sometimes markets do operate well. Sometimes.
Re: Competition is good (Score:1)
Why shouldn't google buy openai?
Re: Competition is good (Score:2)
Why shouldn't google buy openai?
You're just trying to be provocative. You have to spend money to make money, but OpenAI is already spending other people's money, so Google would be buying that debt. Then it still would not secure the market for LLMs, because anyone can do it, others are doing it well enough. If you consolidate to raise prices sooner, they'll look better and better. If your plan is to lose money but outlast everyone, why would you buy their debt piles, can't wait?
I love AI like I loved the Internet in the 90s, lots of pot
Re: Competition is good (Score:1)
"You're just trying to be provocative. "
Is it so inconceivable that Google might use Microsoft's strategy over Amazon's (by the way, would Ring like a word?)?
Re: (Score:2)
Anti-Trust.
Re: (Score:2)
When all compete but the products are still crap. Market failure does not get more spectacular than this.
"Now with 38% FEWER hallucinations!" (Score:5, Insightful)
Re: (Score:3)
Re:"Now with 38% FEWER hallucinations!" (Score:5, Insightful)
I'm not sure a person who's hallucinating could be convinced by another person that what they observe isn't really happening. I think a person has to come to that realization themselves in order to be able to not lose their shit.
Re: (Score:1, Troll)
If you had 38% less tumor load on your cancer, would that be a good thing to you?
They're not trying to deny that the things hallucinate. You are trying to deny that an improvement is.... an improvement.
"Checkmate, LLM-tards!"
Re: (Score:2)
Re: (Score:1, Troll)
That was not an intelligent argument. Try again.
Re: (Score:1)
Re: (Score:2)
Using the dipshit's parachute model, every single parachute in existence has a probability of not opening.
38% reduction of that is universally good.
Whether or not that probability is 50%, or 0.00005% is what matters.
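To put numbers on the point above, here is a minimal sketch that applies a 38% relative reduction to the two failure probabilities mentioned in the comment. The function name is hypothetical, and treating "38% less" as a simple multiplicative reduction of a failure rate is an assumption made for illustration, not something OpenAI's benchmark description confirms.

```python
def reduced_failure_rate(baseline: float, relative_reduction: float = 0.38) -> float:
    """Failure probability left after a relative reduction of the baseline rate."""
    return baseline * (1.0 - relative_reduction)


# The two baselines named in the comment: 50% and 0.00005%
# (the latter written here as a plain probability of 0.00005 for simplicity).
for baseline in (0.5, 0.00005):
    after = reduced_failure_rate(baseline)
    print(f"baseline={baseline:g} -> after={after:g}")
```

The relative improvement is identical in both cases; what differs is whether the remaining absolute rate is something you would accept.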
Re: "Now with 38% FEWER hallucinations!" (Score:2)
... yet you'd easily take a coworker that screwed up 38% less often.
I'm not sure what the point of comparing to a parachute is, when you don't have a reason to control a parachute with a computer in the first place.
Re: (Score:2)
I'm not sure what the point of comparing to a parachute is, when you don't have a reason to control a parachute with a computer in the first place.
And if you do, because you're NASA or some other space agency, may I recommend not using an LLM for that purpose.
Of course since you don't want to either A) have your LLM function as a parachute (very bad!), or B) have your parachute controlled by an LLM, certainly that means they're useful for everything, right?
My wife once told me that she couldn't put the TV up on the wall without help. Divorced her on the spot.
Re: (Score:1)
My wife once told me that she couldn't put the TV up on the wall without help. Divorced her on the spot.
You seem like a really hostile person, and likely did your wife a favor. Your arguments, in general, are also a logical fallacy: ad hominem. If you were confident that you were correct, you wouldn't feel the need to call people names so often.
Re: (Score:1)
The parachute could be substituted for anything you want to *not fail*. I would like it if nothing failed.
One of the reasons we automate things is so that they're consistent. LLMs don't have this property.
Re: (Score:2)
I don't understand how AI hallucinates for most people. I haven't had that problem once I learned/figured out how to ask it stuff properly like maybe 2 years ago or something. You're asking it to do too much shit, not validate itself, and not being specific or algorithmic enough. Watch some YouTube videos on AI prompting or something.
Re: (Score:2)
You aren't wrong that prompting can improve the situation. However, even in the most rigorously defined tasks, they hallucinate.
Hell, the fucking things even hallucinate tool calls sometimes (amusing the first time you run into that in your agentic code)
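For anyone who hasn't hit the hallucinated tool call problem yet, one common guard is to check every call against a registry before executing it. This is only a sketch: the tool names, the `KNOWN_TOOLS` registry, and the `{"name": ..., "arguments": {...}}` JSON shape are all assumptions for illustration and do not correspond to any particular vendor's API.

```python
import json

# Hypothetical tool registry: tool name -> set of required argument names.
KNOWN_TOOLS = {
    "search_docs": {"query"},
    "read_file": {"path"},
}


def validate_tool_call(raw: str):
    """Return (ok, reason) for a JSON tool call shaped like
    {"name": ..., "arguments": {...}} (an assumed format)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(call, dict):
        return False, "tool call must be a JSON object"

    name = call.get("name")
    # Reject tools the model made up.
    if not isinstance(name, str) or name not in KNOWN_TOOLS:
        return False, f"unknown tool: {name!r}"

    args = call.get("arguments")
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"

    # Reject calls that omit required arguments.
    missing = KNOWN_TOOLS[name] - set(args)
    if missing:
        return False, f"missing arguments: {sorted(missing)}"
    return True, "ok"


print(validate_tool_call('{"name": "search_docs", "arguments": {"query": "gpt"}}'))
print(validate_tool_call('{"name": "delete_everything", "arguments": {}}'))
```

A check like this catches invented tool names and missing arguments, but it does nothing about a real tool being called with plausible-looking wrong values.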
Re: "Now with 38% FEWER hallucinations!" (Score:1)
What about when they hallucinate economics by regurgitating mainstream models verbatim?
Re: (Score:2)
Re: "Now with 38% FEWER hallucinations!" (Score:1)
What if they acknowledge it's a hallucination under questioning, unlike hallucinating economists, who just get emotional and ban me?
Re: (Score:2)
Indeed. It still means "unusable for anything requiring accuracy". Which is quite a few things. Well, I guess this will not deliver the business model that finally turns a profit either. If that ever happens.
Re: (Score:2)
We really should not be using that term.
Because it implies that only some of the output of an LLM is made-up.
LLM output is factual/correct/valid/logical only coincidentally.
There are no rails from which the GPT/LLM can become derailed, leading to "hallucination". There's no thinking vs hallucinating "mode" or whatever.
Re: (Score:1, Flamebait)
Trust me, I read that on slashdot.
Re: (Score:2)
No argument existed in that post. What did exist were 4 unsubstantiated claims, with the final one being obvious satire (unless we accept the premise that humans can generate code 4 years before they're conceived).
Re: my 2c (Score:1)
Am I the only one thinking of Monty Python's Argument Sketch?
---
Man: (Knock)
Mr. Vibrating: Come in.
Man: Ah, Is this the right room for an argument?
Mr. Vibrating: I told you once.
Man: No you haven't.
Mr. Vibrating: Yes I have.
Man: When?
Mr. Vibrating: Just now.
Man: No you didn't.
Mr. Vibrating: Yes I did.
Man: You didn't
Mr. Vibrating: I did!
Man: You didn't!
Mr. Vibrating: I'm telling you I did!
Man: You did not!!
Mr. Vibrating: Oh, I'm sorry, just one moment. Is this a five minute argument or the full half hour? ..
Re: (Score:2)
Would you please adjust your irony detector? Or do you really need a /s to detect sarcasm?
Re: my 2c (Score:1)
What makes you think I wasn't myself using irony in my response?
Re: (Score:2)
The missing /s /s
Re: (Score:2)
I'm not sure I'd use it for any productive work though. Of course not everything has to be for work though either.
Re: my 2c (Score:1)
Do you think the military will, and if it hallucinates an enemy combatant that's just being efficient?
Re: (Score:2)
I use it for productive work right now.
So do many people.
The industry for people using it for work is already measured in billions of dollars per year in revenue.
If you don't- that's just fine.
Of course not everything has to be for work though either.
This is your key insight. I'd add DamnOregonian's corollary to it, though: Not all work requires a model capable of producing superintelligent output. Sometimes-bordering-on-kinda-dumb-but-also-freakishly-skilled-in-certain-ways is also perfectly sufficient at times.
Another release (Score:2)
Not that any of the LLMs could match the artificial sentience of the Librarian..
Re: (Score:2)
I don't mind a better search engine.
Re: (Score:2)
Artificial Idiocy (Score:4, Funny)
Google's Gemini app now has more than 650 million monthly active users, compared to OpenAI's 800 million weekly active users.
Apparently not intelligent enough to convert numbers into the same unit before comparing them.
Re: (Score:2)
Google's Gemini app now has more than 650 million monthly active users, compared to OpenAI's 800 million weekly active users.
Apparently not intelligent enough to convert numbers into the same unit before comparing them.
You can't convert the numbers. To illustrate - having 800 million weekly active users does not mean they have more than 3.2 billion monthly active users. Many of these users would be counted multiple times if "converting" that way.
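A toy illustration of why the two figures can't be converted into each other: monthly active users are the unique users across the whole month, so repeat visitors are counted once, not once per week. The user IDs and the `active_users` helper below are made up for the example.

```python
def active_users(activity_by_week):
    """Return (per-week active counts, unique users across all weeks)
    from a list of per-week sets of user IDs."""
    weekly = [len(users) for users in activity_by_week]
    monthly = len(set().union(*activity_by_week))
    return weekly, monthly


# Made-up data: four weeks, with mostly the same people coming back.
weeks = [
    {"ann", "bob", "cat"},
    {"ann", "bob", "dan"},
    {"ann", "cat", "dan"},
    {"ann", "bob", "cat"},
]

weekly, monthly = active_users(weeks)
print("weekly active users:", weekly)
print("naive 'conversion' (sum of weeks):", sum(weekly))
print("monthly active users (unique):", monthly)
```

In this toy data every week has the same weekly count, yet the monthly unique count is far below the sum of the weeks. Without the overlap between weeks, which neither company's headline number reveals, one figure cannot be derived from the other.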
Anyone Ask AI How to Destroy AI (Score:2)
Intriguing. (Score:3)
I still can't get ChatGPT, Gemini, or Claude to write a decent story or do an engineering design beyond basic complexity. They're all improving, but they're best thought of as brain-storming aids rather than actual development tools.
Re: (Score:2)
I still can't get ChatGPT, Gemini, or Claude to write a decent story or do an engineering design beyond basic complexity.
And you will never be able to. That would require a scaling up that is not feasible. Or an entirely different technology.
MORE Money burned (Score:3)
And they are just burning through the money like there's no tomorrow.
AI is being tossed into consumer items that NO ONE ASKED FOR.
SO much AI slop is now in advertising. YouTube is full of "AI girlfriend" and other crap; I now just assume EVERY advert is an AI scam.
And then there is the wasted time I need to spend turning all the AI junk OFF.
Re: (Score:2)
Indeed. Well, at least the crash will be spectacular. I am hoping for no more AI hypes for a long, long time.
Re: (Score:2)
Re: (Score:2)
More and more money inflating the AI bubble.
Someone needs to pay for the government's need to analyze everything about all of its citizens. Don't worry citizen. Everything will be perfect soon.
Re: (Score:2)
Too late (Score:2)
Re: (Score:2)
Re: Too late (Score:2)
Re: (Score:2)
In what ways do you find it superior?
For me, Grok has been pretty consistent at making some pretty wild code recommendations and not following specifications.
It's not like Gemini, which will get stuck implementing things and then get into a histrionic panic loop, but it's not nearly as good as gpt5.1 in implementing correct, complete code per specification.
Re: Too late (Score:2)
Re: (Score:2)
For electronics ChatGPT has so far been superior to every other LLM out there.
Re: (Score:2)
"For electronics"
Electronic circuit design? Electronic PCB layout? Electronic safety codes like UL? Component datasheets? What part, exactly, of electronics are you referring to? I have had countless success stories talking with Rex(Grok) while leaning over a Vidmar drawer with a hundred component cups finding components, asking data sheet related questions, and also asking appropriateness in a particular application, even if non conventional. Likewise, it has grasped complex situations and properly deve
Re: (Score:2)
Circuit design. Grok cannot even read relatively simple circuit diagrams correctly.
"High accuracy" (Score:2)
You say the words, but you do not mean them.
Greatest pressure (Score:2)
"the greatest competitive pressure we've ever seen"
Given that they didn't have much competition before, that's no real superlative. Google was really late to become part of the game. Remember their old announcements about Bard and how they struggled to catch up at all (even though they invented LLMs)? They have managed to get good by now, but before that, Anthropic was the only competition for OpenAI.