OpenAI Unveils o3, a Smarter AI Model With Improved Reasoning Skills (openai.com)
OpenAI has unveiled a new AI model that it says takes longer to solve problems but gets better results, following Google's similar announcement a day earlier. The model, called o3, replaces o1 from September and spends extra time working through questions that need step-by-step reasoning.
It scores three times higher than o1 on ARC-AGI, a test measuring how well AI handles complex math and logic problems it hasn't seen before. "This is the beginning of the next phase of AI," CEO Sam Altman said during a livestream Friday.
The Microsoft-backed startup is keeping o3 under wraps for now but plans to let outside researchers test it.
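The "takes longer to solve problems but gets better results" framing is, in general terms, about spending more compute at inference time. OpenAI has not disclosed how o3 does this, so purely as an illustration, here is a minimal sketch of one published technique in that family, self-consistency: sample several independent reasoning chains and keep the majority answer. The sampling function below is a hypothetical placeholder, not OpenAI's API.

```python
import collections
import random

def sample_reasoning_chain(question: str) -> str:
    """Hypothetical stand-in for one sampled chain-of-thought answer.

    In a real system this would call an LLM with a step-by-step prompt;
    here it just returns a noisy canned answer so the sketch runs."""
    return random.choice(["42", "42", "42", "41"])

def answer_with_more_compute(question: str, n_samples: int = 16) -> str:
    """Self-consistency: spend n_samples times the compute and keep the
    answer that the sampled reasoning chains agree on most often."""
    answers = [sample_reasoning_chain(question) for _ in range(n_samples)]
    return collections.Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    # More samples mean more latency, but the majority vote is more
    # reliable than any single sampled chain.
    print(answer_with_more_compute("What is 6 * 7?"))
```

The trade-off the sketch shows is the one the article describes: more time per question in exchange for a more reliable answer.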
takes longer to solve problems (Score:2)
But does it matter whether you get the answer in a microsecond rather than a millisecond, as long as it is correct?
-- Manuel Garcia O'Kelly Davis
Sam Altman (Score:1)
Re: (Score:2)
Altman is like the scummy, slimy version of BG. And BG is already pretty repulsive in what he did and who he is.
Great, more lies (Score:3, Insightful)
Still no "reasoning skills" in LLMs, no matter how much they lie about it. And hence no "smart" either. A very fundamental breakthrough would be required, but there is nothing. Not really surprising with this old tech that that was was just scaled up, trained with a massive piracy campaign and hat its interface prettified with decidedly non-intelligent NLP.
It is a mystery to me why so many people fall for these lies. Are people just too shallow to actually see what is going on? To me, whenever I ask AI something (basically just using it as "better search"), it is always clear that this is just a prettified and cut-down presentation of what I would have gotten with a regular web search. No reasoning ability, no understanding of the query. Just statistical matching and that is it.
I mean, I asked it what LLMs are good for and then I asked it how much of that was marketing lies. It did not notice at all that it had been played and just augmented the glowing claims from question 1 with the typical problems, almost turning the first answer around by 180 degrees. If there were actually any reasoning, these limits would at the very least have been mentioned in the first answer, because they are extremely relevant. Instead, nothing at all. And it clearly had the info. It was just unable to make the exceptionally obvious connection.
I am wondering whether LLMs are so popular with many not-that-smart people because it is very easy to get them to tell you what you want to hear. They never try to caution you or make you look at additional aspects or the like. And that is why, for almost all of my searching, I will continue to use conventional search.
Re:Great, more lies (Score:4, Insightful)
Re: (Score:2, Troll)
Exactly the other way round. Search is one of the few things they actually can do somewhat well. As to the "bottom quartile of programmers", you realize these people have massive _negative_ productivity, right? And so do LLMs.
Re: (Score:3)
Re: (Score:1)
They don't have negative productivity.
Oh yes, they do.
Re: (Score:2)
Re: (Score:2)
Oh, yes. But some targeted study of software engineering is required. Say a year or two. Some things are not readily obvious, but they are still known to experts.
Re: (Score:2)
I don't know... lately I've been asking ChatGPT and Gemini quite a few things which would have required me to spend hours looking up. Yes, glorified search, but with summarization and near-real time search.
A very recent example, from a few days ago, when Slashdot threw a hissy fit at adblockers: I asked ChatGPT for "10 tech news from last 24 hours", and it provided a list with summarization, much like Slashdot. Then I asked to expand on item #3, I believe, and it did, then I asked it to provide me with URLs
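For what it's worth, the follow-up flow described above (ask for a list, then expand one item, then ask for sources) is easy to script as a multi-turn conversation. Here is a minimal sketch using the OpenAI Python client; the model name is an assumption, and whether the model actually has live web access depends on the product, not on this code.

```python
# Sketch of the "list, then expand item #3" flow as a multi-turn chat.
# Assumptions: the openai Python package (>=1.0), an OPENAI_API_KEY in the
# environment, and a model name that may differ on your account. Live web
# access is a product feature and is not guaranteed by this code.
from openai import OpenAI

client = OpenAI()

history = [{"role": "user",
            "content": "Give me 10 tech news items from the last 24 hours, one line each."}]
first = client.chat.completions.create(model="gpt-4o-mini", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

history.append({"role": "user",
                "content": "Expand on item #3 and include source URLs."})
second = client.chat.completions.create(model="gpt-4o-mini", messages=history)
print(second.choices[0].message.content)
```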
Re: (Score:2)
I don't know... lately I've been asking ChatGPT and Gemini quite a few things which would have required me to spend hours looking up. Yes, glorified search, but with summarization and near-real time search.
Sure. But the claim here is "reasoning" and that is just a direct lie, nothing else.
That doesn't make me stupid. Of course, I could have done all that myself, but at 100x the time spent, which I would rather spend doing something more productive. Convenience is a big feature of those tools.
Sure again. Just be aware that you may miss something you would have gotten otherwise. One thing is the search skill itself. Another is the information you usually find in the context of what you are looking for. If you are not careful, you can cripple your skills, make yourself dependent and limit your view on things to a serious degree. That does not mean you should always do it yourself. Just occasionally, to make sure you still can.
Re: (Score:2)
Yes, we're totally in agreement here.
I'm old enough to remember the times when phone numbers were memorized, and any average person could probably recite 7 to 10 phone numbers by heart. Nowadays, there are plenty of people who haven't even memorized their own phone number.
Now, I try not to be the "get off my lawn" guy, but I think it was a useful exercise for the mind. Maybe having to memorize phone numbers was replaced by something similar (I have done that), but it's not "another thing everyone mu
Re: Great, more lies (Score:2)
WELL WELL WELL. We meet again, my favorite CS department member. Alright here is an essay for you: It suggests o3 is qualitatively different than older generation LLMs. I look forward to your take on it.
https://arcprize.org/blog/oai-... [arcprize.org]
Re: (Score:2)
I do not think you have delivered any credible evidence, and in particular no supporting explanation. Belief makes you dumb. And benchmarks? I know all about benchmarks. You, apparently, do not.
Re: (Score:1)
Did you actually read the essay?
As one person who likes thinking to another, I think you'd enjoy it. And after you read it, you would have much more specific things to insult me over. In my experience, precise, targeted, and entirely true insults are far more gratifying.
Re: (Score:3)
Re: (Score:2)
The LLM is the first one to be trained specifically on the actual dataset used by those benchmarks. And so far I have seen no benchmarks where it has been tested against anything else.
Re: (Score:2)
Re: (Score:2)
Point is, it's being compared to models which were not trained on the ARC AGI data, and it's not being tested on anything but the ARC AGI tests. At least not that I have found.
This is a recurring pattern in these highly publicized PR pieces. The model is trained on specific data, performs really well on highly formalized tests over that data (meaning tests that are harder for humans and easier for pattern matchers), and then it all drowns in the media flow and doesn't show up again as people notice the model doesn't l
Re: (Score:2)
The thing is, I have done CS paper reviews for about 15 years. One of the things scientifically bad papers do is make sure they perform well on some benchmarks. And then they give the results on those and maybe include one other to simulate honesty. But the actual reality of benchmarks is that benchmarks are a sitting duck and everybody can optimize for them, and AI is no different. It is like preparing for an exam when you already know all the questions and answers. Basically meaningless unless you
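To make the "exam with known questions" point concrete, here is a toy sketch with entirely made-up data: a lookup-table "model" that has memorized the public benchmark aces it and collapses on held-out questions, which is why a score on a benchmark the developer could train against says little by itself.

```python
# Toy illustration of benchmark overfitting: a lookup-table "model" that
# memorized the public benchmark scores perfectly on it and fails on
# held-out questions. All data here is made up.
benchmark = {"2+2": "4", "3*3": "9", "10-7": "3"}    # items the developer saw
held_out  = {"5+8": "13", "6*7": "42", "9-4": "5"}   # items they did not

memorized_model = dict(benchmark)  # "training" = memorizing the benchmark

def accuracy(model: dict, tests: dict) -> float:
    correct = sum(model.get(q) == a for q, a in tests.items())
    return correct / len(tests)

print("benchmark score:", accuracy(memorized_model, benchmark))  # 1.0
print("held-out score: ", accuracy(memorized_model, held_out))   # 0.0
```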
Impressive but limited (Score:5, Interesting)
Re: (Score:2)
Generally speaking, this is something that pisses me off.
X unveiled AI tool A - but it's not available yet.
Y unveiled AI tool B - but it's not available yet.
Z unveiled AI tool C - but, you guessed it, it's not available yet.
I could "unveil" anything too, with a couple pretty pictures and some curated examples, but as long as the product is not available, it's worth nothing.
Great, if it can solve math problems (Score:2)
... and physics problems, nations can start using AI to build nuclear weapons and delivery platforms at incredible speeds now.