OpenAI Unveils o3, a Smarter AI Model With Improved Reasoning Skills (openai.com)
OpenAI has unveiled a new AI model that it says takes longer to solve problems but gets better results, following Google's similar announcement a day earlier. The model, called o3, replaces o1 from September and spends extra time working through questions that need step-by-step reasoning.
It scores three times higher than o1 on ARC-AGI, a test measuring how well AI handles complex math and logic problems it hasn't seen before. "This is the beginning of the next phase of AI," CEO Sam Altman said during a livestream Friday.
The Microsoft-backed startup is keeping o3 under wraps for now but plans to let outside researchers test it.
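The "takes longer to solve problems but gets better results" framing is, in general terms, about spending more compute at inference time. OpenAI has not disclosed how o3 does this, so purely as an illustration, here is a minimal sketch of one published technique in that family, self-consistency: sample several independent reasoning chains and keep the majority answer. The sampling function below is a hypothetical placeholder, not OpenAI's API.

```python
import collections
import random

def sample_reasoning_chain(question: str) -> str:
    """Hypothetical stand-in for one sampled chain-of-thought answer.

    In a real system this would call an LLM with a step-by-step prompt;
    here it just returns a noisy canned answer so the sketch runs."""
    return random.choice(["42", "42", "42", "41"])

def answer_with_more_compute(question: str, n_samples: int = 16) -> str:
    """Self-consistency: spend n_samples times the compute and keep the
    answer that the sampled reasoning chains agree on most often."""
    answers = [sample_reasoning_chain(question) for _ in range(n_samples)]
    return collections.Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    # More samples mean more latency, but the majority vote is more
    # reliable than any single sampled chain.
    print(answer_with_more_compute("What is 6 * 7?"))
```

The trade-off the sketch shows is the one the article describes: more time per question in exchange for a more reliable answer.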
takes longer to solve problems (Score:2)
But does it matter whether you get the answer in a microsecond rather than a millisecond, as long as it is correct?
-- Manuel Garcia O'Kelly Davis
Sam Altman (Score:1)
Re: (Score:2)
Altman is like the scummy, slimy version of BG. And BG is already pretty repulsive in what he did and who he is.
Great, more lies (Score:3, Insightful)
Still no "reasoning skills" in LLMs, no matter how much they lie about it. And hence no "smart" either. A very fundamental breakthrough would be required, but there is nothing. Not really surprising with this old tech that that was was just scaled up, trained with a massive piracy campaign and hat its interface prettified with decidedly non-intelligent NLP.
It is a mystery to me why so many people fall for these lies. Are people just too shallow to actually see what is going on? To me, whenever I ask AI something (basically just using it as "better search"), it is always clear that this is just a prettified and cut-down presentation of what I would have gotten with a regular web search. No reasoning ability, no understanding of the query. Just statistical matching and that is it.
I mean, I asked it what LLMs are good for and then I asked it how much of that was marketing lies. It did not notice at all that it had been played and just augmented the glowing claims from question 1 with the typical problems, almost turning the first answer around by 180 degrees. If there were actually any reasoning, these limits would at the very least have been mentioned in the first answer, because they are extremely relevant. Instead, nothing at all. And it clearly had the info. It was just unable to make the exceptionally obvious connection.
I am wondering whether LLMs are so popular with many not-that-smart people because it is very easy to get them to tell you what you want to hear. They never try to caution you or make you look at additional aspects or the like. And that is why, for almost all of my searching, I will continue to use conventional search.
Re:Great, more lies (Score:4, Insightful)
Re: (Score:2, Troll)
Exactly the other way round. Search is one of the few things they actually can do somewhat well. As to the "bottom quartile of programmers", you realize these people have massive _negative_ productivity, right? And so do LLMs.
Re: (Score:3)
Re: (Score:1)
They don't have negative productivity.
Oh yes, they do.
Re: (Score:2)
Re: (Score:2)
Oh, yes. But some targeted study of software engineering is required. Say a year or two. Some things are not readily obvious, but they are still known to experts.
Re: (Score:2)
I don't know... lately I've been asking ChatGPT and Gemini quite a few things which would have required me to spend hours looking up. Yes, glorified search, but with summarization and near-real time search.
A very recent example, from a few days ago, when Slashdot threw a hissy fit at adblockers: I asked ChatGPT for "10 tech news from last 24 hours", and it provided a list with summarization, much like Slashdot. Then I asked to expand on item #3, I believe, and it did, then I asked it to provide me with URLs
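For what it's worth, the follow-up flow described above (ask for a list, then expand one item, then ask for sources) is easy to script as a multi-turn conversation. Here is a minimal sketch using the OpenAI Python client; the model name is an assumption, and whether the model actually has live web access depends on the product, not on this code.

```python
# Sketch of the "list, then expand item #3" flow as a multi-turn chat.
# Assumptions: the openai Python package (>=1.0), an OPENAI_API_KEY in the
# environment, and a model name that may differ on your account. Live web
# access is a product feature and is not guaranteed by this code.
from openai import OpenAI

client = OpenAI()

history = [{"role": "user",
            "content": "Give me 10 tech news items from the last 24 hours, one line each."}]
first = client.chat.completions.create(model="gpt-4o-mini", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

history.append({"role": "user",
                "content": "Expand on item #3 and include source URLs."})
second = client.chat.completions.create(model="gpt-4o-mini", messages=history)
print(second.choices[0].message.content)
```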
Re: (Score:2)
I don't know... lately I've been asking ChatGPT and Gemini quite a few things which would have required me to spend hours looking up. Yes, glorified search, but with summarization and near-real time search.
Sure. But the claim here is "reasoning" and that is just a direct lie, nothing else.
That doesn't make me stupid. Of course, I could have done all that myself, but at 100x the time spent, which I would rather spend doing something more productive. Convenience is a big feature of those tools.
Sure again. Just be aware that you may miss something you would have gotten otherwise. One thing is the search skill itself. Another is the information you usually find in the context of what you are looking for. If you are not careful, you can cripple your skills, make yourself dependent and limit your view on things to a serious degree. That does not mean you should always do it yourself. Just occasionally, to make sure you still can.
Re: (Score:2)
Yes, we're totally in agreement here.
I'm old enough to remember the times when phone numbers were memorized, and any average person could probably recite 7 to 10 phone numbers by heart. Nowadays, there are plenty of people who haven't even memorized their own phone number.
Now, I try not to be the "get off my lawn" guy, but I think it was a useful exercise for the mind. Maybe having to memorize phone numbers was replaced by something similar (I have done that), but it's not "another thing everyone mu
Re: Great, more lies (Score:2)
WELL WELL WELL. We meet again, my favorite CS department member. Alright here is an essay for you: It suggests o3 is qualitatively different than older generation LLMs. I look forward to your take on it.
https://arcprize.org/blog/oai-... [arcprize.org]
Re: (Score:2)
I do not think you have delivered any credible evidence, and in particular no supporting explanation. Belief makes you dumb. And benchmarks? I know all about benchmarks. You, apparently, do not.
Re: (Score:1)
Did you actually read the essay?
As one person who likes thinking to another, I think you'd enjoy it. And after you read it, you would have much more specific things to insult me over. In my experience, precise, targeted, and entirely true insults are far more gratifying.
Re: (Score:3)
Re: (Score:2)
The LLM is the first one to be trained specifically on the actual dataset used by those benchmarks. And so far I have seen no benchmarks where it has been tested against anything else.
Re: (Score:2)
Re: (Score:2)
Point is, it's being compared to models which were not trained on the ARC AGI data, and it's not being tested on anything but the ARC AGI tests. At least not that I have found.
This is a recurring pattern in these highly publicized PR pieces. The model is trained on specific data, performs really well on highly formalized tests over that data (meaning tests that are harder for humans and easier for pattern matchers), and then it all drowns in the media flow and doesn't show up again as people notice the model doesn't l
Re: (Score:2)
The thing is, I have done CS paper reviews for about 15 years. One of the things scientifically bad papers do is make sure they perform well on some benchmarks. And then they give the results on those and maybe include one other to simulate honesty. But the actual reality of benchmarks is that benchmarks are a sitting duck and everybody can optimize for them, and AI is no different. It is like preparing for an exam when you already know all the questions and answers. Basically meaningless unless you
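To make the "exam with known questions" point concrete, here is a toy sketch with entirely made-up data: a lookup-table "model" that has memorized the public benchmark aces it and collapses on held-out questions, which is why a score on a benchmark the developer could train against says little by itself.

```python
# Toy illustration of benchmark overfitting: a lookup-table "model" that
# memorized the public benchmark scores perfectly on it and fails on
# held-out questions. All data here is made up.
benchmark = {"2+2": "4", "3*3": "9", "10-7": "3"}    # items the developer saw
held_out  = {"5+8": "13", "6*7": "42", "9-4": "5"}   # items they did not

memorized_model = dict(benchmark)  # "training" = memorizing the benchmark

def accuracy(model: dict, tests: dict) -> float:
    correct = sum(model.get(q) == a for q, a in tests.items())
    return correct / len(tests)

print("benchmark score:", accuracy(memorized_model, benchmark))  # 1.0
print("held-out score: ", accuracy(memorized_model, held_out))   # 0.0
```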
Impressive but limited (Score:5, Interesting)
Re: (Score:2)
Generally speaking, this is something that pisses me off.
X unveiled AI tool A - but it's not available yet.
Y unveiled AI tool B - but it's not available yet.
Z unveiled AI tool C - but, you guessed it, it's not available yet.
I could "unveil" anything too, with a couple pretty pictures and some curated examples, but as long as the product is not available, it's worth nothing.
Great, if it can solve math problems (Score:2)
... and physics problems, nations can start using AI to build nuclear weapons and delivery platforms at incredible speeds now.