AI Technology

OpenAI Releases o1, Its First Model With 'Reasoning' Abilities

OpenAI has launched a new AI model, named "o1", designed for improved reasoning and problem-solving skills. o1, part of a new series of models and available in ChatGPT and the API, can tackle complex tasks in science, coding, and math more effectively than its predecessors. Notably, o1 models have shown promising results in standardized tests and coding competitions. While o1 models represent a significant advancement in AI capabilities, they currently lack features like web browsing and file uploading. The Verge adds: But it's also more expensive and slower to use than GPT-4o. OpenAI is calling this release of o1 a "preview" to emphasize how nascent it is.

ChatGPT Plus and Team users get access to both o1-preview and o1-mini starting today, while Enterprise and Edu users will get access early next week. OpenAI says it plans to bring o1-mini access to all the free users of ChatGPT but hasn't set a release date yet. Developer access to o1 is really expensive: In the API, o1-preview is $15 per 1 million input tokens, or chunks of text parsed by the model, and $60 per 1 million output tokens. For comparison, GPT-4o costs $5 per 1 million input tokens and $15 per 1 million output tokens.
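
To make the price gap concrete, here is a minimal back-of-the-envelope sketch in Python. The per-million-token prices come from the figures above; the model labels and the example token counts are just illustrative assumptions, and the sketch ignores any internal reasoning tokens that o1 may additionally bill as output.

# Rough per-request cost comparison using the published per-1M-token prices.
PRICES_USD_PER_1M = {            # (input, output)
    "o1-preview": (15.00, 60.00),
    "gpt-4o": (5.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one API call in USD."""
    in_price, out_price = PRICES_USD_PER_1M[model]
    return input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price

# Example: a 2,000-token prompt that produces a 1,000-token answer.
for model in PRICES_USD_PER_1M:
    print(model, round(request_cost(model, 2_000, 1_000), 3))
# o1-preview 0.09
# gpt-4o 0.025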

The training behind o1 is fundamentally different from its predecessors, OpenAI's research lead, Jerry Tworek, tells me, though the company is being vague about the exact details. He says o1 "has been trained using a completely new optimization algorithm and a new training dataset specifically tailored for it."


Comments:
  • 'Reasoning' Abilities? But is it AGI?

    • by taustin ( 171655 )

      Will it still give instructions on how to glue the cheese onto your pizza?

    • No, of course not.

      An LLM is good at recognizing patterns and adapting them to a given context.

      This "reasoning" is no doubt an extension of this concept: recognize patterns that "look like" reasoning and adapt them to a given context. Just because it mimics reasoning, doesn't make it...reasoning.

    • by HiThere ( 15173 )

      No. Reasoning ability is PART of an AGI, just like language processing is part. It allows expansion into additional use cases, but it sure can't handle everything.

      Also, re: reasoning being slower -- this shouldn't be a surprise to anyone. If you can remember the answer, it's always faster than figuring it out.

    • by gweihir ( 88907 )

      "Reasoning" abilities are the reasoning abilities without ability to reason. And no, it is not AGI, unless you use their definition which does not require General Intelligence for something to be AGI. Or actual intelligence for something to be "intelligent".

      The whole thing is a big, fat lie by misdirection with the aim of making some people filthy rich, Oh, and look, that aspect works.

      • by narcc ( 412956 )

        "Reasoning" abilities are the reasoning abilities without ability to reason.

        That about sums it up. There isn't anything new here. This is just CoT by default, hidden from the user (a sketch of what that means follows below). It has the same fundamental limitations it always had.

        I'll give them this, it takes a lot of chutzpah to put this out there like it's some great leap forward.
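
        For readers unfamiliar with the jargon: "CoT" is chain-of-thought prompting. Here is a minimal sketch of the idea in Python; the question text and the ask() helper are made-up placeholders rather than any real API, and the point is only to show what "CoT hidden from the user" would mean.

        # Chain-of-thought prompting in its simplest form: the same question,
        # with and without an instruction to spell out intermediate steps.
        # ask() is a hypothetical stand-in for any chat-completion call.
        question = ("A bat and a ball cost $1.10 together. The bat costs $1.00 "
                    "more than the ball. How much does the ball cost?")

        direct_prompt = question
        cot_prompt = question + "\nThink step by step, then state only the final answer."

        # The claim above: o1 effectively uses the second style automatically,
        # generates the intermediate steps internally, and shows the user only
        # the final answer.
        # answer = ask(cot_prompt)  # hypothetical call, not a real library function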

    • It can enumerate, in ascending order, all the irrational numbers between 31 and 37.

    • I suppose if they thought it were AGI they would have remembered to mention that in the press release.
    • OpenAI: "Here's a new incremental turd. Btw, you guys forgot about that whole 'doing audio in multi-modal models' thing, right? Yeah, we really didn't want to ever give you access to that, that was just to gain investor money."

  • How long do we have to wait before the Butlerian Jihad? Can it be done preemptively?
    • This idea gives the LLM hype way, way too much credibility. If you work with LLMs every day like I do, you will not be afraid.

      Try something simple, like ask Copilot to make a logo for a company named Anderson Electrical and see if it can spell the name you just gave it correctly, even once.

      No, this "reasoning" isn't going to compare with actual human reasoning. It will just mimic it.

      • Try something simple, like ask Copilot to make a logo for a company named Anderson Electrical and see if it can spell the name you just gave it correctly, even once.

        This is conflating LLMs with DALL-E diffusion models which don't even pretend to have any high level grasp. It is a bit like an artist drawing a PCB or a rocket motor. They may draw something that superficially looks like one but they don't have the necessary understanding or experience of what the components they are drawing actually do to know whether the arrangement of components in their rendition is even coherent.

        What the image models do understand, albeit never quite perfectly, are lower level ..illum

        • I realize that DALL-E is different from LLMs. And that's part of the point. This "reasoning" might look like human reasoning within a strictly bounded domain. But real reasoning has to cross domains.

          And you still give LLMs too much credit. If you use GitHub Copilot to generate code for you, it's still going to trip all over itself frequently, even if you're asking it to do something simple. I'd say it gets it right (as in, can compile and does what you asked) maybe 10% of the time, and that's being generous

          • I realize that DALL-E is different from LLMs. And that's part of the point. This "reasoning" might look like human reasoning within a strictly bounded domain. But real reasoning has to cross domains.

            The irony of this bit of bad reasoning being used when called out on giving a terrible example is hilarious.

            Also, open your mind. I bet that if you could show your three-years-ago self the best of what LLMs and stable diffusion models (including video generation) can do today, you would have thought all of it was fake (as in not created by AI). You would be gobsmacked to find out that it was really generated by AI.

            • Can you elaborate on what you think makes my example "terrible"?

              Yes, I'm absolutely amazed at what AI can do. But that's not the same thing as being worried that AI could replace human reasoning. It's possible for AI to be amazing, and still not confusable with human reasoning, at the same time.

              • Can you elaborate on what you think makes my example "terrible"?

                Condensed: "LLMs can't reason. Just try generating an image with [NOT AN LLM], and it makes mistakes."
                And then when called out on that: "That is part of the point. [Absolutely no explanation as to why that is part of the point, but some trivially untrue nonsense about reasoning having to 'cross domains']"

                It's possible for AI to be amazing, and still not confusable with human reasoning, at the same time.

                Fair enough. Your basis for being dismissive is bad, though: "If you work with LLMs every day like I do, you will not be afraid. [...] I'd say it gets it right [...] maybe 10% of the time, and that's being

                • Ah, I see your criticism and (partially) accept it. I should have turned it around, asked the LLM a question, and noted that it does a bad job with math.

                  At a basic level, a diffusion model and an LLM work on very similar principles. Instead of patterns of pixels, an LLM looks at patterns of tokens. The "AI" part is based on the same principles in both cases.

                  • Asked ChatGPT whether your last paragraph was accurate. Answer:
                    "The statement is partially accurate but oversimplifies the differences between diffusion models and large language models (LLMs). Here's a more nuanced explanation:

                    Underlying Principles: Both diffusion models and LLMs are based on deep learning principles, specifically on neural networks. They learn from large datasets to generate outputs—images in the case of diffusion models and text in the case of LLMs. How

                    • Your use of AI to rebut my points highlights both the power and the limitations of AI at the same time. While the "nuances" are generally correct, they miss the core point: the words describing the patterns differ, but at their core they rely on very similar principles. Pattern recognition is pattern recognition, whether those patterns consist of pixels or tokens.

                      What I have learned about AI is that while the responses are generally useful and helpful, they also lack *understanding* and are most helpful fo

                    • Using that same logic: Human brains (biological neural networks) and AI (artificial neural networks) also rely on very similar principles. Pattern recognition is pattern recognition, whether it is done by organic matter or inorganic matter. Your logic thus states that human brains cannot reason, which is what renders your position logically absurd.

                      Your move. Please spare me any metaphysical fairy dust. It is about complex behavior arising from functionally simple components. Biological neurons aren't magic,

                    • Using that same logic: Human brains (biological neural networks) and AI (artificial neural networks) also rely on very similar principles. Pattern recognition is pattern recognition, whether it is done by organic matter or inorganic matter

                      I agree! This is literally how AI got its name, it mimics the way humans process information.

                      Your logic thus states that human brains cannot reason, which is what renders your position logically absurd.

                      Not quite. The biggest difference is that *all* of the various disparate types of intelligence in humans are tied together. Image analysis, language processing, speech recognition, numeric analysis, philosophical analysis, emotional processing, empathy, ethical analysis, creativity, and so on, are all part of a unified whole in humans that works together to form human intelligence. No AI is yet able to combine all thes

                    • I agree!

                      Good! Then you should understand that the fundamental difference is in the topology of the neural network(s).
                      The topology of a diffusion ANN is very different from that of an LLM. So we can agree that this was badly reasoned: "This idea gives the LLM hype way, way too much credibility. If you work with LLMs every day like I do, you will not be afraid. Try something simple, like ask Copilot to make a logo for a company named Anderson Electrical and see if it can spell the name you just gave it correctly, e

                    • Pigs and chimps are *way* closer to AGI than LLMs, by multiple orders of magnitude.

                      Since my reasoning is so flawed, maybe you should start wondering if I'm actually an AI. Hmmm...

          • I realize that DALL-E is different from LLMs.

            "Try something simple, like ask Copilot to make a logo for a company named Anderson Electrical and see if it can spell the name you just gave it correctly, even once." seems to have rather deliberately conflated two very different things.

            And that's part of the point. This "reasoning" might look like human reasoning within a strictly bounded domain. But real reasoning has to cross domains.

            It's a completely separate model/system from the LLM. They are just gluing separate systems together in a common front end to make it seem coherent. There is ongoing work on true multi-modality, where models have the ability to generalize across modalities, yet this isn't what

            • End users don't know or care that an LLM is a different technology than a diffusion model. To them, they're just asking for an image. The fact that it can't do the right thing with the text is a flaw, even if we technical people understand that there's a very good reason for it.

              For this "reasoning" to be convincing and to be able to "replace" humans, it will have to work across technologies. People won't forgive it for being one specific kind of model, where the thing it fails at belongs to a different typ

              • End users don't know or care that an LLM is a different technology than a diffusion model. To them, they're just asking for an image. The fact that it can't do the right thing with the text is a flaw, even if we technical people understand that there's a very good reason for it.

                For this "reasoning" to be convincing and to be able to "replace" humans, it will have to work across technologies. People won't forgive it for being one specific kind of model, where the thing it fails at belongs to a different type of model.

                What do your opinions of end user expectations have to do with your prior statement?

                "This idea gives the LLM hype way, way too much credibility. If you work with LLMs every day like I do, you will not be afraid.

                Try something simple, like ask Copilot to make a logo for a company named Anderson Electrical and see if it can spell the name you just gave it correctly, even once.

                No, this "reasoning" isn't going to compare with actual human reasoning. It will just mimic it."

                Do you now deny that you were in fact judging LLMs on the basis of the output of a model that isn't even an LLM? If that was not what you were doing, what were you doing? How should your words have been interpreted?

                  • This entire thread is about people revolting because AI is "taking over." End users aren't going to revolt over something that is unable to actually take their jobs away because it is limited to either an LLM, or a math model, or diffusion, or whatever. This so-called reasoning isn't going to be able to fix all those limitations, so I think we're safe from a violent revolution for some time to come. THAT's what end user expectations have to do with this.

                  I'm SORRY I used the word LLM when I should have used the

          • I'd say it gets it right (as in, can compile and does what you asked) maybe 10% of the time, and that's being generous.

            That's very close to what OpenAI claims GPT4o got on the Math Olympiad - but that was then:

            "In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions."

          • I'd say it gets it right (as in, can compile and does what you asked) maybe 10% of the time

            What percentage of your .C files compile and run correctly the first time?

            Almost none of them? Me, too.

            So why do you hold AI to standards you don't hold yourself to?

            • The difference is, when I get it wrong the first time, I figure out how to fix it, make corrections, and I know when it's right.

              With LLMs, if it gets it wrong once, asking it to fix the problem has about a 0% chance of making it better.

              For example, I was writing some Javascript code in Visual Studio, and asked Copilot to write a fairly straightforward function to change the font styles of a block of text when a certain click event occurred. Copilot dutifully spit out a bunch of HTML tags, right in the middl

              • The difference is, when I get it wrong the first time, I figure out how to fix it, make corrections, and I know when it's right. With LLMs, if it gets it wrong once, asking it to fix the problem has about a 0% chance of making it better.

                Agreed, historically the model either delivers working code without a lot of handholding, or it is almost hopeless. o1-preview looks like a game changer in that regard, though. It absolutely can do just what you're saying.

                I benchmarked it with a relatively-obscure functio [chatgpt.com]

                • If o1 can do what you say, I would be very happy!

                  • Definitely not perfect -- if you look at the second test case, the answer it says to expect is wrong even though the function itself returns the right answer (2^1050 mod 2029 = 1940, which is what the function actually returns, not 432; a quick check follows below).

                    But it is clearly a noticeable advance over GPT4o.
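
                    For anyone who wants to check that arithmetic, Python's built-in three-argument pow does modular exponentiation directly. This only verifies the number quoted above, not the benchmarked function itself, which isn't shown here:

                    # Verify the corrected expected value from the comment above.
                    print(pow(2, 1050, 2029))  # prints 1940, not 432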

      • Try something simple, like ask Copilot to make a logo for a company named Anderson Electrical and see if it can spell the name you just gave it correctly, even once.

        "Also, what's up with this screwdriver? I tried to pry open a shipping crate with it, and the tip snapped off."

      • Try something simple, like ask Copilot to make a logo for a company named Anderson Electrical and see if it can spell the name you just gave it correctly, even once.

        I just used ChatGPT for this and it got it right on the first try. Looks pretty good, too.

        To be fair, there were some extraneous symbols below and to the sides of the logo, but I (who have next to no Photoshop skills) could clean that up in about 2 minutes.

  • Nobody can make an AI that can reason to any degree worthy of the word.

    If they could, we'd have an actual intelligence and the news would be screaming it from every outlet as the most revolutionary technological advancement ever made by humanity.

    • by gweihir ( 88907 )

      Indeed. As of this time, there is not even a credible theoretical approach to creating AGI, and hence it is completely unknown whether it is even possible. In any case it is at the very least 50 years away.

      What these assholes are doing is redefining terminology to allow them to lie by misdirection. So "intelligent" suddenly means simple things like a vacuum cleaner that can detect whether it is full, when it used to mean "human-like capability for insight". Dumb "Automation" becomes "Artificial Int

    • by znrt ( 2424692 )

      they say "reasoning abilities", not reasoning. just regular hype-talk which could mean anything, same as "completely new optimization algorithm" and "a new training dataset specifically tailored for it". so an optimization, a new product. i guess if they had a real breakthrough there would be a paper.

      then again ...

      Nobody can make an AI that can reason to any degree worthy of the word.

      ... nor can most humans all the time (those pesky emotions), just above average could already be useful. i'm half joking, but i guess what i'm saying is that (hype besides) maybe it's not a good i

    • What they mean is they've come up with some new design for a neural network component, probably. Just like transformers enabled ChatGPT, they're hoping this new thing enables the next thing. Maybe it will. We used to think it was all about parameter count, but it's becoming clear that network architecture is important as well.
      • by narcc ( 412956 )

        Oh, it's so much less than that...

        There is nothing new here. This is just CoT hidden from the user. Really.

        It's like a bad joke.

  • ...if only there was a way for humans to reason, they wouldn't need this AI. But alas, the mere fact the AI was created suggests some humans might not be able to reason that well. Thus we must translate the limited reasoning capabilities of some humans into machine language and automate it, and its flaws, via AI. Did you hear about Tesla's "autopilot" feature? It can drive by itself.
  • I want to see how this model handles the Middle East crisis. Two-state solution? One-state solution?

  • ""has been trained using a completely new optimization algorithm"

    Ignoring the obvious error in this statement from the "research lead", didn't we just get a 100x increase in performance from new optimizations? When will these people become competent?

    • by gweihir ( 88907 )

      Same as some other people (MS, I am looking at you), these people _are_ competent. They are competent at marketing, misdirection and over-promising, that is. They are not competent researchers or engineers.

  • by gweihir ( 88907 ) on Thursday September 12, 2024 @02:13PM (#64783205)

    Same as "AI" does not mean intelligent. So, no, this thing cannot reason. And no, it is not AGI. But OpenAI is currently trying to get more money and there may be a connection to this release and these statements.

    • Same as "AI" does not mean intelligent. So, no, this thing cannot reason.

      It is not the same. Intelligence and intelligent are different words with different meanings. Reasoning and reasoning are the same words with the same sets of meanings.

      • It is not the same. Intelligence and intelligent are different words with different meanings.

        I would say it's silly for a different reason. AI is a field of study trying to understand how to create artificial intelligence. When someone says something is AI, they are often saying it is a product of that research.

        Reasoning and reasoning are the same words with the same sets of meanings.

        Reasoning is not really scientifically defined. One goal of AI research is to create a scientific definition that wil

        • by gweihir ( 88907 )

          Reasoning is not really scientifically defined.

          Actually, it pretty much is. It is commonly a short form for "deductive reasoning": https://en.wikipedia.org/wiki/... [wikipedia.org]
          It is something that humans range from abysmally bad to very good at producing. In a formalized setting, machines are universally very bad at finding proofs (or, more accurately, practically incapable of finding them even for relatively simple things), but very good at checking the validity of a reasoning chain (a tiny machine-checked example follows below). It is also exactly defined in mathematical logic.

          This is not to be confused with inductive
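
          As a small illustration of the "machines are very good at checking a reasoning chain" point, here is a minimal deduction written in Lean 4. Lean is used here only as an example proof checker, not something from the discussion above; the kernel accepts these theorems only because every step is a valid inference.

          -- A tiny deductive argument, machine-checked:
          -- from "p implies q" and "p", conclude "q" (modus ponens),
          -- then chain two implications together.
          theorem modus_ponens (p q : Prop) (hpq : p → q) (hp : p) : q :=
            hpq hp

          theorem chain (p q r : Prop) (hpq : p → q) (hqr : q → r) (hp : p) : r :=
            hqr (hpq hp)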

            • Looks like you stopped at that point and replied. If you read the rest of the post, you'll see that computers are getting quite good at your definition of reasoning. However, a human can reason in a much less ideal situation.
            • by gweihir ( 88907 )

              Nope. Deductive reasoning of any reasonable depth is completely out of reach because of the computational complexity involved. The difference between us is that I understand the state of the art.

              • Nope. Deductive reasoning of any reasonable depth is completely out of reach because of the computational complexity involved.

                You're missing the point. Most humans can't do complex deductive proofs, but they can reason just fine. I gave you a reference where an ML system can do complex geometry proofs better than most humans, but it's not clear whether this is all that helpful for human-style reasoning. AI researchers mostly abandoned rigid logic systems decades ago for good reason.

                The difference between us is

                • by gweihir ( 88907 )

                  Actually, most humans cannot reason successfully. I think you are in denial about what the average human can and cannot do.

              • by narcc ( 412956 )

                Deductive reasoning of any reasonable depth is completely out of reach because of the computational complexity involved.

                For clarity: It is true that deductive reasoning is well beyond the capabilities of an LLM; however, deductive reasoning is absolutely something that machines can do. Expert systems, for example, are a kind of AI capable of this kind of reasoning (a toy sketch follows below).

                Expert systems are fundamentally different from LLMs. They actually operate on rules and facts, using deductive reasoning to create new rules which can be applied to produce recommendations. Because they operate in a strictly logical manner, their output is guar
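
                As a toy illustration of that rules-and-facts style, here is a minimal forward-chaining sketch in Python; the facts and rules are invented for the example and don't come from any particular expert system.

                # Forward chaining: apply IF-THEN rules to known facts until
                # nothing new can be deduced. Everything here is illustrative.
                facts = {"has_fever", "has_cough"}
                rules = [
                    ({"has_fever", "has_cough"}, "suspect_flu"),
                    ({"suspect_flu"}, "recommend_rest"),
                ]

                changed = True
                while changed:
                    changed = False
                    for conditions, conclusion in rules:
                        if conditions <= facts and conclusion not in facts:
                            facts.add(conclusion)   # deduce a new fact
                            changed = True

                print(sorted(facts))
                # ['has_cough', 'has_fever', 'recommend_rest', 'suspect_flu']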

                • by gweihir ( 88907 )

                  Deductive reasoning of any reasonable depth is completely out of reach because of the computational complexity involved.

                  For clarity: It is true that deductive reasoning is well beyond the capabilities of an LLM; however, deductive reasoning is absolutely something that machines can do.

                  Can do? Yes. (Not LLMs though.) To any real depth? No. That is, for example, the core reason why we use proof-checkers, but not really proof-finders, for mathematical proofs. In theory, a theorem-prover can find a proof for any mathematical theorem that does not leave the theory (if you leave the theory, incompleteness applies). In practice, even undergrad exercise proofs are often out of reach due to computational complexity.

                  Expert systems, for example, are a kind of AI capable of this kind of reasoning.

                  That works because expert systems are very limited in their world-model and power

                  • by narcc ( 412956 )

                    What is your point here?

                    That deductive reasoning doesn't provide new information. I thought I made that perfectly clear. This may come as a surprise to you, but while most of the people here fancy themselves expert logicians, they're unlikely to be familiar with very basic facts like this.

                    LLMs cannot tell you anything "new" either

                    Okay? Was that ever in question?

    • by HiThere ( 15173 )

      It's clearly not an AGI, but to claim it doesn't reason is likely to be wrong. Perhaps it doesn't, but I haven't checked the preview, and even ChatGPT could reason somewhat in limited domains. Unless you've got some binary definition of "reason" so that something either can or can't reason, rather than allowing "can reason a bit in areas that it knows about".

      • by gweihir ( 88907 )

        Finding correlations can look like reasoning, but it is still just guessing. Essentially, it correlates reasoning steps it has seen in its training data with the situation it starts from. That is just faking it.

        • by HiThere ( 15173 )

          At the basic level, what people do is work with correlations. So unless you restrict reasoning to mean something like "logical derivation", I don't think you have a valid point. And logical deduction is a really weak way of projecting the future, so people don't usually use it. It only works when you have all the initial axioms and postulates needed already in your hand. (And actually, there are lots of computer programs that can do logical reasoning in that sense, but they can't handle anything very co

          • by gweihir ( 88907 )

            Well, if you include "failed reasoning" in the definition of "reasoning", then sure LLMs can do it. Good luck with that. In actual reality, reasoning in machines means "deductive reasoning" because machines do not have intuition. And deductive reasoning is always about logical derivation.

    • Same as "AI" does not mean intelligent. So, no, this thing cannot reason. And no, it is not AGI. But OpenAI is currently trying to get more money and there may be a connection to this release and these statements.

      Does this mean planes don't fly because they don't flap their wings?

      It's different processes with similar outcomes. LLMs don't reason in the way humans do, but they can definitely show you a reasoned chain of thought.

    • by stikves ( 127823 )

      This all assumes humans are "special" and that what can be said of the AI cannot also be said of at least most of humanity.

      Don't get me wrong, we might very well be indeed special, and our brains require some quantum phenomenon we have not realized yet. (Yes, there are scientifically plausible theories on that). But we should ignore it for now, for a fair discussion.

      The average American reads at 7th to 8th grade level (middle school), and their reasoning would be expected to be similar.

      What this means is, o

  • Every day, tens of thousands of hard working programmers make their code better--without such claimed fanfare.
  • I'm more inclined to believe that the use of the word 'reasoning' from these people, has a copyright or registered trademark symbol attached to it.
  • The Top Ten list of questions we'll never be allowed to ask the "reasoning" AI:

    1. What happens if all the women of child-bearing age in a civilization are chemically altered so they are eternally pregnant but never give birth?

    2. If supply and demand truly applies to a given economy, how come nobody can afford anything?

    3. What happens to society if we forcibly ostracize men and prevent them from ever getting jobs, girlfriends, homes or families?

    4. Why do we force kids to socialize at dances in school but ref

    • by Anonymous Coward
      #7 is partially answered by #10, and internet stores. The biggest reason was malfeasance with Toys R Us, which made investors lose confidence in all physical toy stores. Walmart and other big-box stores still carry toys, and toy manufacturers sell other things at those stores too.
  • Imagine if they had focused on this rather than GenAI. Oh wait, the whole point was and still is to generate unique-enough garbage for SEO spam.

  • 1) Pay a bunch of people to describe solutions to thousands of academic problems (it's known vendors have been doing this on upwork)
    2) Market it as a reasoning AI
    4) Profit

    "a new training dataset specifically tailored for it" means upwork slaves, and that data is not really going to help for real problems.

  • I would really love to know more about this one. LLMs are great fun, but they used to be almost completely separate from the awesome AI work that was done before, e.g. at DeepMind.
    I really am looking forward to the inevitable grand union of these different AI approaches. And I am a bit surprised that especially DeepMind hasn't been blazing that trail for a while now, adding LLMs to their older (but arguably more complex and intellectually more interesting) approaches.
    • DeepMind definitely is combining LLMs with other AI approaches, with impressive results: see AlphaGeometry and, more recently, AlphaProof almost reaching gold-medal level in the International Mathematical Olympiad...
    • by dvice ( 6309704 )

      The best people at DeepMind are not that interested in LLMs. They have recently been focusing on AlphaFold 2, AlphaFold 3, and AlphaProteo.
      In short, DeepMind is making AIs that are used for medical research and to make drugs for everything we currently can't cure.

      Would you prefer that they played with chatbots instead of curing every known disease?

  • So, useless except for philosophical talks.
