Anthropic Launches Claude Opus 4.6 as Its AI Tools Rattle Software Markets (anthropic.com) 51

Anthropic on Thursday released Claude Opus 4.6, its most capable model yet, at a moment when the company's AI tools have already spooked markets over fears that they are disrupting traditional software development and other sectors.

The new model improves on Opus 4.5's coding abilities, the company said -- it plans more carefully, sustains longer agentic tasks, handles larger codebases more reliably, and catches its own mistakes through better debugging. It is also the first Opus-class model to feature a 1M token context window, currently in beta.

On GDPval-AA, an independent benchmark measuring performance on knowledge-work tasks in finance, legal and other domains, Opus 4.6 outperformed OpenAI's GPT-5.2 by roughly 144 Elo points. Anthropic also introduced agent teams in Claude Code, allowing multiple agents to work in parallel on tasks like codebase reviews. Pricing remains at $5/$25 per million input/output tokens.


  • by Kokuyo ( 549451 ) on Thursday February 05, 2026 @02:07PM (#65970706) Journal

    Seriously.

    144 elo points better than ChatGPT? Okay. So how many does ChatGPT get? 112? 1345? 13634? 98123?

    Giving this number would elevate the summary from useless to useful.

    • by Kokuyo ( 549451 ) on Thursday February 05, 2026 @02:09PM (#65970712) Journal

      For anyone interested, GPT-5.2 scored 1462, so we're talking about a 10 percent increase in score.

      • GPT-5.2 scored 1462, so we're talking about a 10 percent increase in score.

        Assuming that the scale isn't logarithmic or some such.
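        For reference, Elo is indeed logarithmic in win odds. A quick sketch of what a 144-point gap implies under the standard logistic Elo convention (assuming the GDPval-AA benchmark follows it; the exact convention is not stated in the summary):

        ```python
        # Expected win probability implied by an Elo rating gap, using the
        # standard logistic Elo model (an assumption about this benchmark).
        def elo_win_prob(diff: float) -> float:
            """Probability the higher-rated side wins, given a rating gap."""
            return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

        print(round(elo_win_prob(144), 3))  # ~0.696
        ```

        So a 144-point gap would mean the higher-rated model is preferred on roughly 70% of head-to-head comparisons, not that it is "10% better" in any absolute sense.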

      • by Rei ( 128717 )

        But to be clear to the GP, that doesn't mean "it's a 10% better model". For most queries one runs against any two models, most of the generations / fixes will be "good", and so it's basically a coin flip as to which model to choose ("I like this one's documentation more", "This one's fix was more concise", "This model was more polite", etc). 10% is actually a pretty big difference and reflects the cases where one model was unambiguously better than the other.

          But to be clear to the GP, that doesn't mean "it's a 10% better model". For most queries one runs against any two models, most of the generations / fixes will be "good", and so it's basically a coin flip as to which model to choose ("I like this one's documentation more", "This one's fix was more concise", "This model was more polite", etc). 10% is actually a pretty big difference and reflects the cases where one model was unambiguously better than the other.

          I have a creeping suspicion that people are judging the best model on how much they approve of human responses saved in a database.

          • Every model is, ultimately, just the distillation of human responses saved in a database (plus some logic).

            Since the details of that distillation have a major impact on what the AI does, it seems like a reasonable thing to care about.

            • Then it is a contemporary search engine based on 40-year-old technology, degraded for modern times. Vectoring to search multiple possibilities and then forming a response is what search engines did very well in the early '90s. You could even use switch arguments to traverse vectors, changing the meaning of the query with arguments and return values. Technically you did not even need a human-readable search string at all. Search engines degraded into human staff constantly updating a database of like
    • At the same time, aren't these benchmarks useless when they're a target rather than a measure?
      • by gweihir ( 88907 )

        All benchmarks can be gamed. The LLM-scammers have been doing this really hard because they have nothing else. And yes, benchmarks become of negative worth (because they begin to state things that are not true) when system design is aimed at optimizing them.

    • by EvilSS ( 557649 )
      What journalist was involved with this post? It's a single link and it points to the Anthropic website.
    • by ffkom ( 3519199 )
      They certainly do not hate money, they want it funneled in their direction. And since the stock market can only make you gain money that another market participant loses, they are certainly fine with other companies' market capitalization falling.

      And honestly, their value proposition is kind of enticing: "Here we offer you 'plug-ins' to replace your finance/legal/developer/etc. personnel with LLM based bots. Of course we have some cover-your-ass clause written into our offering, that you need to have al
      • by gweihir ( 88907 )

        The funny thing is that these "offers" have been made time and again before. They never worked. They do not really work now. The illusion just has gotten a bit better.

      • "the stock market can only make you gain money that another market participant loses"

        Nope.
        You also make money from a company growing.
    • They're not crashing the market.

      What's crashing the market is that what's been supporting it is the belief that AI would, any time now, replace a lot of workers, resulting in massive productivity gains and saving companies a whole lot of money, a share of which would become revenue for AI suppliers instead, so they would make a lot of money.

      "Any time now" is the important factor.

      Wall Street has been waiting for 3 years now for a technology that was always sold as "good to go today", but all that's still to see is mo

  • ...and they were amazing. Like I told someone yesterday, two years ago AI was crap. Today, it's not too bad but I wouldn't get on a plane that had its code written by AI. In two years, I'll probably jump on the plane and take a nap.

    • I have also been impressed with Gemini, though I am against many of the use cases and do in fact think AI will have a grossly negative effect on humanity. I can pretty much guarantee that every post here that claims it can't do this, or it is just that, and it is all an illusion, is made by people who haven't even actually used it. They seem to all have no idea that an LLM is a *neural network* and what makes it a Large Language model has to do with how it is trained, not something intrinsic to its interna
      • > every post here that claims it can't do this, or it is just that, and it is all an illusion, is made by people who haven't even actually used it

        Or perhaps they are just too prejudiced to see the reality? I'm an experienced embedded developer, currently working for a world-class company making both very expensive, specialized systems (stuff has to be secure and precise) and consumer-level equipment. Vibe coding it is not. It's coding to a detailed spec. These tools allow me to produce signifi
    • by caseih ( 160668 )

      In my experience Claude is more efficient and better at coding in some languages than others. For example, C#, Python, anything JavaScript (React, frameworks, etc.), and Rust work well. A combination of a lot of source material to steal from and the expressiveness of the languages makes Claude more efficient.

      While Claude is quite good at C++ and Qt too, it burns a lot more money there. I think that's because everything you want to do involves working with two or more files at once (header files, cpp impleme

    • by allo ( 1728082 )

      If it keeps scaling like this, the plane is no problem. Currently LLMs continue to get better at a rapid rate, both improving outright and allowing for smaller models with the same intelligence. The question is whether there is any wall to hit. Currently the main challenge seems to be quadratic attention, which requires scaling up compute and memory; that is not a theoretical limit on what a model can do given enough resources, but there may be other limits we have not found yet.
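      The "quadratic" part can be seen in a minimal NumPy sketch of scaled dot-product attention (this is the textbook formulation, not any particular model's implementation): the score matrix is n x n, so memory and compute grow with the square of the sequence length n.

      ```python
      import numpy as np

      # Naive scaled dot-product attention. The (n, n) score matrix is the
      # quadratic term: doubling the sequence length quadruples its size.
      def attention(q, k, v):
          d = q.shape[-1]
          scores = q @ k.T / np.sqrt(d)                    # shape (n, n)
          w = np.exp(scores - scores.max(axis=-1, keepdims=True))
          w /= w.sum(axis=-1, keepdims=True)               # row-wise softmax
          return w @ v

      rng = np.random.default_rng(0)
      n, d = 1024, 64
      q = k = v = rng.standard_normal((n, d))
      out = attention(q, k, v)
      print(out.shape)  # (1024, 64); the intermediate scores were (1024, 1024)
      ```

      Techniques like FlashAttention and various linear-attention schemes exist precisely to avoid materializing that full n x n matrix.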

  • The claims are getting grander, the defects are somewhat better hidden, but the lies are also getting more and more obvious. Might still go on for a year or two, but this tech has no big future. It may have a small one, like all Hype-AI tech before (with something like 10 hypes so far, all pretty much along the same lines as the current one, just smaller), but only if they can bring the computational effort down massively. That seems to not be happening.

    • Name the various AI models and versions, as well as the use cases, and how it didn't work and why. Otherwise you are just a guy spouting the same anti-AI line you have been spouting for decades having absolutely no idea what you are talking about.
      • by gweihir ( 88907 )

        Sooo, the "anti-AI line you have been spouting for decades"? What drugs are you on? The hallucinations are extreme.

        Well, I get you have absolutely nothing and you are clearly not very smart. My condolences.

        • So you are saying you *haven't* been completely dismissive of AI for decades. I'm smart enough to have a very, very good memory. The internet also has such a memory. ... here is what AI knows about you [google.com]. Anyone who wants to look deeper will see that you have been spewing the same lack of understanding for as long as I can recall, which again, is a very, very long time.
          • by gweihir ( 88907 )

            That misunderstanding comes from lack of insight on your side. Because you have no clue how things work, you lump everything together into "the one anti-AI line". That is not what I am doing, and it takes a very limited mind to see it as such.

            What is actually happening is that I have been following the developments in the AI field for something like 35 years now, because I was thinking about doing my PhD in that area. But the constant lying, overstating and general dishonesty about actual mechanism capabiliti

            • Only 35 years? That is about how long we have been on Slashdot. I had already been a Marvin Minsky and Jaron Lanier enthusiast by the time you started "following" it for a good decade. Nobody was "lying" about AI, and you have never once had a good thing to say about AI. You aren't even smart enough to figure out that humans make mistakes, hallucinate, and draw false inferences all the time, and every time you say it isn't intelligent because it does so it just means you have far more in common with AI th
    • Only a boomer could be so repeatedly wrong and still convince themselves that they know better than everyone else.

    • I work at a major tech company and we have access to several models with unlimited tokens. Nearly everyone chooses Claude and uses it daily now. I wouldn't be surprised if the favorite model changes, but anyone who thinks this technology isn't the future is toast. English is going to be the new language for writing code, and hand writing Python, Rust or C++ is going to be like writing assembly. Respected and in rare circumstances useful, but extremely niche. That's been the direction of the abstractions for
  • DGAF about benchmarks; I recall Gemini beat GPT and that Chinese bazinga also performed "very well" on paper. My personal hall of fame: Sonnet - a good workhorse, might act a bit silly at times, but works very well if you set the context boundaries well. GPT (generic) - more capable at cracking the harder tasks (e.g. "let's grab the Oracle driver in Go and patch it to get streaming blobs"). I find Sonnet's code to be more readable. Opus - I don't get the hype. 3x cost... for what? When Sonnet
  • pets.com, etc., etc.
          January 2000: Superbowl ad.
          November 2000: Worthless.

    Anthropic, Open AI, etc., etc.
            January 2026: Superbowl ad.
            November 2026: Even that's optimistic.

    Sell NFTs! Sell Bitcoin! Sell Gold! Buy AI!
  • Everyone was freaking out about automation in the 90s, but it turned into a boon for tech workers.

    AI will ultimately displace millions or billions of human roles. I'm not sure if the human roles of the future will be of the lucrative tech variety from the past few decades.

  • I get the strong impression that people posting here do not use these tools.

    I routinely code with AI assistance, and I can choose from 2-3 dozen models as I go. The cost per prompt ranges between free to a nickel or two. In general you can get a lot of work done for you for under 50 cents. People working in corporate environments get team-level access for a few $hundred/month and can burn through all the prompts they want.

    Claude Opus 4.5 has been the premier coding tool for the past several months by far. T

  • by echo123 ( 1266692 ) on Friday February 06, 2026 @11:48AM (#65972880)

    Several weeks ago I met a guy in an airport lounge bar and we compared AI notes. He turned me on to Microsoft GitHub Copilot Enterprise 'agentic teams', which I've been focused on since. I knew it would only be a short time before the other AI companies offered similar tech, and now I see Anthropic is doing just that (disclaimer: I only read TFA and the linked-to article, and I have no experience with this new stuff yet).

    In the last few months I've purchased JetBrains Junie. This month I purchased Microsoft GitHub Copilot Enterprise instead. Both provide access to different AI models (Anthropic's, OpenAI's, Google's, etc.) for their monthly prices.

    To use all of the features of GitHub Copilot, you must have a Microsoft GitHub Copilot Enterprise account with at least one Organization, which costs $39/seat, the same price as a simple developer account without all the features. Thing is, from my perspective those features are critical to begin with, period. One of the great values of Microsoft GitHub Copilot Enterprise is how well it works with Git: if you do serious work with Git repos and want to use the new AI stuff, you'll need a Microsoft GitHub Copilot Enterprise account, which costs the same as a simple dev account and is a royal pain to set up for a sole dev.

    Creating and configuring a Microsoft GitHub Copilot Enterprise account, over the 2 days it took to get right, felt like enduring a long hike through thick, stinky swamp water on a cold, rainy night. It does work, but I haven't used it long enough to judge quality or time savings. It's certainly comparable to JetBrains Junie ($30/month), which lacks agentic coding teams, unless today's announcement changes that somehow.

    I do like the specialized *.agent.md files and how agents hand off tasks to each other and create Git pull requests for my review.

    I found these articles about Microsoft GitHub Copilot Enterprise useful:

    A mission control to assign, steer, and track Copilot coding agent tasks [github.blog]

    Planning a project with GitHub Copilot [github.com]

    How to orchestrate agents using mission control [github.blog]
