Anthropic Launches Claude Opus 4.6 as Its AI Tools Rattle Software Markets (anthropic.com)
Anthropic on Thursday released Claude Opus 4.6, its most capable model yet, at a moment when the company's AI tools have already spooked markets over fears that they are disrupting traditional software development and other sectors.
The new model improves on Opus 4.5's coding abilities, the company said -- it plans more carefully, sustains longer agentic tasks, handles larger codebases more reliably, and catches its own mistakes through better debugging. It is also the first Opus-class model to feature a 1M token context window, currently in beta.
On GDPval-AA, an independent benchmark measuring performance on knowledge-work tasks in finance, legal and other domains, Opus 4.6 outperformed OpenAI's GPT-5.2 by roughly 144 Elo points. Anthropic also introduced agent teams in Claude Code, allowing multiple agents to work in parallel on tasks like codebase reviews. Pricing remains at $5/$25 per million input/output tokens.
I wish journalists still existed... (Score:4, Insightful)
Seriously.
144 elo points better than ChatGPT? Okay. So how many does ChatGPT get? 112? 1345? 13634? 98123?
Giving this number would elevate the summary from useless to useful.
Re:I wish journalists still existed... (Score:5, Informative)
For anyone interested: GPT-5.2 scored 1462, so we're talking about a 10 percent increase in score.
Re: (Score:2)
GPT-5.2 scored 1462, so we're talking about a 10 percent increase in score.
Assuming that the scale isn't logarithmic or some such.
Re:I wish journalists still existed... (Score:4, Informative)
I guess it's just an Elo score, though I'm not clear why they write it in all caps? https://en.wikipedia.org/wiki/Elo_rating_system [wikipedia.org]
https://artificialanalysis.ai/evaluations/gdpval-aa [artificialanalysis.ai]
Re: (Score:2)
I think it's because most people incorrectly assume it's an acronym rather than the name of the guy who invented it.
Re: (Score:2)
Re: (Score:2)
Yeah, it's pretty common for comparing relative performance. It came from chess originally.
https://artificialanalysis.ai/... [artificialanalysis.ai]
Re: (Score:3)
But to be clear to the GP, that doesn't mean "it's a 10% better model". For most queries you run on any two models, most of the generations / fixes will be "good", and so it's basically a coin flip as to which model to choose ("I like this one's documentation more", "This one's fix was more concise", "This model was more polite", etc). 10% is actually a pretty big difference and reflects the cases where one model was unambiguously better than the other.
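To put numbers on that: the standard logistic Elo formula (as used in chess; the benchmark's exact setup may differ) converts a rating gap into an expected head-to-head win rate. A quick sketch:

```python
import math

def elo_expected_score(delta):
    """Expected head-to-head win rate for a rating advantage of
    `delta` Elo points, per the standard logistic Elo formula."""
    return 1.0 / (1.0 + 10 ** (-delta / 400.0))

# A 144-point gap works out to roughly a 70% win rate in pairwise
# comparisons -- a clear edge, but far from winning every matchup.
```

So a 144-point lead means the stronger model is preferred about 7 times out of 10, consistent with the "most generations are fine either way" reading above.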
Re: (Score:1)
But to be clear to the GP, that doesn't mean "it's a 10% better model". For most queries you run on any two models, most of the generations / fixes will be "good", and so it's basically a coin flip as to which model to choose ("I like this one's documentation more", "This one's fix was more concise", "This model was more polite", etc). 10% is actually a pretty big difference and reflects the cases where one model was unambiguously better than the other.
I have a creeping suspicion that people are judging the best model on how much they approve of human responses saved in a database.
Re: (Score:2)
Every model is, ultimately, just the distillation of human responses saved in a database (plus some logic).
Since the details of that distillation have a major impact on what the AI does, it seems like a reasonable thing to care about.
Re: (Score:1)
Re: I wish journalists still existed... (Score:3)
Re: (Score:2)
All benchmarks can be gamed. The LLM-scammers have been gaming them really hard because they have nothing else. And yes, benchmarks become of negative worth (because they begin to state things that are not true) when systems design is aimed at optimizing them.
Re: (Score:2)
Why Is Anthropic Crashing The Market (Score:2)
Do they hate money?
Re: (Score:3)
And honestly, their value proposition is kind of enticing: "Here we offer you 'plug-ins' to replace your finance/legal/developer/etc. personnel with LLM based bots. Of course we have some cover-your-ass clause written into our offering, that you need to have al
Re: (Score:2)
The funny thing is that these "offers" have been made time and again before. They never worked. They do not really work now. The illusion just has gotten a bit better.
Re: (Score:2)
Nope.
You also make money from a company growing.
Re: (Score:1)
They're not crashing the market.
What's crashing the market is that what's been supporting it is the belief that AI would, any time now, replace a lot of workers, resulting in massive productivity gains and saving companies a whole lot of money, a share of which would become revenue for AI suppliers, so they would make a lot of money.
Anytime now is the important factor.
Wall Street has been waiting for 3 years now for a technology that was always sold as "good to go today", but all that's still to see is mo
Re: (Score:3)
I think the correct answer is "you don't use a screwdriver as a hammer". Used correctly, AI tools can be quite helpful. But reports seem to show that only around 20% of companies use them correctly. (According to at least one report they produce a 5% improvement for one particular task. Whether that includes the cost of use, the article didn't say.)
Re: (Score:2)
Re: (Score:2)
That sounds as expected and matches some research I have done.
When will the world learn that nothing is going to effectively shortcut having to code in large projects. The bots simply fail in real-world projects. They make the code substantively worse and more confusing. There is no silver bullet. There never was.
Indeed. There is no silver bullet and there cannot be one. Things do not work like that. What established engineering disciplines have is a ton of premade components that work reliably as expected. But when established engineering goes into full-custom design, the only thing that works is competent, experienced and very smart engineers doing it carefully and slowly. And that will not change. Nobody ever found a "silver bullet" and people have rea
Re: (Score:3)
Re: (Score:2)
I'd say it's annoying as all get out, especially when it obnoxiously suggests something it wants to change that is wrong, and refuses to recognize something you *thought* it would slam dunk based on everything to that point.
But in some select circumstances, it can accelerate the dumbest tedious work.
For example, a third party has forced us to significantly rework our codebase to use their 'new' library. The new library is crap, it makes you have to manually manage a whole lot of stuff that was abstracted a
I've recently done some tasks with Claude... (Score:2)
...and they were amazing. Like I told someone yesterday, two years ago AI was crap. Today, it's not too bad but I wouldn't get on a plane that had its code written by AI. In two years, I'll probably jump on the plane and take a nap.
Re: (Score:2)
Re: (Score:1)
Or perhaps they are just too prejudiced to see the reality? I'm an experienced embedded developer, currently working for a world class company making both very expensive and special systems (stuff has to be secure and precise) and also consumer level equipment. Vibe coding I am not. It's coding to a detailed spec. These tools allow me to produce signifi
Re: (Score:2)
In my experience Claude is more efficient and better at coding in some languages than others. For example, C#, Python, anything JavaScript (React, frameworks, etc.), and Rust work well. A combination of a lot of source material to steal from and the expressiveness of those languages makes Claude more efficient.
While Claude is quite good at C++ and Qt also, it burns a lot more money there. I think that's because everything you want to do involves working with two or more files at once (header files, cpp impleme
Re: (Score:2)
If it keeps scaling like this, the plane is no problem. LLMs currently continue to improve at a rapid rate, both getting better outright and allowing smaller models with the same intelligence. The question is whether there is any wall to hit. Right now the main challenge seems to be quadratic attention, which requires scaling up compute and memory, but that is no theoretical limit to what a model can do given enough resources. There may be other limits we just have not found yet.
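For anyone unfamiliar with why attention is "quadratic": every token attends to every other token, so the weight matrix has n² entries for n tokens. A minimal pure-Python sketch (illustrative only, not any production model's code):

```python
import math

def attention_weights(queries, keys):
    """Compute a full (n x n) attention weight matrix: one row per
    query token, one column per key token. That's n^2 scores for n
    tokens, which is why doubling the context quadruples this part
    of the work (and the memory for the matrix)."""
    d = len(queries[0])
    rows = []
    for q in queries:
        # Scaled dot-product score against every key token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # Numerically stable softmax over the row.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        rows.append([e / z for e in exps])
    return rows
```

Feed it 1,000 token embeddings and you get a 1,000 x 1,000 matrix; 2,000 tokens gives 4x the entries, which is the scaling problem the parent is talking about.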
More like a beginning death-rattle (Score:2)
The claims are getting grander, the defects are somewhat better hidden, but the lies are also getting more and more obvious. Might still go on for a year or two, but this tech has no big future. It may have a small one, like all Hype-AI tech before (with something like 10 hypes so far, all pretty much along the same lines as the current one, just smaller), but only if they can bring the computational effort down massively. That seems to not be happening.
Re: (Score:2)
Re: (Score:2)
Sooo, the "anti-AI line you have been spouting for decades"? What drugs are you on? The hallucinations are extreme.
Well, I get you have absolutely nothing and you are clearly not very smart. My condolences.
Re: (Score:2)
Re: (Score:2)
That misunderstanding comes from a lack of insight on your side. Because you have no clue how things work, you lump everything together into "the one anti-AI line". That is not what I am doing, and it takes a very limited mind to see it as such.
What is actually happening is that I have been following the developments in the AI field for something like 35 years now, because I was thinking about doing my PhD in that area. But the constant lying, overstating and general dishonesty about actual mechanism capabiliti
Re: (Score:2)
Re: (Score:2)
Ah, so you are a follower of Marvin "the idiot" Minsky? Explains nicely why you have no clue about things.
Re: (Score:2)
Re: (Score:2)
You could at least make it alliterative: Marvin "the moron" Minsky.
Re: (Score:1)
Only a boomer could be so repeatedly wrong and still convince themselves that they know better than everyone else.
Re: More like a beginning death-rattle (Score:2)
Anecdotal, but don't see the value of Opus (Score:1)
Superbowl Ad ... CRASH (Score:2)
January 2000: Superbowl ad.
November 2000: Worthless.
Anthropic, Open AI, etc., etc.
January 2026: Superbowl ad.
November 2026: Even that's optimistic.
Sell NFTs! Sell Bitcoin! Sell Gold! Buy AI!
Anthropic poised to become the world's most valuab (Score:2)
Everyone was freaking out about automation in the 90s, but it turned into a boon for tech workers.
AI will ultimately displace millions or billions of human roles. I'm not sure if the human roles of the future will be of the lucrative tech variety from the past few decades.
4.6 (Score:2)
I get the strong impression that people posting here do not use these tools.
I routinely code with AI assistance, and I can choose from 2-3 dozen models as I go. The cost per prompt ranges from free to a nickel or two. In general you can get a lot of work done for you for under 50 cents. People working in corporate environments get team-level access for a few hundred dollars a month and can burn through all the prompts they want.
Claude Opus 4.5 has been the premier coding tool for the past several months by far. T
Re: (Score:2)
That might be remotely believable if it wasn't posted as AC.
As AC, it looks like an advertisement or troll. Or ... AI slop.
Claude Agent Teams vs Copilot Enterprise Teams (Score:3)
Several weeks ago I met a guy in an airport lounge bar and we compared AI notes. He turned me on to Microsoft GitHub Copilot Enterprise 'agentic teams', which I've been focused on since. I knew it would only be a short time before the other AI companies offered similar tech, and now I see Anthropic is doing just that (disclaimer: I only read TFA and the linked-to article, and I have no experience with this new stuff yet).
For the last few months I've been paying for JetBrains Junie. This month I purchased Microsoft GitHub Copilot Enterprise instead. Both provide access to different AI models (Anthropic's, ChatGPT's, Google's, etc.) for their monthly prices.
To use all of the features of GitHub Copilot, you must have a GitHub Copilot Enterprise account with at least one Organization, at $39/seat, which is the same price as a simple developer account without all the features. Thing is: from my perspective those features are critical to begin with, period. One of the great values of GitHub Copilot Enterprise is how well it works with Git: if you do serious work with Git repos and want to use the new AI stuff, you'll need the Enterprise account, which costs the same as a simple dev account and is a royal pain to set up for a sole dev.
Creating and configuring a GitHub Copilot Enterprise account, over the 2 days it took to get right, felt like enduring a long hike through thick, stinky swamp water on a cold, rainy night. It does work, but I haven't used it long enough to judge quality or time savings. It's certainly comparable to JetBrains Junie ($30/month), which lacks agentic coding teams, unless today's announcement changes that somehow.
I do like the specialized *.agent.md files and how they hand off tasks to each other and create Git pull requests for my review.
I found these articles about GitHub Copilot Enterprise useful:
A mission control to assign, steer, and track Copilot coding agent tasks [github.blog]
Planning a project with GitHub Copilot [github.com]
How to orchestrate agents using mission control [github.blog]