
Anthropic Releases Claude 4 Models That Can Autonomously Work For Nearly a Full Corporate Workday (anthropic.com)
Anthropic launched Claude Opus 4 and Claude Sonnet 4 today, positioning Opus 4 as the world's leading coding model with 72.5% performance on SWE-bench and 43.2% on Terminal-bench. Both models feature hybrid architecture supporting near-instant responses and extended thinking modes for complex reasoning tasks.
The models introduce parallel tool execution and memory capabilities that allow Claude to extract and save key facts when given local file access. Claude Code, previously in research preview, is now generally available with new VS Code and JetBrains integrations that display edits directly in developers' files. GitHub integration enables Claude to respond to pull request feedback and fix CI errors through a new beta SDK.
Pricing remains consistent with previous generations at $15/$75 per million tokens for Opus 4 and $3/$15 for Sonnet 4. Both models are available through Claude's web interface, the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. Extended thinking capabilities are included in Pro, Max, Team, and Enterprise plans, with Sonnet 4 also available to free users.
The startup, which counts Amazon and Google among its investors, said Claude Opus 4 could autonomously work for nearly a full corporate workday -- seven hours. CNBC adds: "I do a lot of writing with Claude, and I think prior to Opus 4 and Sonnet 4, I was mostly using the models as a thinking partner, but still doing most of the writing myself," Mike Krieger, Anthropic's chief product officer, said in an interview. "And they've crossed this threshold where now most of my writing is actually ... Opus mostly, and it now is unrecognizable from my writing."
Krieger added, "I love that we're kind of pushing the frontier on two sides. Like one is the coding piece and agentic behavior overall, and that's powering a lot of these coding startups. ... But then also, we're pushing the frontier on how these models can actually learn from and then be a really useful writing partner, too."
Re: huh? (Score:3)
Re: (Score:2)
I wonder if they will have marathon meetings and then begin to fall asleep?
or maybe search the web while the other AI drones on and on...
This is all starting to sound pretty familiar
Re: huh? (Score:2)
I want to know (Score:5, Funny)
Re: (Score:3)
It means the other 17 hours it produces unusable nonsense that superficially looks correct that a human then has to spend 40 hours sorting out and fixing.
Re: (Score:2)
7 hours? Summary said "corporate work day". So, start at 10:00AM, break at noon for a three martini lunch and then hit the golf course.
Doing what? (Score:3)
Because producing hallucinations for 7 hours/day is pretty easy...
Re: (Score:2, Insightful)
Given how much of your time you seem to spend here posting about LLMs and the repetitive nature of your posts, I'm starting to think you're one of them!
Re:Doing what? (Score:5, Funny)
I am not a developer. This is just something I can _also_ do.
Re: (Score:2)
He's still gonna be ranting even after LLM-powered robots capture and imprison him.
"What, you think you can control me?!?! You're just a glorified auto-complete!"
Re:Doing what? (Score:4, Insightful)
People should spend their energy learning how the tools work, and how to use them well.
The upcoming arms race is obvious. (Score:5, Insightful)
When most employees are producing multiple times the written output that they could produce on their own, everyone will need AI agents to summarize all of the documents, email, and slack/teams messages that are coming at them.
I'm not at all convinced that this will be better than communicating without the AI-powered inflation and summarization in between the humans.
In fact, this seems much more likely to introduce errors (and lose nuances) than plain old person to person communication.
Re: (Score:1)
What if one of the persons is weird, how can you communicate with that in the workplace? What if one of the persons is flirting?
Re: (Score:2)
Re: (Score:2)
When most employees are producing multiple times the written output that they could produce on their own, everyone will need AI agents to summarize all of the documents, email, and slack/teams messages that are coming at them.
I'm not at all convinced that this will be better than communicating without the AI-powered inflation and summarization in between the humans.
In fact, this seems much more likely to introduce errors (and lose nuances) than plain old person to person communication.
There's actually one step further than this that I enjoy thinking about. All these big tech firms had erected moats of technical complexity around themselves; it was part of the logic behind paying for developers you didn't need just to keep them away from "the other guy." There are two possibilities out of this AI hype: either it's not real, and they're squandering huge amounts of resources, or it is real and they're now entirely exposed. If, for a few hundred dollars, I can hand requirements to an LLM in pla
I believe it (Score:2)
I've seen vibe coding tools spend hours attempting to fix a bug they generated themselves, until the credits ran out entirely.
Re: (Score:2, Insightful)
I've seen humans taking days to do that. Those hours cost a lot more than the AI model credits too.
7 hours!?! (Score:1)
Claude.. (Score:3)
Here is the documentation and the development tools for the Sega Saturn.
Please deliver the closest you can to the game Crysis using these tools. I expect the end result to be an ISO file that can be run on the real hardware.
Also, I expect all the CPUs to be in full use in ways that benefit the overall graphical, audio and gameplay quality.
Re: (Score:2)
after using for 10 minutes, (Score:1)
It says my time is up and will restart in 6 hours. Work for nearly a full workday? I think not!