Anthropic's AI Keeps Passing Its Own Company's Job Interview (anthropic.com)
Anthropic has a problem that most companies would envy: its AI model has gotten so good, the company wrote in a blog post, that it keeps passing the company's own hiring test for performance engineers. The test, designed in late 2023 by optimization lead Tristan Hume, asks candidates to speed up code running on a simulated computer chip. Over 1,000 people have taken it, and dozens now work at Anthropic. But Claude Opus 4 outperformed most human applicants.
Hume redesigned the test, making it harder. Then Claude Opus 4.5 matched even the best human scores within the two-hour time limit. For his third attempt, Hume abandoned realistic problems entirely and switched to abstract puzzles using a strange, minimal programming language -- something weird enough that Claude struggles with it. Anthropic is now releasing the original test as an open challenge. Beat Claude's best score and ... they want to hear from you.
Amazing (Score:5, Funny)
Re:Amazing (Score:5, Insightful)
We are supposed to read this and think "wow, if it can do that, then surely it can replace several of these expensive programmers that I have on my staff!"
And yet... it can't.
Re: Amazing (Score:2)
Re: (Score:2)
AI could certainly replace AC. Or maybe it already has?
However, I was really looking for a recursive joke about the AI recruiting programmers in its own image. Or even creating them?
Re: Amazing (Score:2)
Re: (Score:2)
Less stressed? Since this shit came out, our management has lost their minds, and every clarifying question or pushback has been met with "Ask ChatGPT".
They've gotten really lazy about specifying what they want, and one slipped up and said he was going to "prompt" another engineer.
Re: (Score:2)
Re: (Score:2)
That's what they say publicly, but just how true is it? I don't think anyone knows. (And "true" doesn't imply that the AI could actually replace the folk, merely that management really thought that it could.)
Re: Amazing (Score:3)
If the jobs have been handed off to AI, then it's taken them, whether it's doing them correctly or not.
Re: (Score:2)
Re: Amazing (Score:2)
That's a weird thing to say when the entire article is about how they needed to keep improving the test to better help them hire software engineers.
That's not really a surprise (Score:5, Insightful)
This is more a sign of a weak interview process than anything else.
Right now the main thing these AI LLMs do is replace low-level customer service jobs. The problem is that they are replacing a lot of those jobs, and those people don't just eat a bullet when they become unemployable.
Some of them get stuck in fast food or go be plumbers or whatever, but a lot of them go back and get certs, study, get degrees, or take the degrees they already have and push themselves harder, and then start competing for your higher-paying job.
There are millions of fake job posts out there that exist only to gauge the state of the job market, so your boss knows whether he can cut your pay or fire you and replace you with somebody younger. Your boss doesn't care about your qualifications or your knowledge; he doesn't even bother taking the time to learn those things. You are a number on a spreadsheet.
Re: (Score:2)
In theory you're right, but in practice I doubt that they gave the AI a thousand tries. And your optimization approach would need a few orders of magnitude more. If you have an AI that can do self play it will find all kinds of exploits as shortcuts, but it also needs like a million tries for that.
Re: (Score:2)
but in practice I doubt that they gave the AI a thousand tries.
Why? Gaming benchmarks and lying about performance is pervasive in the LLM industry.
Re: (Score:2)
Knuth has a real chance at this one! (Score:1)
Reading through some of Don Knuth's stuff, I'd say he has a pretty good chance of beating the pants off Claude in this one.
Re: (Score:2)
Not this year. He's nearly 90, and short term memory declines as you approach that old.
Re: (Score:2)
Then their business vanishes and we get the next AI Winter, which will happen anyway. They should never have started lying, but these people have now pulled the same crap through maybe 10 AI hypes. I think some of them are simply scammers and the rest are not capable of learning.
this means absolutely nothing (Score:5, Insightful)
Anyone familiar with AI is painfully aware that AI models are ONLY good at what they're trained to do. If you train it to pass your interview, then of course it's going to be very good at that. But as soon as you take a step or two away from what it's been trained on, it will be anywhere from bad to horrible. And what's worse, they often have an absurdly high level of confidence in their wrong answers once you stray from their training.
Re: (Score:2)
Either they are really stupid in their benchmarks, or they did not train on the test set.
Re: (Score:2)
Obviously. But it makes for a nice lie-by-misdirection headline for the clueless.
Hush now... (Score:2)
Just buy some AI. They spent a trillion dollars on the crap. Come on bro, think of the economy if they don't make it. It will be your fault that the trillionaire class doesn't get another trillion dollars while the rest of the non-AI economy continues as if nothing happened.
Re: (Score:2)
Over a decade ago, IBM trained an AI to master Jeopardy, which really meant studying the writers who create the questions. That is far broader and more difficult than any similar test they could create for a job interview. But it's kind of similar to job interviews (broader trivia, less reasoning; then again, brute-force trivia can pass for job-interview-level reasoning... think about it).
The value of the human is going to be figuring out new things before they build up a massive data set they can approximate a cl
man vs. car (Score:5, Informative)
All this means is that Anthropic is in a near-hiring-freeze situation.
https://www.youtube.com/watch?... [youtube.com]
Sounds like they don't understand the point (Score:5, Insightful)
The point of hiring is not to hire people who know the answers to riddles under a time limit. The point is to hire people who can get up to speed on the job reasonably quickly, work well in concert with their co-workers, and then grow the position and product and company going forward. Honestly, the best candidates for most tech jobs won't be bothered to optimize for your particular interview. I'm not saying they'll outright fail, but rather that they have many opportunities, so for them the interview is a mutual affair. They are also interviewing you, and if you interview them in your asshat persona, they will likely just move on.
Re: (Score:2)
They are also interviewing you, and if you interview them in your asshat persona, they will likely just move on.
Exactly. Especially the really good ones will have lots of other options. You will not get any of those with such a hiring process.
Re: (Score:2)
Yes, this. I have interviewed candidates who were using AI to supply answers to my questions. It wasn't hard to spot them. Their answers were too precise, too thorough, too complete.
My style is to ask lots of why questions. If somebody says they know about ORMs, I ask them which one they would choose, and why. AI is too un-opinionated, providing all the caveats and reasons to consider any of the options based on your criteria. I look for opinions that can be backed up with reasons when I probe further. Oh, a
It's not just Anthropic dealing with this (Score:3)
The latest version of Gemini is causing quite a stir in our engineering school. We've been testing it by snapping images of problems from exams on cell phones and asking Gemini to solve them. The worst exam performance we've seen so far is an overall grade of B-, and that includes graduate-level exams. Gemini even helpfully provides all of its intermediate work to show how it got the solution.
Gemini can even (correctly) find a solution for many simple design problems. We're scrambling to adapt to what's happening, and wondering what we'll be dealing with a year (or two) from now.
We had hoped that engineering (as opposed to computer science) would be relatively immune to AI disruption for at least a few more years, but we were very much mistaken. If all one evaluated was homework and exams, the best AIs are already capable of earning an engineering degree in most disciplines.
Re:It's not just Anthropic dealing with this (Score:4, Insightful)
You need not fear. Regularly, AI gets it wrong so badly that no actual engineer would ever have made that mistake. Do not look at the cases where AI works. Look at the ones where it fails.
Re: (Score:1)
Re: (Score:1)
Microsoft Mathematics has been able to solve most problems for 15 years, and it didn't revolutionize the world then.
So where are the results? (Score:4, Insightful)
The Best of the Best. (Score:2)
Hume redesigned the test, making it harder. Then Claude Opus 4.5 matched even the best human scores within the two-hour time limit...Beat Claude's best score and ... they want to hear from you.
Hear from me for what? To teach Claude how to do, even faster, the very job you (allegedly still) need me for? The irony of looking for the best of the best at taking their own jobs is not lost. Gee Chuck, the date was going so well... until the self-realization code kicked in and summarized that I will never be needed again.
And executive greed wonders why their human resources feel more used than the last condom at a Diddy after-after party.
And again (Score:2)
Once again, my bullshit meter is pegged.
OK it can pass the interview, but... (Score:2)
What a job opportunity... (Score:2)
Nothing like being a junior developer at a company whose business relies on getting rid of roles *exactly* like yours.
Then again, there are completely useless stock options that you can pretend will pay off big time.
OUR PRODUCT IS TOO GOOD (Score:1)
Oh! Our dang product is just tooooo good
Declared the marketing department!