Anthropic's AI Keeps Passing Its Own Company's Job Interview (anthropic.com) 39

Posted by msmash on Friday January 23, 2026 @12:01PM from the too-smart-for-its-own-good dept.

Anthropic has a problem that most companies would envy: its AI model keeps getting so good, the company wrote in a blog post, that it passes the company's own hiring test for performance engineers. The test, designed in late 2023 by optimization lead Tristan Hume, asks candidates to speed up code running on a simulated computer chip. Over 1,000 people have taken it, and dozens now work at Anthropic. But Claude Opus 4 outperformed most human applicants.

Hume redesigned the test, making it harder. Then Claude Opus 4.5 matched even the best human scores within the two-hour time limit. For his third attempt, Hume abandoned realistic problems entirely and switched to abstract puzzles using a strange, minimal programming language -- something weird enough that Claude struggles with it. Anthropic is now releasing the original test as an open challenge. Beat Claude's best score and ... they want to hear from you.

This discussion has been archived. No new comments can be posted.

Amazing (Score:5, Funny)

by Inglix the Mad ( 576601 ) writes: on Friday January 23, 2026 @12:10PM (#65944240)

Their in house system is good at solving problems they write! INCONCEIVABLE!

- Re:Amazing (Score:5, Insightful)
  
  by Brain-Fu ( 1274756 ) writes: on Friday January 23, 2026 @12:12PM (#65944250) Homepage Journal
  
  We are supposed to read this and think "wow, if it can do that, then surely it can replace several of these expensive programmers that I have on my staff!"
  And yet....it can't....
  
    - Re: Amazing (Score:2)
      
      by liqu1d ( 4349325 ) writes:
      
      You cannot be sincere... the tool is aimed to improve your productivity up until the point productivity no longer includes you.
      - Re: (Score:2)
        
        by shanen ( 462549 ) writes:
        
        AI could certainly replace AC. Or maybe it already has?
        However I was really looking for a recursive joke about the AI recruiting programmers in its own image. Or even creating them?
    - Re: Amazing (Score:2)
      
      by dhjdhj ( 1355079 ) writes:
      
      No…it SHOULD be a tool to help with productivity (and it can certainly be helpful) but it does not replace expertise. It's just that CEOs have been led to believe it can!
    - Re: (Score:2)
      
      by FunkSoulBrother ( 140893 ) writes:
      
      Less stressed? Since this shit came out our management has lost their minds and every clarifying question/pushback has been met with "Ask ChatGPT".
      They've gotten really lazy about specifying what they want, and one slipped up and used the language that he was going to "prompt" another engineer.
    - Re: (Score:2)
      
      by Rujiel ( 1632063 ) writes:
      
      Are you a time-traveller from before every major tech company used AI as a justification for their layoffs?
      - Re: (Score:2)
        
        by HiThere ( 15173 ) writes:
        
        That's what they say publicly, but just how true is it? I don't think anyone knows. (And "true" doesn't imply that the AI could actually replace the folk, merely that management really thought that it could.)
        
        Re: Amazing (Score:3)
        
        by drinkypoo ( 153816 ) writes:
        
        If the jobs have been handed off to AI then it's taken them whether it's doing them correctly or not
        
        Re: (Score:2)
        
        by Rujiel ( 1632063 ) writes:
        
        I definitely don't believe them that AI was the main motivator or that it has successfully filled the gaps, but the intended results are clear even if the motivation was questionable.
  - Re: Amazing (Score:2)
    
    by SirSlud ( 67381 ) writes:
    
    That's a weird thing to say when the entire article is about how they needed to keep improving the test to better help them hire software engineers.
That's not really a surprise (Score:5, Insightful)

by rsilvergun ( 571051 ) writes: on Friday January 23, 2026 @12:11PM (#65944242)

The AI is going to quickly figure out the words it needs to say to get through a job interview in the same way that a simple machine learning algorithm can learn how to beat World 1-1 of Super Mario Bros. It tries the interview and fails and then it tries something else and fails and it does that a whole bunch of times until it succeeds and then it figures out what the interviewer is looking for.

This is more a sign of a weak interview process than anything else.

Right now the main thing these AI LLMs do is replace low-level customer service jobs. The problem with that is they are replacing a lot of those jobs and those people don't just eat a bullet when they become unemployable.

Some of them get stuck in fast food or go be plumbers or whatever but a lot of them go back and get starts or study or get degrees or take the degrees they already have and push themselves harder and start competing for your higher paying job.

There are millions of fake job posts out there that exist only to gauge the state of the job market so your boss knows whether or not he can cut your pay or fire you and replace you with somebody younger. Your boss doesn't care about your qualifications or your knowledge he doesn't even bother taking the time to learn those things. You are a number on a spreadsheet

- Re: (Score:2)
  
  by allo ( 1728082 ) writes:
  
  In theory you're right, but in practice I doubt that they gave the AI a thousand tries. And your optimization approach would need a few orders of magnitude more. If you have an AI that can do self play it will find all kinds of exploits as shortcuts, but it also needs like a million tries for that.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    but in practice I doubt that they gave the AI a thousand tries.
    Why? Gaming benchmarks and lying about performance is pervasive in the LLM industry.
  - Re: (Score:2)
    
    by rsilvergun ( 571051 ) writes:
    
    They don't have to give it a thousand tries. You just feed it the results of successful interviews and the machine learning does the rest. I used World 1-1 as an example because it's probably one of the most famous training examples out there, but you don't necessarily need to let the AI play the game. It can learn from after-the-fact training data.
Knuth has a real chance at this one! (Score:1)

by guardian-ct ( 105061 ) writes:

Reading through some of Don Knuth's stuff, I'd say he has a pretty good chance of beating the pants off Claude in this one.
- Re: (Score:2)
  
  by HiThere ( 15173 ) writes:
  
  Not this year. He's nearly 90, and short term memory declines as you approach that old.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Then their business vanishes and we get the next AI Winter. Which will happen anyways. They should never have started lying, but these people have now done the same crap for maybe 10 AI hypes now. I think part of them are simply scammers and part are not capable of learning.
this means absolutely nothing (Score:5, Insightful)

by v1 ( 525388 ) writes: on Friday January 23, 2026 @12:29PM (#65944318) Homepage Journal

Anyone familiar with AI is painfully aware that AI models are ONLY good at what they're trained to do. If you train it to pass your interview, then of course it's going to be very good at that. But as soon as you take a step or two away from what it's been training on, it will be anywhere from bad to horrible. And what's worse, they often have an absurdly high level of confidence in their wrong answers when you go off training.

- Re: (Score:2)
  
  by allo ( 1728082 ) writes:
  
  Either they are really stupid in their benchmarks, or they did not train on the test set.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Obviously. But it makes for a nice lie-by-misdirection headline for the clueless.
- Hush now... (Score:2)
  
  by ebunga ( 95613 ) writes:
  
  Just buy some AI. They spent a trillion dollars on the crap. Come on bro, think of the economy if they don't make it. It will be your fault that the trillionaire class doesn't get another trillion dollars while the rest of the non-AI economy continues as if nothing happened.
- Re: (Score:2)
  
  by bussdriver ( 620565 ) writes:
  
  Over a decade ago, IBM trained an AI to master Jeopardy ...which really was studying the writers who create the questions. That is far more broad and difficult than any similar tests they create for a job interview. But it's kind of similar to job interviews...(more broad trivia; less reasoning but then brute force trivia can pose as job-interview level reasoning...think about it.)
  The value of the human is going to be figuring out new things before they build up a massive data set they can approximate a cl
man vs. car (Score:5, Informative)

by hdyoung ( 5182939 ) writes: on Friday January 23, 2026 @12:30PM (#65944324)

Please, demonstrate your hireability by running 10 miles as fast as you can. What, you just got beaten out by a rusty 1998 Ford Fiesta? What's wrong with you? Clearly, you're not right to work at our firm.

All this means is that Anthropic is in a near-hiring-freeze situation.

. https://www.youtube.com/watch?... [youtube.com]

Sounds like they don't understand the point (Score:5, Insightful)

by shess ( 31691 ) writes: on Friday January 23, 2026 @12:54PM (#65944380) Homepage

The point of hiring is not to hire people who know the answers to riddles under a time limit. The point is to hire people who can get up to speed on the job reasonably quickly, work well in concert with their co-workers, and then grow the position and product and company going forward. Honestly, the best candidates for most tech jobs won't be bothered to optimize for your particular interview - I'm not saying they'll outright fail, but rather they have many opportunities, so for them the interview is a mutual affair. They are also interviewing you, and if you interview them in your asshat persona, they will likely just move on.

- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  They are also interviewing you, and if you interview them in your asshat persona, they will likely just move on.
  Exactly. Especially the really good ones will have lots of other options. You will not get any of those with such a hiring process.
- Re: (Score:2)
  
  by Tony Isaac ( 1301187 ) writes:
  
  Yes, this. I have interviewed candidates who were using AI to supply answers to my questions. It wasn't hard to spot them. Their answers were too precise, too thorough, too complete.
  My style is to ask lots of why questions. If somebody says they know about ORMs, I ask them which one they would choose, and why. AI is too un-opinionated, providing all the caveats and reasons to consider any of the options based on your criteria. I look for opinions that can be backed up with reasons when I probe further. Oh, a
It's not just Anthropic dealing with this (Score:3)

by timholman ( 71886 ) writes: on Friday January 23, 2026 @01:03PM (#65944420)

The latest version of Gemini is causing quite a stir in our engineering school. We've been testing it by snapping images of problems from exams on cell phones and asking Gemini to solve them. The worst exam performance we've seen so far is an overall grade of B-, and that includes graduate-level exams. Gemini even helpfully provides all of its intermediate work to show how it got the solution.
Gemini can even (correctly) find a solution for many simple design problems. We're scrambling to adapt to what's happening, and wondering what we'll be dealing with a year (or two) from now.
We had hoped that engineering (as opposed to computer science) would be relatively immune to AI disruption for at least a few more years, but we were very much mistaken. If all one evaluated was homeworks and exams, the best AIs are already capable of earning an engineering degree in most disciplines.

- Re:It's not just Anthropic dealing with this (Score:4, Insightful)
  
  by gweihir ( 88907 ) writes: on Friday January 23, 2026 @01:17PM (#65944494)
  
  You need not fear. Regularly, AI gets it wrong so badly that no actual engineer would ever have made that mistake. Do not look at the cases where AI works. Look at the ones where it fails.
  
- Re: (Score:1)
  
  by Iamthecheese ( 1264298 ) writes:
  
  If near future Gemini is undeniably a superhuman coder and never writes a bug it will still be only partway to replacing a programmer. The rest of the job is coping with uncertainty, bad specs, knowing what questions to ask, remembering enough about the broader environment, systems analysis and so on.
- Re: (Score:1)
  
  by Zuck Enabler ( 10503068 ) writes:
  
  Microsoft mathematics has been able to solve most problems for 15 years and it didn't revolutionize the world then.
So where's the results? (Score:4, Insightful)

by Somervillain ( 4719341 ) writes: on Friday January 23, 2026 @01:52PM (#65944598)

You can tell me your system is smarter than your own engineers with your "trust me, bro" interview exam. But take a step back. You say you have a magic product that can code as well as your best engineers with some clever prompting? That's really fucking magical, dude! That would transform the world if software can be written with a prompt. These products have been out for almost 5 years. There should be a revolution based on startups writing some AMAZING code. If not startups, why not hobbyists wowing us by resurrecting old games that have been open sourced? Weren't quake and unreal tournament open sourced? Why aren't we seeing articles about people having Claude port them to UE5 and creating new and amazing games? Would you even need the source code? Why isn't the world filled with a bunch of people writing games by prompting Claude?..."hey, write unreal tournament, but with modern graphics and set it in Chicago, Seattle, and London?" I've been using Claude for a few years. I can say from my personal experience that sometimes it works. It worked for me 3x this week with very simple tasks. It failed 2 times to generate code that compiles...and the code it generated that compiled was pretty shitty low-end Java that is obsolete as of 15 years ago, but it did work.

The Best of the Best. (Score:2)

by geekmux ( 1040042 ) writes:

Hume redesigned the test, making it harder. Then Claude Opus 4.5 matched even the best human scores within the two-hour time limit...Beat Claude's best score and ... they want to hear from you.
Hear from me for what? To teach Claude even faster than that how to do the very job you (allegedly still) need me for? The irony of looking for the best of the best at taking their own jobs, is not lost. Gee Chuck, the date was going so well..until self-realization code kicked in that summarized I will never be needed ever again.
And executive greed wonders why their human resources feel more used than the last condom at a Diddy after-after party.
And again (Score:2)

by RitchCraft ( 6454710 ) writes:

Once again, my bullshit meter is pegged.
OK it can pass the interview, but... (Score:2)

by smithmc ( 451373 ) writes:

...can it do the job? Have they tried giving it real work assignments of the sort they would actually give to human engineers, and see how it does?
What a job opportunity... (Score:2)

by ndykman ( 659315 ) writes:

Nothing like being a junior developer at a company whose business relies on getting rid of roles *exactly* like yours.
Then again, there's completely useless stock options that you can pretend will pay off big time.
OUR PRODUCT IS TOO GOOD (Score:1)

by Zuck Enabler ( 10503068 ) writes:

Oh! Our dang product is just tooooo good
Declared the marketing department!
