AI

Anthropic's AI Keeps Passing Its Own Company's Job Interview (anthropic.com) 39

Anthropic has a problem that most companies would envy: its AI model keeps getting so good, the company wrote in a blog post, that it passes the company's own hiring test for performance engineers. The test, designed in late 2023 by optimization lead Tristan Hume, asks candidates to speed up code running on a simulated computer chip. Over 1,000 people have taken it, and dozens now work at Anthropic. But Claude Opus 4 outperformed most human applicants.

Hume redesigned the test, making it harder. Then Claude Opus 4.5 matched even the best human scores within the two-hour time limit. For his third attempt, Hume abandoned realistic problems entirely and switched to abstract puzzles using a strange, minimal programming language -- something weird enough that Claude struggles with it. Anthropic is now releasing the original test as an open challenge. Beat Claude's best score and ... they want to hear from you.
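The actual challenge lives in Anthropic's own release; as a toy illustration of the *shape* of such a task (a hypothetical example, not the real test), a performance-engineering problem hands you a slow reference implementation and accepts any drop-in replacement that produces identical output faster:

```python
import timeit

# Toy stand-in for a performance-engineering interview task (not the
# actual Anthropic test): given a slow reference implementation,
# submit a faster function with identical output.

def slow_sum_of_squares(n):
    """Reference implementation: O(n) loop."""
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total

def fast_sum_of_squares(n):
    """Candidate speedup: closed-form formula, O(1)."""
    return n * (n + 1) * (2 * n + 1) // 6

if __name__ == "__main__":
    n = 100_000
    # Grading: outputs must match, then the faster submission wins.
    assert slow_sum_of_squares(n) == fast_sum_of_squares(n)
    slow_t = timeit.timeit(lambda: slow_sum_of_squares(n), number=10)
    fast_t = timeit.timeit(lambda: fast_sum_of_squares(n), number=10)
    print(f"speedup: {slow_t / fast_t:.0f}x")
```

The real test reportedly targets a simulated chip, so the scoring would be cycle counts rather than wall-clock time, but the correctness-plus-speed contract is the same.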
  • Amazing (Score:5, Funny)

    by Inglix the Mad ( 576601 ) on Friday January 23, 2026 @12:10PM (#65944240)
    Their in-house system is good at solving problems they write! INCONCEIVABLE!
    • Re:Amazing (Score:5, Insightful)

      by Brain-Fu ( 1274756 ) on Friday January 23, 2026 @12:12PM (#65944250) Homepage Journal

      We are supposed to read this and think "wow, if it can do that, then surely it can replace several of these expensive programmers that I have on my staff!"

      And yet....it can't....

      • That's a weird thing to say when the entire article is about how they needed to keep improving the test to better help them hire software engineers.

  • by rsilvergun ( 571051 ) on Friday January 23, 2026 @12:11PM (#65944242)
    The AI is going to quickly figure out the words it needs to say to get through a job interview, the same way a simple machine-learning algorithm can learn to beat World 1-1 of Super Mario Bros. It tries the interview and fails, then tries something else and fails, and it does that a whole bunch of times until it succeeds. At that point it has figured out what the interviewer is looking for.

    This is more a sign of a weak interview process than anything else.

    Right now the main thing these LLMs do is replace low-level customer service jobs. The problem is that they are replacing a lot of those jobs, and those people don't just eat a bullet when they become unemployable.

    Some of them get stuck in fast food or go be plumbers or whatever, but a lot of them go back to school, get certs or degrees, or take the degrees they already have and push themselves harder, and start competing for your higher-paying job.

    There are millions of fake job posts out there that exist only to gauge the state of the job market, so your boss knows whether he can cut your pay, or fire you and replace you with somebody younger. Your boss doesn't care about your qualifications or your knowledge; he doesn't even bother taking the time to learn those things. You are a number on a spreadsheet.
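The trial-and-error loop described above is essentially random search against a score signal. A minimal sketch (a deliberately silly toy, not anything Anthropic actually does): mutate the current best answer, keep it whenever the interviewer's score doesn't get worse, and repeat until the hidden rubric is matched:

```python
import random

# Toy trial-and-error "interview gaming" loop: the agent only sees a
# score, never the rubric, and hill-climbs by random mutation.

TARGET = "HIRE ME"          # hidden rubric the interviewer scores against
CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def score(attempt):
    """Feedback signal: how many characters match the hidden rubric."""
    return sum(a == t for a, t in zip(attempt, TARGET))

def mutate(attempt):
    """Try something slightly different: change one random character."""
    i = random.randrange(len(attempt))
    return attempt[:i] + random.choice(CHARSET) + attempt[i + 1:]

def trial_and_error(max_tries=100_000, seed=0):
    random.seed(seed)
    best = "".join(random.choice(CHARSET) for _ in TARGET)
    tries = 0
    while score(best) < len(TARGET) and tries < max_tries:
        candidate = mutate(best)
        if score(candidate) >= score(best):   # keep anything not worse
            best = candidate
        tries += 1
    return best, tries

if __name__ == "__main__":
    answer, tries = trial_and_error()
    print(answer, tries)
```

Even this trivial 7-character rubric takes the agent hundreds of tries, which is the sibling comment's point: exploit-finding via blind search needs orders of magnitude more attempts than a human interview allows.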
    • by allo ( 1728082 )

      In theory you're right, but in practice I doubt they gave the AI a thousand tries, and your optimization approach would need a few orders of magnitude more than that. An AI that can do self-play will find all kinds of exploits as shortcuts, but it also needs something like a million tries to get there.

      • by gweihir ( 88907 )

        but in practice I doubt that they gave the AI a thousand tries.

        Why? Gaming benchmarks and lying about performance is pervasive in the LLM industry.

      • They don't have to give it a thousand tries. You just feed it the results of successful interviews and the machine learning does the rest. I used World 1-1 as an example because it's probably one of the most famous training examples out there, but you don't necessarily need to let the AI play the game. It can learn from after-the-fact training data.
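The "after-the-fact training data" idea can be sketched with no live tries at all (a deliberately crude toy, not a claim about how real models train): log (question, passing answer) pairs from past successful interviews, and answer new questions from the closest recorded match:

```python
from difflib import SequenceMatcher

# Toy "learn from transcripts" sketch: no live interview attempts, just
# logged (question, passing answer) pairs from past successes. A real
# system would fit a model; nearest-match lookup shows the idea.

TRANSCRIPTS = [
    ("how would you speed up this hot loop",
     "profile first, then vectorize it"),
    ("describe a race condition you fixed",
     "two writers without a lock; added a mutex"),
]

def similarity(a, b):
    """Rough string similarity between two questions, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def answer(question):
    """Return the passing answer from the most similar past question."""
    best_q, best_a = max(TRANSCRIPTS, key=lambda qa: similarity(question, qa[0]))
    return best_a

if __name__ == "__main__":
    print(answer("how do you speed up a slow loop"))
```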
  • Reading through some of Don Knuth's stuff, I'd say he has a pretty good chance of beating the pants off Claude in this one.

    • by HiThere ( 15173 )

      Not this year. He's nearly 90, and short-term memory declines as you approach that age.

  • by v1 ( 525388 ) on Friday January 23, 2026 @12:29PM (#65944318) Homepage Journal

    Anyone familiar with AI is painfully aware that AI models are ONLY good at what they're trained to do. If you train one to pass your interview, then of course it's going to be very good at that. But as soon as you take a step or two away from what it's been trained on, it will be anywhere from bad to horrible. What's worse, these models often have an absurdly high level of confidence in their wrong answers once you go outside their training.

    • by allo ( 1728082 )

      Either they are really stupid in their benchmarks, or they did not train on the test set.

    • by gweihir ( 88907 )

      Obviously. But it makes for a nice lie-by-misdirection headline for the clueless.

    • Just buy some AI. They spent a trillion dollars on the crap. Come on bro, think of the economy if they don't make it. It will be your fault that the trillionaire class doesn't get another trillion dollars while the rest of the non-AI economy continues as if nothing happened.

    • Over a decade ago, IBM trained an AI to master Jeopardy!, which really meant studying the writers who create the questions. That is far broader and more difficult than any similar test a company creates for a job interview. But it's kind of similar to job interviews: broader trivia and less reasoning, though brute-force trivia can pose as job-interview-level reasoning. Think about it.

      The value of the human is going to be figuring out new things before they build up a massive data set they can approximate a cl

  • man vs. car (Score:5, Informative)

    by hdyoung ( 5182939 ) on Friday January 23, 2026 @12:30PM (#65944324)
    Please, demonstrate your hireability by running 10 miles as fast as you can. What, you just got beaten by a rusty 1998 Ford Fiesta? What's wrong with you? Clearly, you're not fit to work at our firm.

    All this means is that Anthropic is in a near-hiring-freeze situation.

    https://www.youtube.com/watch?... [youtube.com]
  • by shess ( 31691 ) on Friday January 23, 2026 @12:54PM (#65944380) Homepage

    The point of hiring is not to hire people who know the answers to riddles under a time limit. The point is to hire people who can get up to speed on the job reasonably quickly, work well in concert with their co-workers, and then grow the position and product and company going forward. Honestly, the best candidates for most tech jobs won't be bothered to optimize for your particular interview - I'm not saying they'll outright fail, but rather they have many opportunities, so for them the interview is a mutual affair. They are also interviewing you, and if you interview them in your asshat persona, they will likely just move on.

    • by gweihir ( 88907 )

      They are also interviewing you, and if you interview them in your asshat persona, they will likely just move on.

      Exactly. Especially the really good ones will have lots of other options. You will not get any of those with such a hiring process.

    • Yes, this. I have interviewed candidates who were using AI to supply answers to my questions. It wasn't hard to spot them. Their answers were too precise, too thorough, too complete.

      My style is to ask lots of why questions. If somebody says they know about ORMs, I ask them which one they would choose, and why. AI is too un-opinionated, providing all the caveats and reasons to consider any of the options based on your criteria. I look for opinions that can be backed up with reasons when I probe further. Oh, a

  • by timholman ( 71886 ) on Friday January 23, 2026 @01:03PM (#65944420)

    The latest version of Gemini is causing quite a stir in our engineering school. We've been testing it by snapping images of problems from exams on cell phones and asking Gemini to solve them. The worst exam performance we've seen so far is an overall grade of B-, and that includes graduate-level exams. Gemini even helpfully provides all of its intermediate work to show how it got the solution.

    Gemini can even (correctly) find a solution for many simple design problems. We're scrambling to adapt to what's happening, and wondering what we'll be dealing with a year (or two) from now.

    We had hoped that engineering (as opposed to computer science) would be relatively immune to AI disruption for at least a few more years, but we were very much mistaken. If all one evaluated were homework and exams, the best AIs could already earn an engineering degree in most disciplines.

    • by gweihir ( 88907 ) on Friday January 23, 2026 @01:17PM (#65944494)

      You need not fear. Regularly, AI gets it wrong so badly that no actual engineer would ever have made that mistake. Do not look at the cases where AI works. Look at the ones where it fails.

    • If a near-future Gemini is undeniably a superhuman coder and never writes a bug, it will still be only partway to replacing a programmer. The rest of the job is coping with uncertainty, bad specs, knowing what questions to ask, remembering enough about the broader environment, systems analysis, and so on.
    • Microsoft Mathematics has been able to solve most such problems for 15 years, and it didn't revolutionize the world then.

  • by Somervillain ( 4719341 ) on Friday January 23, 2026 @01:52PM (#65944598)
    You can tell me your system is smarter than your own engineers with your "trust me, bro" interview exam. But take a step back. You say you have a magic product that can code as well as your best engineers with some clever prompting? That's really fucking magical, dude! That would transform the world, if software could really be written with a prompt.

    These products have been out for almost 5 years. There should be a revolution based on startups writing some AMAZING code. If not startups, why aren't hobbyists wowing us by resurrecting old games that have been open sourced? Weren't Quake and Unreal Tournament open sourced? Why aren't we seeing articles about people having Claude port them to UE5 and create new and amazing games? Would you even need the source code? Why isn't the world filled with people writing games by prompting Claude? "Hey, write Unreal Tournament, but with modern graphics and set it in Chicago, Seattle, and London."

    I've been using Claude for a few years, and I can say from personal experience that it sometimes works. It worked for me 3 times this week on very simple tasks. It failed twice to generate code that compiles, and the code that did compile was pretty shitty low-end Java of a style that's been obsolete for 15 years. But it did work.
  • Hume redesigned the test, making it harder. Then Claude Opus 4.5 matched even the best human scores within the two-hour time limit...Beat Claude's best score and ... they want to hear from you.

    Hear from me for what? To teach Claude, even faster than that, how to do the very job you (allegedly still) need me for? The irony of looking for the best of the best at automating away their own jobs is not lost on me. Gee Chuck, the date was going so well...until the self-realization code kicked in and concluded I will never be needed again.

    And executive greed wonders why their human resources feel more used than the last condom at a Diddy after-after party.

  • Once again, my bullshit meter is pegged.

  • ...can it do the job? Have they tried giving it real work assignments of the sort they would actually give to human engineers, and see how it does?
  • Nothing like being a junior developer at a company whose business relies on getting rid of roles *exactly* like yours.

    Then again, there's completely useless stock options that you can pretend will pay off big time.

  • Oh! Our dang product is just tooooo good

    Declared the marketing department!
