AI Programming

Sixteen AI Agents Built a C Compiler From Scratch (arstechnica.com)

Anthropic researcher Nicholas Carlini set 16 instances of Claude Opus 4.6 loose on a shared codebase over two weeks to build a C compiler from scratch, and the AI agents produced a 100,000-line Rust-based compiler capable of building a bootable Linux 6.9 kernel on x86, ARM and RISC-V architectures.

The project ran through nearly 2,000 Claude Code sessions and cost about $20,000 in API fees. Each instance operated inside its own Docker container, independently claiming tasks via lock files and pushing completed code to a shared Git repository. No orchestration agent directed traffic. The compiler achieved a 99% pass rate on the GCC torture test suite and can compile major open source projects including PostgreSQL, SQLite, Redis, FFmpeg and Doom. But it lacks a 16-bit x86 backend and calls out to GCC for that step, its assembler and linker remain buggy, and it produces less efficient code than GCC running with all optimizations disabled.

Carlini also invested significant effort building test harnesses and feedback systems to keep the agents productive, and the model hit a practical ceiling at around 100,000 lines as bug fixes and new features frequently broke existing functionality.
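The lock-file coordination described above can be sketched in a few lines of shell: `mkdir` is atomic, so it doubles as a mutex. This is only an illustrative sketch; the `locks/` directory and task names are invented, not taken from the actual project.

```shell
#!/bin/sh
# Hypothetical sketch of lock-file task claiming: mkdir is atomic, so
# only one agent can successfully create a given lock directory.
mkdir -p locks
claim_task() {
    if mkdir "locks/$1" 2>/dev/null; then
        echo "claimed $1"      # this agent owns the task now
    else
        return 1               # another agent got there first
    fi
}

claim_task parse-declarators && echo ok    # first claim succeeds
claim_task parse-declarators || echo busy  # second claim is refused
```

Once a task is done, the agent would push its work to the shared Git repository and remove (or keep) the lock to mark completion.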
  • Wouldn't it give better results if just one did it?

    • by Sique ( 173459 )
      Because it would have taken 16 times the time, 32 weeks instead of 2 weeks.
    • First of all, why would it give better results? Agents do not have the capacity to understand how one part of the code interacts with another.

      Also, I'm guessing it would have taken 32 weeks with one single agent. Probably not interesting to anyone willing to foot the bill.

    • by Gilgaron ( 575091 ) on Monday February 09, 2026 @04:28PM (#65978460)
      The longer they run the more they tend to go off the rails (kinda like going rampant in sci fi i guess?) so the latest stuff has involved using agents to split out tasks to other agents and QC the results and strap everything together. But that's not too surprising since it is how humans get complicated things done, too, versus getting one guy to do the whole damn thing at once.
      • So a bit like a large team of developers. We go off the rails too without supervision. Kind of why you need to collect status updates and maintain a schedule and project plan and architectural documents.

        Successful open source projects work because the ones who succeed have the discipline to do the coordination and supervision in a more cooperative and distributed way. Top-down programming projects are a hot mess, like if you do Waterfall management without actually writing a plan and sticking to it.
        Agile ta

    • by Junta ( 36770 )

These models aren't parallel in their nature, so spinning up multiple 'agents' is just how you manage concurrency. Just like forking a process. I think it's a bit weird to call each instance a distinct 'agent'; it is trying too hard to humanize this very synthetic thing...

    • by allo ( 1728082 )

      Different agents have different task descriptions. Think for example of a planner, a coder, a reviewer, a critic and a debugger, which alternate and can request each other to take over.

    • The longer an agent works on its own, the more likely it is to suffer from its own hallucinations. Using different agents likely helps mitigate this by not allowing a single agent to go off on its own for too long

    • > Wouldn't it give better results if just one did it?

      ClippyAI: Pair programming is an agile software development technique where two developers work together at one workstation, with one acting as the "driver" (writing code) and the other as the "navigator"
  • Congratulations (Score:4, Insightful)

    by Anonymous Coward on Monday February 09, 2026 @04:15PM (#65978418)

    Congratulations on reimplementing the wheel with a new set of bugs and security issues, useless tools, and no support, all while wasting time, money, and energy.

    • by SumDog ( 466607 )
      This is why we can't have affordable RAM.
    • Re:Congratulations (Score:4, Interesting)

      by OrangeTide ( 124937 ) on Monday February 09, 2026 @04:46PM (#65978510) Homepage Journal

      It's a neat demonstration. I think if I had $20k to throw around, I'd do the opposite. Make a new 16-bit C compiler to support UZI or Fuzix [fuzix.org]. (maybe not, might be more fun to write myself)

I've seen some decent results for retro programming, such as this vid Let's Create a Commodore C64 BASIC Interpreter in VSCode! [youtube.com], where the presenter gets a Commodore/Microsoft BASIC in C and, with some hand holding, gets it to output something capable of working on 2.11BSD for PDP-11.

      • by DamnOregonian ( 963763 ) on Monday February 09, 2026 @05:15PM (#65978574)
As an owner of the Dragon Book who wrote a mostly-C compiler for my PDP-11/35 (BSD?! hah!):
        Writing a C compiler isn't what I would call fun. Debugging generated code and then debugging why you generated that broken generated code is a whole different circle of hell.
        • It was pretty fun writing a Pascal for the pcode machine back in the day. And fun to make a couple of pcode to assembler translators, which shows the rather tricky problem of optimizing register allocation in a stack machine, but fun.

Porting and updating existing compilers like PCC and the Plan9 C compilers is also not too bad, rather enjoyable really, because much of the hair pulling and crying is over. But these days I just patch LLVM, which is so huge and complex that I'm unlikely to understand more than the tiny little piece I need for my job.

          • It was pretty fun writing a Pascal for the pcode machine back in the day. And fun to make a couple of pcode to assembler translators, which shows the rather tricky problem of optimizing register allocation in a stack machine, but fun.

            You're cut from a different cloth than me, that's for sure.

            Porting and updating existing compilers like PCC and the Plan9 C compilers is also not too bad, rather enjoyable really. Because much of the hair pulling and crying is over. But these days I just patch LLVM, which is so huge and complex that I'm unlikely to understand more than the tiny little piece I need for my job.

            PCC is a fantastic learning tool for anyone interested in writing a compiler. I wish it had been available to me at the time.
            LLVM.... I don't even want to think about.

            It's been a decade and a half since I had to deal with anything on the compiler side of things. These days, I'm just happy if the always-in-flux kernel API doesn't break a module I have to maintain.

Ever write anything for the PERQ?

Never even seen one in person. A bit before my time. The closest thing to that I have used is UCSD Pascal, at a neighbor's house a bit one summer before buying Turbo Pascal.

    • by 0123456 ( 636235 )

      Don't forget that a huge amount of work for a competent compiler is performance enhancements for different CPUs. I'm guessing it probably doesn't have those.

    • Now work on a Rust compiler
    • by allo ( 1728082 )

LLMs got interesting in the last 3 years. Now they build working C compilers. Even if the compiler injected a bug into every binary it would still be impressive, yet you expect perfection from a technology that is still in its infancy. And if the Linux kernel boots, the compiler cannot be that bad either ... so what do you think will be possible in, let's say, ten years?

      Also humans still exist and you can and probably should have one who reads the code produced by the AI. It's not like these tools exist in a

I think it's interesting. C compilers aren't that complicated, but C is just uncooperative enough that it isn't simply "feed the right file into lex/yacc and it will work". $20k seems cheap.

      What I'd really like to see though is what they can do with reverse engineering, binary to C seems eminently sensible, C is low enough level that it can easily reflect the binary code without artificial constructs, but it's also expressive enough that it can express the semantic meaning of the code. And unlike compiling, w

  • And all he needed were a couple of pizzas.

  • by crgrace ( 220738 ) on Monday February 09, 2026 @04:30PM (#65978466)

    From TFA:

When all 16 agents got stuck trying to fix the same Linux kernel bug simultaneously, he used GCC as a reference oracle, randomly compiling most kernel files with GCC and only a subset with Claude's compiler, so each agent could work on different bugs in different files.

    Not only was Claude trained on a lot of different C compilers, including the entirety of GCC, it still needed a golden model in order to finish.

    If I claimed I had written my own C compiler from scratch and then this came out in an interview, I don't think I would be hired.
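For what it's worth, the "reference oracle" trick quoted above amounts to a per-file compiler switch. A minimal sketch of the idea, with file names and the `claude-cc` command invented for illustration (the project's actual tooling may differ):

```shell
#!/bin/sh
# Hypothetical sketch of the per-file oracle: route a chosen subset of
# files to the compiler under test and everything else to GCC, so a
# miscompiled kernel can be bisected down to individual files.
SUBSET="kernel/fork.c mm/memory.c"   # files under test (invented names)

pick_compiler() {
    case " $SUBSET " in
        *" $1 "*) echo claude-cc ;;  # compiler under test
        *)        echo gcc ;;        # trusted reference oracle
    esac
}

pick_compiler kernel/fork.c   # -> claude-cc
pick_compiler kernel/exit.c   # -> gcc
```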

    • by 93 Escort Wagon ( 326346 ) on Monday February 09, 2026 @04:43PM (#65978496)

      Yeah, perhaps an extra step for the analysis would be to run comm against this "from scratch" compiler versus gcc and clang.
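Note that comm only compares sorted inputs line by line, so a crude overlap check would look something like this (toy inputs here; real compiler sources would be substituted):

```shell
#!/bin/sh
# comm(1) compares two *sorted* inputs; -12 suppresses lines unique to
# either side, leaving only lines present in both. Inputs are toy data.
printf 'int main\nreturn 0\nfoo\n' | sort > a.sorted
printf 'bar\nint main\nreturn 0\n' | sort > b.sorted
comm -12 a.sorted b.sorted    # prints the two shared lines
```

Shared lines in two large codebases prove little by themselves, but a high overlap count would be a starting point for a plagiarism check.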

    • by Waffle Iron ( 339739 ) on Monday February 09, 2026 @05:18PM (#65978578)

      Not only was Claude trained on a lot of different C compilers, including the entirety of GCC, it still needed a golden model in order to finish.

To be fair, they were going to use the ANSI C23 specification as a basis, but nobody wanted to shell out the money to buy an official copy.

    • In fact the agents simply copied fragments of C compilers from the internet until they ended up with something that (barely) works. I would consider it noteworthy if they only had the specification of what the compiler would have to do and no ready-made compilers to use as examples.
      • by gweihir ( 88907 )

        Indeed. The thing that is utterly telling is that it could not hack the 16 bit part. There is really enough documentation on the web on 16 bit x86 code that when you got a 32/64 bit compiler going, adding a target for 16 bit should be very, very easy. Well, for a human, it would be. Not for a mindless automaton.

    • by gweihir ( 88907 )

      Not only was Claude trained on a lot of different C compilers, including the entirety of GCC, it still needed a golden model in order to finish.

      If I claimed I had written my own C compiler from scratch and then this came out in an interview, I don't think I would be hired.

      Indeed. That is called "faking it". But a ton of mindless believers will be deeply impressed by this meaningless stunt nonetheless.

  • Ok? (Score:3, Insightful)

    by Anonymous Coward on Monday February 09, 2026 @04:35PM (#65978480)

I mean, this is impressive in the same way that turning vinyl gloves into hot sauce is impressive. Fabrice Bellard wrote TCC in a few months and it was under 30k lines of code, including an assembler and linker, and it actually worked (the Claude C compiler apparently cannot compile Hello World, so it undoubtedly has hundreds of bugs). I am a little terrified of what that 100k lines of code actually looks like. If my past experience with coding AI is any guide, it will be disorganized, needlessly wordy, and not standards-compliant, with weird unnecessary bugs around leap years, a lack of useful comments, etc.

    Which is to say while this is impressive, I am far more worried about what this means for the future of software quality than I am impressed with this achievement. Look at the crazy number of bugs Windows has had recently, coincidentally after MSFT bragged about how much code was being written by AI these days.

  • Yeah nah (Score:4, Insightful)

    by Willeh ( 768540 ) <rwillemstein@proton.me> on Monday February 09, 2026 @04:36PM (#65978484)
    Plagiarized*
  • by Valgrus Thunderaxe ( 8769977 ) on Monday February 09, 2026 @04:52PM (#65978526)
    Please tell me why I shouldn't just commit suicide, right now?
  • by ameline ( 771895 ) <ian,ameline&gmail,com> on Monday February 09, 2026 @05:32PM (#65978610) Homepage Journal

Apparently the performance of the code it produces is worse than gcc with all optimizations turned off. That's pathetic. And it is written in Rust, so it can't bootstrap. And its size is on par with a hand-written C compiler, so no win there.

When I worked on compilers at IBM, our first bootstrap of C for x86 was a cause for celebration. A double bootstrap was a great "smoke test". But that was relatively easy. Passing the validation test suites was *way* harder - like 50x harder.

    • by Pascoea ( 968200 )
      Being fast, being able to bootstrap, and a size limitation weren't the goals. The goal was "can it do it", which it appears the answer is "yeah, kinda". Vanishingly little of my hobby work output would pass muster against something professionally produced, but that's not the point. I just wanted to do it myself. That seems to mirror this project. I assume they picked GCC because they wanted a sufficiently complicated project that had a well-defined borders. The experiment could have picked anything, li
I was surprised they hadn't already done this. It seemed like an obvious target given the large test suite available. The result is interesting in that it helps show the possibilities and limitations, the contours, of AI coding.

        Now it's obvious why no one did it: $20,000 of compute time.
        • by Pascoea ( 968200 )

The result is interesting in that it helps show the possibilities and limitations, the contours, of AI coding.

          That is absolutely the "why", right there. I'm still on the fence on if AI is coming for developers' jobs or not, but this experiment definitely nudges me more toward the "yeah, probably" side. If I made my money slinging code I'd be making sure I knew how to do it using these tools.

          Now it's obvious why no one did it: $20,000 of compute time.

          Peanuts, in the corporate world.

    • so no win there.

      I mean they did it in a short time for $20k. Performance here doesn't just mean execution speed. You're comparing it against a 39 year old project that has millions of hours contributed to it. The point of this wasn't to create a high performance GCC alternative, it was to demonstrate functionality and speed.

      I doubt AI coding assistants will ever produce something as performant as experts who have been optimising shit for over a decade do, but some people don't have a decade to spare.

    • Re: (Score:3, Insightful)

      by allo ( 1728082 )

      "That's pathetic."
      Don't you think an AI building a working compiler at all is quite impressive?

      When it comes to AI, it seems that people accept nothing under perfection instead of thinking about how impressive it is that a computer can now do such tasks. Most humans I know can't code a C compiler on their own, no matter how unoptimized the results are allowed to be.

      • Don't you think an AI building a working compiler at all is quite impressive?

        And this is why I hate AI.

AI building a kinda-sorta working compiler, albeit one with a lot of jankiness, and also incomplete and leaning on numerous tools to make it work, like er gcc, is impressive.

        Being oversold is what pisses me and most people here off.

  • Fun quote.. (Score:2, Insightful)

    by Junta ( 36770 )

But that total is a fraction of what it would cost me to produce this myself—let alone an entire team.

    I'm willing to commit to provide a C compiler in a single day for a tenth of the cost:

    # dnf install gcc

    For a bonus, I'll even do two:
    # dnf install clang

    Don't know what I'll do with the other 7.99 hours of the day though...

    This reminds me of how Khaby Lame mocked all those overcomplicated "life hacks" by doing the obvious simple things.

  • Big Deal (Score:3, Informative)

    by pngwen ( 72492 ) on Monday February 09, 2026 @06:19PM (#65978710) Journal

    It took just one Dennis Ritchie to do that back in the 70s, and he used much less water in the process!

    • by Jeremi ( 14640 )

      It took just one Dennis Ritchie to do that back in the 70s, and he used much less water in the process!

Well, now you've done it -- we have to estimate how much water Dennis Ritchie used in the process of writing his C compiler.

- According to Wikipedia, Dennis Ritchie developed the first C compiler in 1972 and 1973, so I'm going to call it two years.

      - The average American uses 80-100 gallons of water per day (including drinking water, toilet flushing, showers, cooking, laundry, etc).

      So that puts the water usage for Ritchie's C compiler in the ballpark of 65,700 gallons total. Dunno how that compares to the AIs'

- According to Wikipedia, Dennis Ritchie developed the first C compiler in 1972 and 1973, so I'm going to call it two years.

        Not only is that equally likely to be just over one year from the description alone, but you've also assumed he did nothing else over the same period...

  • by phantomfive ( 622387 ) on Monday February 09, 2026 @06:53PM (#65978796) Journal
    The source code is available online if you want to look at it: https://github.com/anthropics/... [github.com]
It surprises me that many in the Slashdot community are inclined to focus on the aspects of this demonstration that didn't go well.

    Surely, if you described this story (warts and all) to a group of Computer Scientists in the 1980s, predicting it would be a capability of AI in 2026, they would look back at their expert systems and decision trees, and be pretty skeptical.

I'm not so sure. A decision tree could do this; it would just need to traverse the syntax and make a decision at each token. I think what has changed is that a decision tree would not have been seen as impressive, because we knew how decision trees worked. A lot of people seem to think LLMs are magic, so that makes it more impressive to them.

That is one of the reasons C is still around: somebody competent can build a compiler for any target pretty easily. There is a lot of material on how to do it on the Web, and in books as well, so the "AI" did not need to understand or "invent" anything.

    There is really nothing to see here. Just another meaningless stunt that may impress the clueless.

  • Or do we need an Amazon p5.48xlarge instance to compile the kernel?
  • That I don't have to maintain it, or use it...
So this is more AI party tricks. A person can make a coin disappear through misdirection. The misdirection here is that a C compiler seems complex, but there's actually an exact blueprint for every compiler: just follow the syntax.

What's it like being a coder working for an AI company marketing the replacement of labour? Making something to replace your peers is far from the same as shooting them, but it's a step in the same direction. And I've heard all the justification crap about sewing machines and tractors. But AI coders, rather than keeping the money, are working towards making themselves redundant or less valued and giving their money to weirdos like Altman and Musk. What kind of self-sabotage is that?

    Judging from the salaries,

  • If by that you mean using every compiler ever made as training material, sure, I guess?
