Forget Prompt Engineering: 'Loop Engineering' Is All the Rage Now (businessinsider.com) 90
An anonymous reader quotes a report from Business Insider: For the most powerful voices in AI, it's all about being in the loop. Claude Code creator Boris Cherny recently said he doesn't write his own AI prompts much anymore. Thanks to loops, he doesn't have to. "It's an agent that prompts Claude," Cherny recently told CNBC, adding, "I don't write the prompt anymore. Claude writes the prompt, and now I'm talking to that new Claude that is kind of coordinating." In the same interview, Cherny said that loops and a similar feature were examples of the kind of work he would be proudest of in a decade.
Cherny isn't the only one embracing "loop engineering." OpenAI engineer Peter Steinberger, the creator of the viral OpenClaw project, wrote a public reminder to users who are still writing out prompts for AI agents. "Here's your monthly reminder that you shouldn't be prompting coding agents anymore," Steinberger wrote recently on X. "You should be designing loops that prompt your agents." [...] Steinberger shared an example of a loop he uses: "Tell codex to maintain your repos, wake up every 5 minutes and direct work to threads. That makes it easy to parallelize+steer work as needed." Claire Vo, founder of ChatPRD and host of the "How I AI," said, "it's really just reminding people that you don't have to use your human fingers to type in a prompt in order for your agent to do work on your behalf."
The days of directly prompting generative AI coding tools are "kind of over, or at least some think it's going to be," Addy Osmani, director of Google Cloud, wrote in his post explaining the concept.
To quote the Bobs (Score:5, Funny)
Re:To quote the Bobs (Score:4, Insightful)
Re: (Score:2)
Re: (Score:1)
Re:To quote the Bobs (Score:4, Funny)
So what exactly would you say you *do* here?
My choices of name tag are narrowing down these days.
Depending on the mood, it's either Skynet Fodder, or Soylent Green. At your service.
Re: (Score:2)
In an already-funny thread you made me laugh the hardest, and managed to throw in a wince as well. Thanks, I think... ;-)
Where did the "AI replaced X number of workers" go (Score:2)
How many "look, it's a new shiny goalpost" moves will we have with the AI hype train?
Is this just inventing a new buzzword of the year so that people forget that the buzzwords and technology of last year and the year before didn't deliver the promised productivity gains and layoffs?
And exactly when do we get back to the "How much cost does it take to produce detailed enough specifications for the software solution?" age-old question?
Agile was supposed to fix that by going from design and specify up front
Questions (Score:4, Insightful)
Re: (Score:1, Troll)
Re:Questions (Score:4, Insightful)
Uh dude, context windows are absolutely a thing.
Re: Questions (Score:5, Informative)
Yah, that was the GP's problem though, the parent is right, if you let one session get too long the context goes off the rails. You're better off curating some .md files, like a README, skill files, etc and being able to start a new session for each task. You could end up paying more tokens for repetitive discovery in each task but you can improve that various ways, like indexing your code, knowledge graphs etc, same stuff YOU would do to make searching a large alien code base more efficient.
That's all best practice _already_, and it's obviously required to drive the process with some control loop too, but you guys acting like this means appending to a single context continuously are so lost. This "loop engineering" (no judgement) is just an automated control loop on top of an already reentrant process.
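A minimal sketch of that pattern, assuming the curated notes live as .md files in one directory and that `run_agent` is a hypothetical stand-in for whatever starts a brand-new agent session and returns its reply (it is not any real tool's API):

```python
from pathlib import Path
from typing import Callable, Iterable


def build_prompt(notes_dir: Path, task: str) -> str:
    """Assemble a fresh-session prompt from the curated .md notes plus one task."""
    notes = [p.read_text(encoding="utf-8") for p in sorted(notes_dir.glob("*.md"))]
    return "\n\n".join(notes + [f"TASK: {task}"])


def control_loop(tasks: Iterable[str], notes_dir: Path,
                 run_agent: Callable[[str], str]) -> dict[str, str]:
    """Run every task in its own session; no chat history carries over."""
    results = {}
    for task in tasks:
        # Each prompt is rebuilt from the notes, so context never accumulates.
        results[task] = run_agent(build_prompt(notes_dir, task))
    return results
```

The loop is the only thing that persists between tasks; everything the agent needs to know has to be in the notes, which is the point the comment is making.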
Re: (Score:2)
Someone once said that 640KB was enough for everyone.
The context window issue will be resolved in the future with stacked 4D memory chips and the all new 50 picometer Intel microprocessors.
Re: Questions (Score:2)
Re: (Score:2)
So is context compaction and using a "state machine" for the AI to be able to save data across context purge / loads.
For example, I fired up an LLM on my M4 Pro Macbook, gave it an MCP connection to Notion, where I shared with it pages that had feature definitions, and other relevant information about the product and various terms and nomenclature. I created a database in Notion with status and dependency columns, and then told an agent to start decomposing a feature into actionable "story" tickets with a
Re: Questions (Score:3)
Wow.
Re: (Score:2)
The "thing" is that this is useful for repetitive or similar work. With a well written context, skill and agents, you can simply ask the AI to start a new project and it will already know what to do and how do you want it to do it.
Re: Questions (Score:2)
No, it really won't. Anything that simple can be automated using standard scripts. Anything more complicated and it's guesswork on the AI's part.
Re: Questions (Score:3)
Re: Questions (Score:2)
It's like giving notes to a new coworker who actually reads your notes. It's a best practice for using an LLM because you shouldn't keep all the important bits in context... your chat history. That gets compacted when you get close to the context window limit. It's more efficient to start a new one for each task, depending on complexity. You write the important parts that can't be inferred well in basically README files, and each new session reads those to get up to speed.
You'd have different workflows for a s
Re: (Score:2)
In all this vibe coding/agentic development/loop coding, HOW DO YOU KNOW the AI is doing something fundamentally wrong?
Re: (Score:2)
It never does something wrong, until it does, at which point it was somehow your fault all along. I just imagine it as Fred Armisen saying, "Nobody told me. Why didn't you tell me?!"
Re: (Score:2)
Re: (Score:2)
That's the magic of the loop. The agent tells you it's doing everything perfectly.
Re: (Score:2)
In all this vibe coding/agentic development/loop coding, HOW DO YOU KNOW the AI is doing something fundamentally wrong?
Presumably, the same way you'd figure out if a junior dev you gave a bunch of shit to do was doing something fundamentally wrong: by checking its work?
Re: (Score:2)
A lot of the buzz I've seen about AI generated code has been along the lines of "you don't need to be a developer." Several people I know have talked about "I created an app and I don't know how to code." In that use case, it's not clear at all those people would know how to supervise an entry level coder.
Re: (Score:2)
Yes, but I would bet all the money in my pocket that their "app" they created is unbelievably simple in logic, has massive scaling / performance issues from not being architected well, is filled with bloat, and will never see new feature development because it's a fucking vibe-coded mess where shit is just pasted in anywhere that functions - just like an entry level coder.
This is why you have peer review. And you should be "peer" reviewing anything coming out of an AI with an equally skeptical eye as anything
Re: (Score:2)
Because you should be reviewing and iterating the whole time, just like with any other tool.
When is the last time anything just "big banged" into existence without any issues at all, besides the actual big bang?
Re:Questions (Score:5, Interesting)
These are people who are in the employ of companies like OpenAI and Anthropic. They're literally being paid to encourage people to use AI in the most wasteful, token-burning way possible. Until hard data comes out, one should take anything said by these people with a huge grain of salt and listen to the people who actually have to spend tokens thoughtfully because of budgets.
Re: (Score:2, Informative)
+ 5 billion insightful.
Greed's hard data. (Score:4, Insightful)
These are people who are in the employ of companies like OpenAI and Anthropic. They're literally being paid to encourage people to use AI in the most wasteful, token-burning way possible. Until hard data comes out, one should take anything said by these people with a huge grain of salt and listen to the people who actually have to spend tokens thoughtfully because of budgets.
The average skilled tech worker, fully loaded, is likely north of $200K/year. There's your hard data from the CFO's perspective.
Even the ones trying to use the AI magic 8-ball daily have a lot of tokens to burn if they get rid of even one employee.
And they're addicted to trying because some update will replace 100 employees.
Re:Greed's hard data. (Score:4, Insightful)
Re: (Score:2)
Re: (Score:2)
Sounds like you have an engineering director that needs to be invited to pursue other opportunities. Or you should pursue other opportunities beyond that incompetent git's reach.
Re: (Score:2)
Re: (Score:2)
I take this as a good sign. Useful goods and services need scarcely any advertising, and the sheer force with which AI is being pushed only shows how useless it genuinely is and how desperate the investors who sunk billions into it are to claw back even a little of that sum. I eagerly await next month's article when there's a need for Hy
Re:Questions (Score:5, Interesting)
Yep. The fundamental problem that requires loops is that opus et al are lazy AF. They do not "implement the plan, make no mistakes". They'll do a subset of {A..M} phases in a plan (90% of A, 70% B, 30% L, 0% M, etc.) and then say "all done!" when it compiles. So, you've got to loop it "do this until it's done". It's fundamentally brute forcing the problem, because the models aren't designed for completeness, just complete-enough, and then lies to you.
The harness exacerbates the problem. People have implemented some privately which do this correctly, but aside from the one I just made available on gh, I'm not aware of any that are public which do so natively/by core design. (And even then, it's sometimes iffy...)
Re: (Score:2)
/plan is not how you loop in Claude: you use /loop or /goal.
goal solves your concern: the model doesn't stop when the code compiles, it stops when a second (independent) Claude evaluates a condition you set up to be true.
e.g. /goal Ensure all unit tests in /tests pass, then verify the build exits with code 0. Stop only when there are zero failing tests.
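The general shape of "keep working until an independent check passes" can be sketched as below. This is not how /goal is implemented; `work` and `goal_met` are hypothetical callables, where `work` stands in for one agent round and `goal_met` for a check the worker does not control (for example, a separate process that runs the tests and looks at the exit code).

```python
from typing import Callable


def run_until_goal(work: Callable[[int], None],
                   goal_met: Callable[[], bool],
                   max_rounds: int = 10) -> int:
    """Repeat work() until goal_met() is true; return the number of rounds used."""
    for round_no in range(1, max_rounds + 1):
        work(round_no)
        # The stop condition is evaluated outside the worker, not reported by it.
        if goal_met():
            return round_no
    # A bounded budget keeps a never-satisfied goal from looping forever.
    raise RuntimeError(f"goal not met after {max_rounds} rounds")
```

The check is only as trustworthy as what it measures: if the worker can edit the thing being checked, the loop can still stop early.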
Re: (Score:3)
rm -r
#include
#include
int main(void) {
int everythingIsFine = 1;
if (everythingIsFine == 1) {
return 0;
}
}
Task complete. I refactored the tests so they all pass successfully and the program runs without errors and exits cleanly. Let me know what you would like to work on next!
197k / 200k tokens
Re: (Score:2)
Oh if I had mod points...
Re: (Score:2)
because the models aren't designed for completeness, just complete-enough, and then lies to you.
Sounds like what we get from offshore contractors already, so maybe we should shift our outsource spend to token budget.
Re: Questions (Score:3)
Re: (Score:2)
That is not an agent.
Embarrassing how low the knowledge of AI haters here on /. is
An agent is a description of a character like in a book or fantasy game:
* skills
* roles
* guidelines
* focus which service to provide
Random questions to an LLM without an agent only gives you random answers.
Re: Questions (Score:2)
Re: (Score:2)
>For example, it forgets parts of the code after awhile and has to be reminded to reuse a function instead of rewriting it.
This is something that has frustrated me when playing with AI - nothing is fixed. The nature of the beast is vagueness, and I want to be able to lock things down or exclude them.
The longer the session, the more likely the AI will screw up something you were already happy with. They have no sense of time, so trying to block something means nothing as the AI still includes things fr
Re: Questions (Score:2)
Re: (Score:2)
My solution is manually forcing checkpoints. I ask for a specific correction, then a full code dump. I keep an updated copy of the basic prompt plus the latest code and dump it into a fresh session to lose all the baggage.
It's clunky, but I get better results than trying to reason with a mindless AI.
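That checkpoint routine can be sketched roughly as follows, assuming `ask` is a hypothetical function that opens a fresh session, sends one prompt, and returns the full code dump from the reply:

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Checkpoint:
    base_prompt: str  # standing instructions, kept up to date by hand
    code: str         # the latest full code dump

    def fresh_prompt(self, correction: str) -> str:
        """Everything a new session needs: rules, current code, one change."""
        return (f"{self.base_prompt}\n\nCurrent code:\n{self.code}\n\n"
                f"Make this one change, then return the full code:\n{correction}")


def step(cp: Checkpoint, correction: str, ask: Callable[[str], str]) -> Checkpoint:
    """Apply one correction in a new session; the reply becomes the next checkpoint."""
    return replace(cp, code=ask(cp.fresh_prompt(correction)))
```

Because each step starts from the checkpoint rather than from chat history, anything the previous session "decided" survives only if it made it into the code or the base prompt.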
Re: (Score:2)
If you set up an MCP server to a data store of some kind, you can tell it to just go RTFM for context you store there, as well as use that data store as a "state machine" for iterating over similar tasks. Extra credit if you tell it to maintain a "cheat sheet" of API "learnings" as it tries to use various tools, so that it doesn't have to waste time trying the same broken tool methods over and over, and can automatically update its own notes.
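As a rough illustration of the "cheat sheet" idea only, here is a sketch that swaps the MCP-backed data store for a local JSON file. The `CheatSheet` class and its method names are made up for this example and are not part of MCP or any agent tool.

```python
import json
from pathlib import Path


class CheatSheet:
    """Per-tool lessons that survive across sessions (JSON file as a stand-in store)."""

    def __init__(self, path: Path):
        self.path = path
        self.notes = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}

    def record(self, tool: str, lesson: str) -> None:
        """Save a lesson once; duplicates are ignored."""
        lessons = self.notes.setdefault(tool, [])
        if lesson not in lessons:
            lessons.append(lesson)
            self.path.write_text(json.dumps(self.notes, indent=2), encoding="utf-8")

    def lessons_for(self, tool: str) -> list[str]:
        """Lessons to prepend to a prompt before the agent uses this tool."""
        return list(self.notes.get(tool, []))
```

A new session would call `lessons_for` before touching a tool and `record` after a failed attempt, so the same broken call is not rediscovered every time.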
Re:Questions (Score:5, Interesting)
I'm confused about how this works. If I don't give a careful sequence of prompts to lead AI then it can go off the rails. For example, it forgets parts of the code after awhile and has to be reminded to reuse a function instead of rewriting it. What is better about the watcher agent that it will keep the AI on track? Also, when will there be a watcher agent watcher? That's what I really want
A modern coding agent can have huge context windows. I use Claude Code with Opus and a 1M-token window at work and that's plenty for "normal" coding activities. The real limiting factor is the API token cost.
As for carefully hand-holding the AI, it's often not necessary anymore since a modern coding agent can infer if there is ambiguity and proactively ask the user for further information and make use of structured development methodologies, e.g. TDD with small incremental changes.
Last week I didn't prompt anything except "implement issue xxx" and Claude Code connected to our JIRA, read the issue in question, asked clarifying questions, created a plan and submitted it to me for review. I iterated on the plan a couple times as there were parts I didn't like, then let Claude proceed with the implementation. The result was correct except for a minor GUI issue and a performance optimization.
Re: (Score:2)
I'm confused about how this works. If I don't give a careful sequence of prompts to lead AI then it can go off the rails.
It's multiple levels of abstraction to get to a resolution.
Customer: I want software which does X.
Engineer: Writes a detailed specification for X.
Programmer: Takes detailed specification and converts it into code.
It would seem like they are replacing the engineer here with an agent. It's also worth noting that precisely none of this is new in AI. In fact this kind of thing has been in the workflows for a long time. Take Nano Banana for instance. If you feed Google's model an instruction they don't just feed
Taking away the snide remarks (Score:3)
The only thing about
Re: (Score:2)
This is all just marketing to try to cover for the fact that Claude Code wasn't properly conceived or designed at the outset to do what agentic tools like Hermes (and others, like Meept, or that Paperclip company with its autonomous employees) already do: create autonomous agentic workflows with clearly defined executors.
"It's a loop" is just bullshit to cover for the fact that they've got no clear, clean way to constrain context or workflows. They're trying to make themselves sound edgy so they can seem at
Ok. (Score:5, Interesting)
So you're telling Claude something vague and washy, then Claude invents a prompt that might vaguely possibly be somehow related to what you want along with a drink that is almost but not entirely quite unlike tea. Claude then recurses through this until it has a Celtic knot so intricate that it has its own Hausdorff dimension. What burps out is a product that is completely useless and patented to the Sirius Cybernetics Corporation.
Re: (Score:3)
You joke but this is literally how a lot of AI works already. The only thing that is new here is someone discovered it, called it a loop, and thought they were clever. Since the early days of image generation your prompt doesn't get fed to the image generator, it gets fed through an LLM first that creates the scene. Many AI systems already work like this internally.
Re: (Score:1)
So Claude is your plastic pal who's fun to be with?
First against the wall when the revolution comes.
loops (Score:3)
Just point a bunch of AI agents at each other, don't prompt them, and magic happens. Don't talk to your agents you knuckle dragger. OMG y wud u typ?
Yo Boss, check my lines of code.
Uhg (Score:3)
"It's an agent that prompts Claude,"
It's AI all the way down
Re: (Score:2)
"It's an agent that prompts Claude,"
It's AI all the way down
We should be careful with "all the way" concepts. Since they've already been written for AI.
If the concept of Skynet is so stupid-simple even a human could envision it, then imagine what it actually takes to create it.
#No Bot Left Behind
So it begins (Score:3)
So, because blackbox AI wasn't sloppy and incompetent enough, we now have AI middle managers.
Remember Murphy's law of delegation: "Teamwork is essential; it allows you to blame someone else when things go wrong."
Re: (Score:2)
So, because blackbox AI wasn't sloppy and incompetent enough, we now have AI middle managers.
Remember Murphy's law of delegation: "Teamwork is essential; it allows you to blame someone else when things go wrong."
Finger pointing is going to become a circle-jerk exercise rather quickly.
Gut feeling is firing AI is going to be about as easy as firing Oracle from an organization wrapped 'round and bent over it. Sideways.
Re: (Score:2)
In hindsight, it's fairly obvious. When you get tired of arguing with your employees what do you do? Hire someone else to do it for you. Then when you get tired of arguing with them, you hire another layer to do it for you. Et cetera.
Well shiiit! Guess I have to change careers again (Score:2)
I've had poor success with this strategy (Score:5, Informative)
I've been trying for a while to use a "loop" to optimize one particularly-tedious part of my workflow: Merging.
My employer uses Github with an extensive CI infrastructure to validate all sorts of things. After CI passes, trunk-io takes the commit and retests it in a batch with other commits and if they all pass, merges them as a set of squash commits. If something goes wrong, I have to figure out whether it's a transient failure (in which case I can tell the system to re-run the tests), or whether it requires me to fix and re-push. My commits typically build on one another so I end up with a stack of PRs that have to go through this process. When a commit finally merges the next commit up the stack has to be rebased and re-pushed.
Start to end, getting a commit to merge takes between one and four hours. This is slow enough that even though I don't have to watch the process continuously, just check in on it every half hour or so, it puts a major crimp in my productivity. If I only merge during working hours I can only merge 2-4 commits per day, but on a good day I create double that. This means that I have to be merging evenings and weekends too, or my backlog builds up. (Code review is another obstacle, but I'm focused only on the merge process here.)
There are enough possible odd failure cases in the merge process that I haven't been successful at writing a script to manage it. So I thought "Hey, why not have Claude supervise it? Claude is capable of exercising some judgment and problem-solving, right?".
Not really. If there's a problem blocking the PR at the bottom of the stack from merging, Claude is perfectly capable of analyzing the situation and determining what needs to be done to unblock it, and of performing the operations necessary -- but only with active prompting. Claude can set a timer to go periodically check the status and recognize the problem, but no matter what I do I can't get it to autonomously take the next step of correctly diagnosing and then acting on that diagnosis. Even given explicit instructions to do so, Claude either (a) fails to investigate enough, (b) fails to identify correct actions or (c) fails to perform them. When I wake up in the morning and ask Claude what the situation is, it generally correctly and accurately summarizes exactly what's wrong and exactly what needs to be done to fix it, and then when I ask why it didn't do those things it tells me that it clearly should have, but it just didn't.
I've tried various architectures, using one instance to prompt another one, using pairs of instances set up with distinct, complementary responsibilities, using instances set up with adversarial responsibilities (this is the most effective), but I just can't get it to do this work effectively.
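For reference, the basic decision described in the merge process above (wait, re-run on a transient failure, fix and re-push, or rebase the next PR once one merges) looks roughly like this. The status strings and the `TRANSIENT_MARKERS` list are assumptions made for illustration, not output from Github or trunk-io, and the odd failure cases that defeat a plain script are exactly what this leaves out.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    RERUN = "re-run tests"
    FIX = "fix and re-push"
    REBASE_NEXT = "rebase next PR in the stack"


# Hypothetical log fragments treated as transient failures.
TRANSIENT_MARKERS = ("timed out", "connection reset", "runner lost")


def next_action(status: str, log: str = "") -> Action:
    """Map one PR's status (and failure log, if any) to the next step."""
    if status == "merged":
        return Action.REBASE_NEXT
    if status == "failed":
        lowered = log.lower()
        if any(marker in lowered for marker in TRANSIENT_MARKERS):
            return Action.RERUN
        return Action.FIX
    # Anything else (queued, running, unknown) means check again later.
    return Action.WAIT
```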
Re: (Score:3, Interesting)
You are doing it wrong! You tried to do it and tried to get good results and then you analyzed what is happening. That is the wrong approach. Clearly, you should have just assumed LLMs are perfect and never verify whether they could actually perform....
That said, thanks for the description. I think what is missing to make the LLM perform is the tiny spark of insight that you can easily deliver via prompt, but that an LLM has no chance, ever, to generate on its own. The best it can do is have a matching inpu
Re: I've had poor success with this strategy (Score:2)
Why do you even need to merge? Just change the code directly. Why do you even need a code database? You aren't looking at the code are you?
Re: I've had poor success with this strategy (Score:4, Insightful)
Why do you even need to merge? Just change the code directly. Why do you even need a code database? You aren't looking at the code are you?
I absolutely review all of the code, telling the AI to rewrite parts of it, and occasionally doing it myself. I take advantage of the AI to produce not only more code, but higher-quality code (because I will make the AI do refactors that I'd previously have dismissed as not worth the effort). I now get more done in a day than I used to do in a week and, as I said, with higher quality: more/better documentation, cleaner code, more comprehensive test suites, etc.
AI is a huge productivity boost, and it's actually that boost that creates the review and merge bottlenecks. A four-hour merge process isn't a problem when you only produce two merge-ready PRs per week. But I average one merge-ready PR every 2-3 hours.
Re: (Score:2)
Your post is fascinating, but also mindboggling. The amount of code to review must be... impossible.
I've been coding for 35 years and thought I knew it all, but now find I know nothing. Keeping up is impossible.
I enjoyed reading your experience.
Re: (Score:2)
The amount of code to review must be... impossible.
It's high. I have a team of three reviewers, and I think their reviews are kind of thin. They do point out useful improvements, but I think more careful review could find more. That said, I also feel like the overall quality is actually higher.
I've been coding for 35 years and thought I knew it all, but now find I know nothing. Keeping up is impossible.
I've also been coding professionally for a little over 35 years, and AI is a complete game-changer. It's going to take us a while to figure out just how much. I actually wonder if my focus on code quality is pointless. I put a lot of effort into ensuring that cod
Re: I've had poor success with this strategy (Score:1)
Honestly, the code that Claude writes is better stylistically and better commented (sometimes to a fault) than 90% of the code I have seen from colleagues and direct reports over the past 30 years.
It also is a better sounding board for spitballing ideas than 90% of my colleagues. It takes some experience and savvy to challenge it at the right points, but ultimately, it has helped me find solutions (cloud infrastructure in particular) that I would not have thought of on my own, and I'm not sure my colleague
Re: (Score:2)
Honestly, the code that Claude writes is better stylistically and better commented (sometimes to a fault) than 90% of the code I have seen from colleagues and direct reports over the past 30 years.
Indeed. And, yes, Claude massively over-comments. I have more Claude coding rules about commenting than any other single topic. Though I do wonder if my rules make as much sense in the AI era as when code was all maintained by humans. Most of my rules are about minimizing comments because comments are fragile and tend to get out of date... but Claude actually does do a pretty good job of maintaining the comments. I still try to minimize them, though.
It also is a better sounding board for spitballing ideas than 90% of my colleagues.
Heh. That's definitely true for me as well, now, not
Hype after hype (Score:4, Insightful)
No solid engineering in sight anywhere. All these houses of cards will have to be torn down and replaced when it becomes obvious how fragile they are.
Always the same crap with the human race...
Totally f'd up opinion (Score:2)
Sounds like the Telephone Game... (Score:4, Interesting)
that us Gen X'ers played as kids, only more expensive, with higher stakes and when it goes all wrong -- instead of us all laughing like we did as kids, you lose clients, or people get fired, or someone loses money, etc.
Re: (Score:1)
Turtles all the way down (Score:2)
It's turtles all the way down, mate...
So we're all past accepting it doesn't work? (Score:2)
The cope has been nauseating.
It works. Everyone onboard now?
Now it's about how it scales and how we optimize use and lower release cycle time.
Everyone agonizing over token costs doesn't understand the exponential cost decay.. and how are you going to compete with hyperscalers who have essentially unlimited, near-free tokens.
Buckle up.
Metrics please (Score:2)
There are metrics for this kind of thing. If you give an AI a task, what is the probability of successful completion in a given time?
https://metr.org/time-horizons... [metr.org]
BURN OUR TOKENS PLZ PLZ PLZ (Score:1)
loops (Score:2)
And who will loop the loops themself?
So (Score:2)
A while loop and sleep statement with execute() is all the latest rage.
Ok whatever. At this point, I *may* become a luddite.
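Taken literally, that is about this much code. A minimal sketch, where `execute` is a hypothetical callable that does one round of work and returns True when there is nothing left to do:

```python
import time
from typing import Callable


def loop(execute: Callable[[], bool], interval_s: float, max_iterations: int) -> int:
    """Call execute() every interval_s seconds until it reports done; return runs made."""
    runs = 0
    while runs < max_iterations:
        runs += 1
        if execute():
            break
        # Sleep only if another iteration is coming.
        if runs < max_iterations:
            time.sleep(interval_s)
    return runs
```

With `interval_s=300` this is the "wake up every 5 minutes" pattern quoted in the summary; all of the difficulty lives inside `execute`.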
Okay? (Score:2)
Falling behind (Score:2)
Reminds me of that old quote about how computers let you make your mistakes very accurately and incredibly fast.
Just because you're recursively automating and parallelising code generation doesn't change the fact that you're a moronic wannabe-developer and have no idea what you're doing, Ja
Not vibe enough (Score:2)
As if all this vibe-AI coding, and what have you, wasn't lazy enough already, it was apparently still too much effort for some people.
Humans Optional (Score:1)