Loop Engineering Explained

The short version

Loop engineering means designing the repeatable process an agent runs inside, not prompting it harder one step at a time. The agent reads the current state, chooses an action, checks the result, and decides whether to continue, retry, recover, or stop. A useful loop needs a trigger, a verifiable goal, failure handling, hard budgets, and human ownership of the final judgment.

The point is not to prompt the agent harder. It is to design the loop the agent runs inside.
A good loop defines context, tools, success criteria, failure handling, budget, and when to stop.
Loop engineering keeps the human in charge of judgment while letting the agent handle repeated execution.

If you are coding with Claude or Codex today, there’s a new paradigm you’re going to love. It can cut the number of steps to the final output by about half. Your current workflow probably looks like this: you write a prompt. You give the agent file access, and it edits files. You accept all permissions. You run tests. Something breaks. You ask it to fix the problem. Sometimes it works in one go, and sometimes you have to paste the error back or take a screenshot. It tries again. And after twenty minutes, you realize that you are babysitting the exact process you wanted to offload, and you’re doing the dumb work, not the thinking.

But if agents are already good enough, why do you have to keep repeating this process? The new paradigm I mentioned, loop engineering, is the idea that lets you stop being that babysitter. No need to “micro-prompt” the agent. You can have it loop on its own work instead.

We moved from prompt engineering to context engineering to harnesses, and now loop engineering. And to be honest, they are all about the same thing: steering the model as best we can, which is only possible through the context or the prompt we give it. This new one is worth understanding because it describes a real shift in how developers use coding agents in 2026.

By the way, before we continue, you may want to consider watching the rest of this article. I pay an amazing editor to make great visuals that will make the piece 10x better for you ;) Watch here: https://youtu.be/NjXIIH9vcv0

The term loop engineering exploded after Peter Steinberger, the creator of OpenClaw, posted that you should not be prompting coding agents anymore. You should be designing loops that prompt your agents. And it wasn’t a one-off hot take. Boris Cherny, who leads Claude Code at Anthropic, said the same thing: he doesn’t prompt Claude anymore. In his words, his “job is to write loops.” When the people building both Codex and Claude Code land on the same idea, it might be worth taking it seriously. The sharpest reply was basically: ‘okay, but what does that actually look like in practice?’

Because if a loop only means run the same prompt every hour, then we already have that. It is called a cron job. It is older than many of us.

With loop engineering, the difference is the decision-maker inside the loop. A cron job runs a fixed script. A loop runs an agent that looks at the current state, chooses the next action, does it, checks the result, and decides what to do next. Continue, retry, roll back, or stop. Here, the agent controls the loop, and it works because LLMs are now sufficiently capable of understanding proper goals and reward signals.
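To make the contrast with a cron job concrete, here is a minimal sketch of that observe, act, check, decide cycle. Everything in it is an assumption for illustration: `run_loop`, `Decision`, and the toy `fix_one` and `judge` functions are hypothetical stand-ins, and in a real loop the choosing and judging would be calls to an agent rather than plain Python functions.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    CONTINUE = "continue"    # accept the result and keep going
    RETRY = "retry"          # discard the result, try again from the same state
    ROLL_BACK = "roll_back"  # discard the result, return to the starting state
    STOP = "stop"            # accept the result and end the loop


def run_loop(
    state: dict,
    choose_action: Callable[[dict], str],
    act: Callable[[dict, str], dict],
    decide: Callable[[dict, dict], Decision],
    max_steps: int = 10,
) -> tuple[dict, list]:
    """Look at the state, pick an action, apply it, check it, decide what is next."""
    start = dict(state)  # checkpoint used by ROLL_BACK
    log = []
    for _ in range(max_steps):
        action = choose_action(state)        # hypothetical: the agent picks a step
        candidate = act(state, action)       # the step produces a candidate state
        decision = decide(state, candidate)  # hypothetical: the agent judges it
        log.append((action, decision))
        if decision is Decision.STOP:
            return candidate, log
        if decision is Decision.CONTINUE:
            state = candidate
        elif decision is Decision.ROLL_BACK:
            state = dict(start)
        # Decision.RETRY: leave the state unchanged and go around again
    return state, log


# Toy stand-ins for the agent: each step fixes one failing test.
def fix_one(state: dict, action: str) -> dict:
    return {"failing_tests": state["failing_tests"] - 1}


def judge(old: dict, new: dict) -> Decision:
    return Decision.STOP if new["failing_tests"] == 0 else Decision.CONTINUE


final_state, log = run_loop(
    {"failing_tests": 3}, lambda s: "fix_next_failure", fix_one, judge
)
```

The only part a cron job cannot express is `decide`: the next step depends on what the previous step actually produced. The `max_steps` cap is there so the sketch always terminates, even if the judge never says stop.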

But for a loop to work at all, it needs two things before anything else: a trigger and a verifiable goal.

The trigger is what starts the loop. It could be a pull request opening, a failing CI run, a daily schedule, a Slack message, or you manually typing a command or sending the first prompt. The verifiable goal is what tells the loop it can stop. That can be deterministic, like all tests pass and CI is green, or softer, like a reviewer model checks whether the UI matches a spec. But there has to be some check. Otherwise you did not build a loop. You built a very confident token furnace.
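As a rough illustration of the deterministic kind of goal, the sketch below treats "done" as "a check command exited with status 0". The function name `goal_met` and the timeout value are assumptions, and the command you pass in would be whatever your project already uses to run its tests.

```python
import subprocess
import sys


def goal_met(check_cmd: list[str], timeout_s: int = 600) -> bool:
    """Return True only if the check command runs and exits with status 0."""
    try:
        result = subprocess.run(
            check_cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # A hung or missing check counts as "not done", never as success.
        return False
    return result.returncode == 0


# Stand-in commands so the sketch runs anywhere; in a real repository this
# would be your own test command instead.
passing = goal_met([sys.executable, "-c", "raise SystemExit(0)"])
failing = goal_met([sys.executable, "-c", "raise SystemExit(1)"])
```

The design choice worth copying is that the loop never asks the agent whether it is finished. It asks the exit code.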

Codex already loops like this on its own until the task you asked for is done, but you can also build the loop yourself around these new models, for example, via an automation in Cursor.

Prompt engineering optimizes a single interaction. Loop engineering turns that into a repeatable process around many interactions. So now the prompt becomes a component within the larger system.

I like this framing because it matches what I’ve been feeling with coding agents lately. The prompt is rarely the hard part anymore. Especially since late 2025. The hard part is everything around it. What context should the agent see, what tools can it use, what counts as done, what happens when it fails, and how expensive is it allowed to be before we shut it down? That’s why most of my recent talks have been on compaction and memory. Now that models are much more intelligent, but also more expensive, we need to control what they have access to in order to reduce costs, reduce latency, keep long conversations usable, and improve results. We do that by managing context intelligently. And finally, we have a term for building that discipline into the system itself, inside these same loops, which is what makes them viable.

If you are wondering how this is different from ReAct or agent loops like the Ralph loop, here’s what is new. Older systems simply let the LLM run again; with this approach, the loop becomes a unit of work.

It can run on a schedule. It can open worktrees. It can spawn sub-agents. It can write its state to a file or a Linear board. It can survive your laptop closing. That means it can run without you, so it should be designed to work without you, which in turn means it shouldn’t need you to prompt it every single time.
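Writing state to a file is the piece that lets a run pick up where the previous one stopped. A minimal sketch, assuming a JSON file with a hypothetical `runs` counter and `notes` list (the file name and fields are made up for illustration):

```python
import json
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read the state left by earlier runs, or start fresh if there is none."""
    if not path.exists():
        return {"runs": 0, "notes": []}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict) -> None:
    """Write to a temporary file first, then swap it in, so a crash
    mid-write does not leave a half-written state file behind."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


# Two separate "runs" sharing nothing but the file on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    state_path = Path(tmp_dir) / "loop_state.json"
    for note in ("drafted fix for flaky test", "opened PR, waiting on review"):
        state = load_state(state_path)
        state["runs"] += 1
        state["notes"].append(note)
        save_state(state_path, state)
    final_state_on_disk = load_state(state_path)
```

Nothing here depends on the model remembering anything: the second run learns what the first one did by reading the file.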

In an amazing Twitter article, Addy Osmani breaks a loop down into five pieces plus memory, and I think this is the clearest practical explanation. First, automations, so the loop wakes up on its own, or you can start it if you want. Second, worktrees, so parallel agents do not overwrite each other, especially when coding. I do this a lot with Codex: when I want to take one conversation in two completely different directions, I just split it into two worktrees and let each one diverge. Third, skills, so the agent does not have to guess your project rules, or any other rule about you and how you work, every time you launch it. Fourth, plugins or connectors, so the agent can use tools like GitHub, Linear, Slack, or your database. Fifth, sub-agents, so the one writing the code is not the same one judging the code. And then memory, because the model forgets, but the repo does not.
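For the worktree piece specifically, the underlying mechanism is `git worktree add`, which creates a second checkout of the same repository on its own branch. The helper below is only a sketch: the `agent/<task>` branch naming, the `../worktrees` location, the `issue-123` task id, and the `main` base branch are all assumptions, not a convention from the article.

```python
from pathlib import Path


def worktree_add_command(
    task_id: str, worktrees_root: Path, base: str = "main"
) -> list[str]:
    """Build a `git worktree add` command that gives one agent its own
    directory and its own new branch, starting from `base`."""
    branch = f"agent/{task_id}"
    path = worktrees_root / task_id
    return ["git", "worktree", "add", "-b", branch, str(path), base]


cmd = worktree_add_command("issue-123", Path("../worktrees"))

# To actually create the worktree, run this from inside the repository:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

One worktree per task means two agents can edit the same file in parallel without touching each other’s working directory; the conflict, if any, shows up later at merge time where you can see it.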

I have to emphasize the skills part because, even though the idea is a bit old now, the vast majority of people I see still underuse skills. A loop with no reusable skills just rediscovers your project from zero every run. It burns tokens re-learning what you already know. A loop with good skills starts to compound. The skill is where you write the conventions, the examples, the test command, the things you never want to repeat. Skills are just a set of Markdown files. Make them as dense as possible. One skill per task. You don’t want them to fill the context. Then, ask Codex or whatever agent you use to organize them and build an index. This way, your coding agent only has to read the index to know which skills it has access to and which one to open, so you never have to tell it to open skill X or Y again, just to check the skill list.
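You can ask an agent to build that index, but the idea is simple enough to sketch by hand. The version below assumes a flat folder of `.md` files where the first non-empty line of each file is a one-line summary; the folder layout, the `INDEX.md` name, and the example skills are all hypothetical.

```python
import tempfile
from pathlib import Path


def build_skill_index(skills_dir: Path) -> str:
    """List every skill file with its first non-empty line as a summary."""
    entries = []
    for path in sorted(skills_dir.glob("*.md")):
        if path.name == "INDEX.md":
            continue  # do not index the index itself
        text = path.read_text(encoding="utf-8")
        summary = next(
            (line.strip("# ").strip() for line in text.splitlines() if line.strip()),
            "(empty)",
        )
        entries.append(f"- {path.name}: {summary}")
    return "\n".join(entries)


# Demo with two made-up skills in a temporary folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    (root / "testing.md").write_text("# Run tests with make test\n", encoding="utf-8")
    (root / "commits.md").write_text("# Commit message conventions\n", encoding="utf-8")
    index = build_skill_index(root)
    (root / "INDEX.md").write_text(index + "\n", encoding="utf-8")
```

The index stays tiny even when the skills grow, so the agent pays a few lines of context to find the right file instead of loading all of them.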

So what does one loop look like?

A simple version could run every morning. It reads yesterday’s CI failures, open issues, and recent commits. It writes a short state file with what looks worth doing. For one issue, it opens a separate worktree and sends one agent to draft a fix. Another agent reviews it against your project skill and tests. If tests pass, it opens a PR and updates the ticket. If tests fail, it feeds the error back once or twice. If it gets stuck, it stops and puts the problem in your inbox.

That is loop engineering. You did not ask the agent seven times. You didn’t have to prompt it when you woke up, or tell it how to open the PR. You designed the seven-step system once. Or Codex designed it for you, but you were the one who thought about what should be done and automated it.

This is also the difference between automation and loops. Automation says: do step one, then step two, then step three. A loop says: look at the state, decide the next step, do it, check it, and decide whether another iteration is needed. It is closer to a tiny engineering process than a script.

But this is also where the hype gets dangerous. Right now, there are two big problems with loop engineering:

First, defining the goal is hard. It needs to be precise but also verifiable. Software development is often exploratory. You do not always know the final shape of the feature at the start. If the end state is fuzzy, the loop will optimize toward whatever vague sentence you gave it, and that can be worse than doing one careful manual pass. And coding is the easier case. If you are trying to automate a more subjective or creative task, like writing a YouTube script about loop engineering, and you tell it to make it “good”, it may just rewrite it indefinitely. Now that I think about it, that seems like a very human trait, judging by my dozens of unfinished scripts 😅

Anyway, the reward, or just the overall goal, is where you need to put a lot of thought and experimentation.

The second problem is cost. It can get ridiculous, fast. If you let an agent prompt itself continuously, review itself, spawn helpers, and keep retrying, you can burn through millions of tokens quickly. Especially if they run when you sleep or can’t check. This is why loop engineering is easiest to hype on Twitter when you work at a place with a huge token budget like the big labs. For the rest of us, the budget is part of the architecture. I personally want to manually launch my loops and check them to ensure everything goes smoothly.

This is why every serious loop needs hard brakes: a maximum number of iterations, no-progress detection, and a daily token or dollar budget. And it needs verification that is stronger than the agent saying it is done. Run tests. Typecheck. Use a reviewer agent. Compare the diff to the spec. In production, a task is not done until something checks the claim.
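Those three brakes fit in one small object that the loop consults after every iteration. This is a sketch under assumptions: the `Brakes` class, its default limits, and the idea of using a short "signature" string (for example, the failing-test summary) to detect no progress are illustrative choices, and the budget here is counted per run in tokens rather than per day in dollars.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Brakes:
    max_iterations: int = 8
    max_tokens: int = 200_000
    no_progress_limit: int = 2  # repeats of the same result before stopping

    iterations: int = 0
    tokens_used: int = 0
    stalled_rounds: int = 0
    last_signature: Optional[str] = None

    def record(self, tokens: int, signature: str) -> None:
        """Log one iteration: its token cost and a summary of its result."""
        self.iterations += 1
        self.tokens_used += tokens
        if signature == self.last_signature:
            self.stalled_rounds += 1  # same result as last time: no progress
        else:
            self.stalled_rounds = 0
        self.last_signature = signature

    def stop_reason(self) -> Optional[str]:
        """Return why the loop must stop, or None if it may continue."""
        if self.iterations >= self.max_iterations:
            return "max_iterations"
        if self.tokens_used >= self.max_tokens:
            return "token_budget"
        if self.stalled_rounds >= self.no_progress_limit:
            return "no_progress"
        return None


# An agent that keeps producing the same failure trips the no-progress brake.
stalled = Brakes(max_iterations=5, max_tokens=10_000)
for _ in range(3):
    stalled.record(tokens=1_000, signature="3 tests failing")

# An expensive agent trips the budget brake first.
expensive = Brakes(max_tokens=1_500)
expensive.record(tokens=1_000, signature="3 tests failing")
expensive.record(tokens=1_000, signature="2 tests failing")
```

The brakes live outside the agent on purpose. The agent can be as confident as it likes; `stop_reason()` does not take its opinion into account.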

I have the same stance here as I do with agents in general at Towards AI: start simple, then add autonomy only when it pays for itself. If the workflow is one-off, just prompt the model. If the work repeats and has a clear pass or fail signal, or, in other words, if you feel like you only do the dumb repeatable actions, build a loop. If the task is vague, like “think of a better product strategy”, maybe do not hand that to a while-loop. Go make coffee, talk to some other humans, and first figure out a better goal. Please.

So, the takeaway is not that prompt engineering is dead. It is that the leverage point moved. Five years ago, you wrote code yourself. Two years ago, you prompted a model to write code. Last year, you watched Claude code for you and just accepted tasks one by one in case it fu*ked up. Today, for the right tasks, you design the loop that prompts, checks, retries, and stops.

Whatever you do, whether you keep your current workflow or go all in on loops, make sure you stay the engineer. Read what it shipped. Own the quality. Write the skills, or at least control them. Define the stop conditions, the precise goal, and the sub-goals. You can use loops to either move faster on work you understand or to avoid understanding the work at all. Make the right choice for your future self!

What about you? Are you already using loops? I’d love to know your current agent setup. Let me know!

FAQ

What is loop engineering?

Loop engineering is designing the repeatable process around an agent: what it sees, what it can do, how it checks work, and when it stops.

How is this different from prompting?

Prompting gives one instruction. A loop defines an operating system for repeated attempts, feedback, verification, and recovery.

What can go wrong with loops?

A bad loop can waste tokens, repeat failures, or optimize the wrong goal, so stop conditions and checks matter a lot.

What should stop an agent loop?

A loop needs a measurable completion condition, failure threshold, time limit, or cost budget defined before execution.

Why keep agent memory in the repository or an external store?

External artifacts survive model context limits and let later iterations inspect prior decisions, outputs, and evidence.

Why use worktrees with parallel coding agents?

Each agent gets an isolated working directory for the repository, which prevents simultaneous edits from overwriting one another.
