Loop Engineering in Claude Code: Designing the System That Prompts the Agent

Loop engineering is the shift from prompting an agent step by step to designing a system that prompts it for you. Here is what that means in Claude Code, the building blocks it already ships, a worked example, and an honest look at when it is worth the tokens on a subscription.

June 17, 2026 · 11 min read

#Claude Code #Loop Engineering #Automation #AI #Developer Tools

A tweet from Addy Osmani reframed how I think about working with Claude Code. He put it this way: "Loop engineering is replacing yourself as the person who prompts the agent. You design the system that does it instead. A loop here can be thought of a recursive goal where you define a purpose and the AI iterates until complete."

He is not the only one saying it. Peter Steinberger put it more bluntly: "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents." And Boris Cherny, who heads Claude Code at Anthropic, said the quiet part out loud: "I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops."

It is easy to wave this off as a cron job wearing a hat, and that reaction is not entirely wrong. But the framing stuck with me, because once you go looking, every piece you need to build one of these loops is already sitting inside Claude Code. This is a tour of those pieces, with the examples I actually reach for, and a straight answer to when a loop is worth it versus when you are just lighting tokens on fire.

From prompting to designing the loop

The familiar way to work with a coding agent is straightforward. You write a good prompt, give the agent enough context, read what comes back, and type the next thing. You are holding the tool the entire time, one turn after another. That works, and for a lot of tasks it is still the right way to work.

Loop engineering is what happens when you stop being the thing in the loop. Instead of prompting each step, you build a small system that finds the work, hands it out, checks the result, writes down what happened, and decides what to do next, and you let that poke the agent. The "recursive goal" idea is the heart of it: you define a purpose once, and the agent iterates until that purpose is met.

The part that surprised me is how little custom plumbing this takes. You might expect a loop to mean a pile of bash you own and babysit forever, but the pieces already ship inside the product itself. If you have read my notes on prompting and the token economy, think of this as the layer above that: not how to phrase one turn well, but how to arrange many turns so you do not have to phrase each one at all.

The building blocks Claude Code already ships

Osmani's list is five pieces plus one place to remember things. Here they are, each mapped to the actual Claude Code mechanism.

Automations: the heartbeat

A loop is only a loop if something kicks it off again. In Claude Code that is /loop for in-session cadence, hooks for firing shell commands at points in the agent lifecycle (SessionStart, PreToolUse, PostToolUse, SubagentStop, and more), and GitHub Actions when you want the thing to keep running after you close the laptop. You define an autonomous task, give it a trigger, and let the findings come to you instead of going around checking.

Worktrees: so parallel does not become chaos

The moment you run more than one agent, they start colliding on the same files. A git worktree is a separate working directory on its own branch sharing the same repo history, so one agent's edits cannot touch another's checkout. Claude Code leans on this directly: you can open a session in its own worktree, and you can give a subagent its own isolated checkout so each helper works in a fresh copy that cleans itself up afterward. The worktrees remove the mechanical collision, but your review bandwidth is still the ceiling on how many agents you can usefully run.
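For a sense of what that isolation amounts to underneath, here is a minimal sketch using plain git rather than Claude Code's own worktree handling. The `agent/<task>` branch naming and the sibling-directory layout are my assumptions for illustration, not a convention the product prescribes.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, task: str) -> list[str]:
    """Build the git command that gives one agent its own checkout and branch."""
    branch = f"agent/{task}"                         # hypothetical naming scheme
    checkout = repo.parent / f"{repo.name}-{task}"   # sibling directory next to the repo
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(checkout)]


def create_worktree(repo: Path, task: str) -> Path:
    """Create the worktree and return its path; raises if git reports an error."""
    cmd = worktree_command(repo, task)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

When the agent is finished, `git worktree remove <path>` deletes the checkout again. Building the command separately from running it keeps the naming logic easy to inspect before anything touches the repo.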

Skills: stop re-explaining your project every session

A skill is a SKILL.md file in a folder under .claude/skills/, holding the project knowledge the agent would otherwise guess at every time: your conventions, your build steps, the "we don't do it like that because of one bad incident" rules. Without skills, a loop re-derives your whole project from zero on every cycle. With them, the knowledge compounds. The tight, boring description matters more than clever instructions, because that description is what tells the agent when the skill applies.

Plugins and connectors: let the loop touch your real tools

A loop that can only see the filesystem is a tiny loop. Connectors built on MCP (the Model Context Protocol) let the agent read your issue tracker, query a database, hit a staging API, or drop a message in Slack. Plugins are how you bundle skills and connectors together so a teammate installs your whole setup in one go. This is the difference between an agent that says "here is the fix" and a loop that opens the PR, links the ticket, and pings the channel once CI is green, by itself.

Sub-agents: keep the maker away from the checker

The single most useful structural move in a loop is splitting the agent that writes the code from the one that checks it. The model that wrote something is far too generous grading its own homework. In Claude Code you define a subagent as a Markdown file with frontmatter under .claude/agents/:

MARKDOWN

---
name: spec-reviewer
description: Reviews a diff against the project spec and tests. Read-only.
tools: Read, Glob, Grep
model: sonnet
---

You are a reviewer. Check the change against the existing tests and the
conventions in the project's skills. Report what is wrong; do not fix it yourself.

The pattern I keep reaching for is one agent explores, one implements, and a different one verifies. That split is not a built-in feature with a fancy name; it is just how you wire the agents up. Subagents do cost more tokens, since each one runs its own model and tools, so spend them where a second opinion actually earns its keep.

Memory: the agent forgets, the repo does not

The sixth piece is the one that sounds too dumb to matter: a place to write down what is done and what is next, outside the conversation. A Markdown state file, a checklist in the repo, an issue tracker board, anything durable. The model forgets everything between runs, so the memory has to live on disk, not in the context window. This is the spine of any loop that runs more than once.
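To make "memory on disk" concrete, here is a minimal sketch of a Markdown state file that one run appends to and the next run reads back. The entry layout (a dated heading followed by checkbox items) and the function names are my own assumptions; any durable format works just as well.

```python
from datetime import date
from pathlib import Path


def append_entry(state_file: Path, day: date, done: list[str], still_open: list[str]) -> None:
    """Append one dated entry listing finished and unfinished items."""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"## {day.isoformat()}"]
    lines += [f"- [x] {item}" for item in done]
    lines += [f"- [ ] {item}" for item in still_open]
    with state_file.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n\n")


def open_items(state_file: Path) -> list[str]:
    """Return the unchecked items from the most recent entry, if any."""
    if not state_file.exists():
        return []
    # Entries start with "## ", so the last chunk is the latest run.
    last_entry = state_file.read_text(encoding="utf-8").split("## ")[-1]
    prefix = "- [ ] "
    return [line[len(prefix):] for line in last_entry.splitlines() if line.startswith(prefix)]
```

The file is append-only on purpose: the history of what was tried stays readable for a human, while the next run only needs the open items from the latest entry.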

The two primitives: /loop versus /goal

Two in-session commands do most of the work, and the difference between them is the whole point.

/loop re-runs a prompt or command on a cadence. Give it an interval and a prompt and it schedules the job:

BASH

# fixed interval
/loop 5m check if the deploy finished and tell me what happened

# omit the interval and Claude self-paces between 1 minute and 1 hour,
# waiting longer when nothing is happening
/loop check whether CI passed and address any review comments

# re-run a saved slash command each iteration
/loop 20m /review-pr 1234

A bare /loop runs a built-in maintenance prompt (continue unfinished work, tend the current PR, run cleanup passes). You can replace that with your own default by dropping a .claude/loop.md in the project or ~/.claude/loop.md for all projects. Press Esc to stop a loop while it waits for the next iteration. A few guardrails apply: recurring tasks expire after 7 days, a session holds at most 50 scheduled tasks, and tasks are session-scoped, so they live in the current conversation and are restored only if you reopen it with --resume.
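The self-pacing behavior is easier to reason about with a toy model. The sketch below is one plausible policy (double the wait while nothing changes, reset when something does) bounded by the 1 minute and 1 hour limits mentioned above. It is an assumption for illustration, not a description of how Claude Code actually chooses its interval.

```python
MIN_WAIT_S = 60        # 1 minute floor
MAX_WAIT_S = 60 * 60   # 1 hour ceiling


def next_wait(previous_wait_s: int, something_changed: bool) -> int:
    """Pick the next delay: reset on activity, otherwise back off up to the ceiling."""
    if something_changed:
        return MIN_WAIT_S
    # Clamp to the floor first so a zero or tiny previous wait still backs off sensibly.
    return min(max(previous_wait_s, MIN_WAIT_S) * 2, MAX_WAIT_S)
```

Under this policy a quiet repo reaches the one-hour ceiling after a handful of idle checks, which is the property that matters for spend: idle time gets cheaper the longer it lasts.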

/goal is the one closer to the recursive-goal idea. Instead of a clock, it runs until a condition you wrote is actually true:

BASH

/goal all tests in test/auth pass and the linter is clean

After every turn, a separate, smaller model reads the condition and the conversation and returns a yes-or-no verdict plus a short reason that carries into the next turn. The agent that wrote the code is not the one deciding whether it is done. That is the maker/checker split applied to the stop condition itself, and it is why you can actually walk away.
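The control flow is small enough to sketch. This is not Claude Code's implementation; `maker` and `checker` are hypothetical callables standing in for the working agent and the separate evaluator, and the `max_turns` cap is my own addition so the sketch cannot spin forever.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    done: bool
    reason: str


def run_until(
    goal: str,
    maker: Callable[[str, str], str],
    checker: Callable[[str, list[str]], Verdict],
    max_turns: int = 10,
) -> tuple[bool, list[str]]:
    """Alternate maker turns with an independent check until the goal holds."""
    transcript: list[str] = []
    feedback = ""
    for _ in range(max_turns):
        transcript.append(maker(goal, feedback))
        verdict = checker(goal, transcript)   # the checker never writes code
        if verdict.done:
            return True, transcript
        feedback = verdict.reason             # the reason steers the next turn
    return False, transcript
```

The point of the shape is that `maker` never gets to return "done" itself. It only ever sees the checker's reason for saying no.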

So: /loop for "keep checking on this," /goal for "keep working until this is true." When you need it to survive a closed laptop, push the whole thing to GitHub Actions instead.

What one loop actually looks like

This is the shape I keep coming back to, stitched together from the pieces above.

A scheduled run fires on the repo every morning. Its prompt calls a triage skill that reads yesterday's CI failures, the open issues, and the recent commits, and writes the findings into a state file:

MARKDOWN

Read the latest CI runs, open issues labeled `bug`, and commits since the
last entry in `notes/triage.md`. For each problem worth fixing:

1. Open an isolated worktree and send a subagent to draft a minimal fix.
2. Send a second subagent to review that draft against the project's skills and
   the existing tests. It must not edit the code, only report.
3. If the review passes and CI is green, open a PR and link the issue.
4. If anything is unsure or risky, leave it in `notes/triage.md` for me.

Append what you tried, what passed, and what is still open to
`notes/triage.md` so tomorrow's run picks up where this one stopped.

Connectors open the PR and update the ticket. Anything the loop cannot handle safely lands in the notes file for me to look at. The state file is the part that makes it a loop and not a one-off, because tomorrow's run reads where today's stopped.
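Steps 3 and 4 of that prompt are a routing decision, and it can help to see it written as plain logic. The `Draft` fields and `Action` names below are hypothetical labels for this sketch. In the real loop the agent makes this call from the prompt, not from code.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    OPEN_PR = "open_pr"
    LEAVE_NOTE = "leave_note"


@dataclass
class Draft:
    issue: str
    review_passed: bool   # verdict from the read-only reviewer subagent
    ci_green: bool
    risky: bool = False   # anything unsure or hard to undo


def route(draft: Draft) -> Action:
    """Open a PR only when every check agrees; otherwise hand it to a human."""
    if draft.review_passed and draft.ci_green and not draft.risky:
        return Action.OPEN_PR
    return Action.LEAVE_NOTE
```

The default branch is the note, not the PR. A loop that fails toward "leave it for me" is much cheaper to trust than one that fails toward shipping.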

Look at what you actually did there: you designed it once, and you prompted none of those individual steps. That is the pitch in one example, and it works the same way whether you run it through /loop, a /goal, or a GitHub Action.

When to reach for a loop, and when not

A loop earns its place when the work is repetitive, the trigger is clear, and there is a real signal for "done" that something other than the maker can check. Babysitting a deploy or a PR, triaging CI failures every morning, grinding a test suite to green, hunting a class of bug across a codebase: good fits, because each one has an automatic way to know it worked.

A loop is the wrong tool when the goal is fuzzy and there is no verifier, when the action is irreversible and there is no human gate in front of it, or when the task is genuinely one-shot and a loop is just ceremony around a single prompt. If you cannot write the condition that means "done," you are not ready to write the loop yet.

When you do build one, ramp it. Start in report-only mode, where the loop tells you what it would do. Move to assisted, where it drafts changes you approve. Only then let it run unattended, and even then keep a gate in front of anything you cannot undo.
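That ramp can be written down as a small permission check. The mode names and the `may_execute` function are my own framing of the three stages, not a Claude Code setting.

```python
from enum import Enum


class Mode(Enum):
    REPORT_ONLY = 1   # the loop only describes what it would do
    ASSISTED = 2      # the loop drafts, a human approves
    UNATTENDED = 3    # the loop acts on its own where that is safe


def may_execute(mode: Mode, reversible: bool, human_approved: bool = False) -> bool:
    """Decide whether the loop may carry out an action in the current mode."""
    if mode is Mode.REPORT_ONLY:
        return False
    if mode is Mode.ASSISTED:
        return human_approved
    # Unattended: reversible actions run freely, irreversible ones keep their gate.
    return reversible or human_approved
```

Even in the last mode, an irreversible action with no approval comes back `False`, which is the "keep a gate in front of anything you cannot undo" rule stated as code.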

The subscription reality

If you are on a Claude Code plan rather than paying per token, loops change your usage math, and not always gently. Osmani's own caveat is the right one to keep in mind: outcomes differ wildly between the token rich and the token poor.

A few things to watch. Sub-agents multiply spend, because each one runs its own model and tool calls, so a three-agent explore/implement/verify loop costs roughly three times a single pass. A tight fixed interval burns tokens even when nothing is happening, so prefer a self-paced /loop (or /goal, which only runs when there is a turn to take) over /loop 1m on a quiet repo. Remember that /loop needs the session open and idle to fire and does not catch up on missed runs, so a babysat terminal is not the same thing as durable scheduling. Anything that genuinely needs to run while you sleep belongs in GitHub Actions, not a window you have to leave open. And treat the 7-day expiry as a feature: it bounds how long a forgotten loop can quietly run up your usage. When in doubt, start in report-only mode, where the loop reads a lot and writes almost nothing. For more on keeping spend sane, see Getting More Out of Claude Code.
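A rough back-of-the-envelope helper makes the interval and subagent math visible. The per-run token figure you feed it is entirely hypothetical (measure your own), and the model assumes every agent in a run costs about the same as a single pass, which is only the "roughly three times" approximation from above.

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(interval_minutes: int) -> int:
    """How many times a fixed-interval loop fires if the session stays open all day."""
    return MINUTES_PER_DAY // interval_minutes


def daily_tokens(interval_minutes: int, tokens_per_run: int, agents_per_run: int = 1) -> int:
    """Upper-bound daily spend for a fixed interval, scaled by agents per run."""
    return runs_per_day(interval_minutes) * tokens_per_run * agents_per_run
```

With a made-up 2,000 tokens per check, `daily_tokens(1, 2000)` comes to 2,880,000 while `daily_tokens(20, 2000, 3)` comes to 432,000, so a one-minute single-agent poll outspends a twenty-minute three-agent loop several times over. Treat both as ceilings, since /loop only fires while the session is open.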

Stay the engineer

The loop changes the work; it does not delete you from it. Three problems actually get sharper as the loop gets better, not easier.

Verification is still on you. A loop running unattended is also a loop making mistakes unattended, and "done" is a claim, not a proof. That separate verifier is what makes the claim mean something, which is the entire reason to bother with it.

Your understanding rots if you let it. The faster a loop ships code you did not write, the wider the gap between what exists and what you actually grasp. A smooth loop just grows that gap faster unless you read what it made.

And the comfortable posture is the dangerous one. When the loop runs itself, it is tempting to stop having an opinion and take whatever comes back. The same loop, in two different hands, produces opposite results: one person uses it to move faster on work they understand deeply, the other uses it to avoid understanding the work at all. The loop cannot tell the difference. You can.