Harness engineering: a new term for an old problem
The term harness engineering has been making the rounds. I even like it. But it only confuses things. We already have the words to describe the problem.

“Harness engineering” is the latest term to enter the engineering-apps-that-use-LLMs conversation. It became particularly popular after Claude Code leaked and people got to see what a “harness” looks like at scale.
The term is meant to describe all the code that wraps around LLMs to get them to do something useful. It also arrives while we are still some way off having clarity on what an agent is. Nevertheless, brave people that we are, we added “harness” engineering to sit next to “agentic” engineering in a heartbeat. A new unclear term layered on top of an unfinished one. As you may have guessed, this piece will have “grumpy old man” energy, but here we are.
What the “harness” term is meant to capture
Don’t get me wrong. The term is even elegant. It evokes LLMs as wild creatures of some sort. Unchecked, they can’t be trusted. They need a harness to guide them in the right direction.
Birgitta Böckeler frames it as Agent = Model + Harness. Honestly, that is not wrong, and I have no issue with the thinking. The article describes the harness as consisting of guides and sensors. Guides are feedforward controls that steer the agent before it acts: prompts, scaffolding, planning structure. Sensors are feedback controls that let the agent self-correct after acting: linters, tests, type checkers, code review agents, LLM judges. Each can be computational, meaning fast and deterministic, or inferential, meaning semantic and AI-based.
Ryan Lopopolo, on OpenAI’s Codex team, describes the same shift in different language. He writes that the engineering job is now to “design environments, specify intent, and build feedback loops that allow Codex agents to do reliable work.” The slogan he gives it is “Humans steer. Agents execute.” The pattern he calls the Ralph Wiggum Loop is the agent reviewing its own changes, requesting further agent reviews, and iterating until the reviewers are satisfied.
Both converge on the same point. The interesting engineering surface is not inside the model. It is in the controls, the environment, and the feedback loops around it.
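To make that shared pattern concrete, here is a minimal sketch of a guides-and-sensors loop. The names and structure are mine for illustration, not taken from Böckeler’s article or from Codex: guides steer the model before it acts, sensors check the output after it acts, and the loop iterates until every sensor is satisfied or a budget runs out.

```python
def run_agent(task, model, guides, sensors, max_iters=5):
    """Hypothetical harness loop: feedforward guides, feedback sensors."""
    # Guides (feedforward): steer the agent before it acts.
    prompt = "\n".join(guide(task) for guide in guides)
    output = model(prompt)
    for _ in range(max_iters):
        # Sensors (feedback): each returns a complaint string, or "" if satisfied.
        feedback = [msg for sensor in sensors if (msg := sensor(output))]
        if not feedback:
            return output  # every sensor is satisfied
        # Feed the complaints back and let the agent self-correct.
        output = model(prompt + "\nFix these issues:\n" + "\n".join(feedback))
    return output  # budget exhausted; return best effort
```

The `max_iters` cap is the part that keeps the Ralph Wiggum Loop from reviewing itself forever; everything a real harness adds (linters, test runners, review agents) slots in as another entry in `sensors`.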
So what’s my beef with the term?
Let’s double down on agency, not create new terms.
All harness engineering is describing is agent engineering. In this messy world where things change so quickly, let’s spend more time understanding and clarifying what agent engineering should be, not adding more terms that will inevitably show up in LinkedIn profiles and job posts. “Looking for rock star harness engineer” is an inevitability at this point.
I may be tooting my own horn, but there are ways to talk about harness engineering in terms of agency. In a previous post I name three layers of agency in any LLM-powered agent system.
Emergent agency is the agent-like behaviour that arises from the model itself: dialogue coherence, reasoning, pattern-following. You do not engineer this layer, you inherit it from the model you choose.
Structured agency is how engineers turn those capabilities into reliable behaviour, through tools, memory, planning, control flow, and feedback loops. This is the layer harness engineering is (mostly) talking about.
Perceived agency is what users actually experience: communication style, error handling, transparency, where control is offered and where it is taken away.
I go on to say that these layers have to line up. A model with strong reasoning still ships as an incompetent agent if the structured layer does not harness that capability. A well-engineered harness still ships as a frustrating agent if the perceived layer misrepresents what is happening underneath. Misalignment between layers, more than any single component, is where user frustration is born.
A coding agent makes this concrete. The model’s reasoning is the emergent layer. Böckeler’s guides and sensors, and Lopopolo’s Ralph Wiggum Loop, are the structured layer. The pull request, the review comments, and the IDE suggestion surface are the perceived layer.
Agent engineering
Harness engineering describes a real need and a real activity, but it does so at the cost of ignoring mental models and vocabulary that are already there: agents, multi-agent systems, co-operation, and orchestration.
If you find yourself reaching for “harness engineering” to describe a piece of work, take a minute to ask yourself what other terms would have described it, and whether work already done under those terms would help you do the work you need to do. Otherwise you risk starting from a blank page and re-inventing solutions that already exist under different headings.


