Thought Leadership

The Harness Is Replaceable. The Decision Problem Isn't.

Rippletide mascot in the Decision Room with Approve and Deny buttons, holding a verification shield

Every major AI lab is racing to make its own harness obsolete. Cursor benchmarks its harness against Claude Code's using a fine-tuned coding model. That's not a metaphor. Anthropic's engineering team published a post describing, in precise technical terms, why every harness assumption goes stale and why Managed Agents is designed to throw the harness away as models improve.

If you've built a moat on the construction layer, that post is the sound of the floor dropping out.

I. The harness is throwaway code, meant to be redesigned for each new model

Anthropic's example is instructive. They had built a context-reset mechanism to handle what they called "context anxiety": Claude Sonnet 4.5's tendency to wrap up tasks prematurely as it neared its context window limit. A reasonable engineering response to a real behavior.

When Opus 4.5 shipped, the behavior was gone. The mechanism was dead weight.

Their response was not to patch the harness again. It was to redesign the system so the harness could be discarded. Session, harness, sandbox: three decoupled interfaces, each independently replaceable. A meta-harness, as they call it, "unopinionated about the specific harness Claude will need in the future."

This is the trajectory. In six months, deploying an agent will be as defensible as deploying a website in 2012. The abstraction rises, the construction layer commoditizes, and the teams that spent three years building proprietary harness infrastructure will be staring at a very uncomfortable slide in their next board deck.

The builder is not the product anymore.

II. But the harness was never the hard part.

Here's what the Anthropic post makes clear, between the lines: the harness is a collection of workarounds. Assumptions about model behavior, encoded in infrastructure. It changes every time the model gets smarter.

What it does not change, what no model improvement has ever resolved, and what the architectural decoupling Anthropic describes explicitly preserves as an open question, is this:

What decides whether the agent is allowed to act? Right now. In this context. For this organization.

That question does not live in the harness. It does not live in the context window. It does not get solved by a smarter model or a better abstraction layer. It is a trust problem disguised as an engineering problem. And most teams are trying to solve it with a prompt.

III. A category is forming around the wrong layer.

A wave of products is emerging around agent control. Sandboxing, filesystem permissions, network restrictions, IT-level access management. Necessary work. Real problems being solved.

They answer: can the agent technically execute this action?

That is not the enterprise question.

The enterprise question is: should the agent execute this action, given our business rules, our compliance constraints, the state of this customer's account, the approval threshold in effect today, the policy revision issued three weeks ago by the compliance team in an internal memo?

Those are different questions. Conflating them is precisely why production deployments stall. The answer lives in business processes, not in the sandbox environment.

A bad answer is an error. A bad action is an incident. The infrastructure required to prevent each one is structurally different.

That is why the category should not be defined by infrastructure alone, or by context alone, but by something closer to the business.

Read more on why: Context without enforcement is not infrastructure

IV. The gap has a name.

Between "the LLM proposes" and "the system executes", there is a function that does not exist in most agent stacks.

Not guardrails in a prompt. Prompt-based constraints are probabilistic. The model decides whether to follow them. That is not enforcement. That is suggestion with good intentions.

Not monitoring. Monitoring is retrospective. It tells you what went wrong after the action fired.

What's missing is a runtime decision layer: deterministic, pre-execution, operating on structured rules and verified data, not token predictions. A layer that intercepts every proposed action before it reaches the world, evaluates it against the operative rules of the organization, and returns a binary answer with a complete causal trace attached.

The agent either has the authority and the data to act, or it does not.

No gray zone. No "it depends on context." A verifiable answer, resolved before execution, that any audit or compliance function can inspect.
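The shape of that layer can be sketched in a few lines. This is a minimal illustration, not Rippletide's actual API: the `ProposedAction`, `Rule`, and `decide` names and the toy approval-threshold policy are all hypothetical, chosen only to show what "deterministic, pre-execution, with a causal trace" means in practice.

```python
# Hypothetical sketch of a pre-execution decision layer.
# Names and structures are illustrative, not a real product API.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ProposedAction:
    kind: str       # e.g. "refund"
    params: dict    # structured parameters of the proposed action

@dataclass(frozen=True)
class Rule:
    id: str
    applies_to: str   # action kind this rule governs
    predicate: object # (action, verified_facts) -> bool

@dataclass
class Decision:
    allowed: bool
    trace: list = field(default_factory=list)  # every rule evaluated, with its verdict

def decide(action: ProposedAction, rules: list, facts: dict) -> Decision:
    """Deterministic, pre-execution check: the action may run only if at
    least one rule grants authority for its kind and every applicable
    rule allows it. No rule, no authority. The trace is the audit record."""
    applicable = [r for r in rules if r.applies_to == action.kind]
    if not applicable:
        return Decision(False, [("no-rule-grants-authority", action.kind)])
    trace, allowed = [], True
    for r in applicable:
        ok = bool(r.predicate(action, facts))
        trace.append((r.id, ok))
        allowed = allowed and ok
    return Decision(allowed, trace)

# Usage: the LLM proposes a $750 refund; today's verified approval
# threshold is $500. The runtime denies it, before execution, with the
# rule that fired attached.
rules = [Rule("refund-under-threshold", "refund",
              lambda a, f: a.params["amount"] <= f["approval_threshold"])]
facts = {"approval_threshold": 500}

d = decide(ProposedAction("refund", {"amount": 750}), rules, facts)
print(d.allowed, d.trace)  # False [('refund-under-threshold', False)]
```

The point of the sketch is the contract, not the implementation: the decision is a pure function of structured rules and verified facts, so the same inputs always yield the same verdict, and the trace tells an auditor exactly which rule denied the action.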

This is what infrastructure looks like. Not probabilistic guidance. Deterministic enforcement.

V. Why this is the 2026 bet.

The Anthropic post contains a sentence worth reading carefully. They describe their session log as a "context object that lives outside Claude's context window": durable, interrogable, surviving every harness restart, every model upgrade, every architectural refactor.

That is the design principle. The things that matter should not live in the harness. They should live in a layer that persists across harness generations.

Business rules are that kind of thing. Authority structures are that kind of thing. Compliance constraints, approval thresholds, policy versions, none of these change because a new model shipped. They change because the organization decided to change them.

Read more: Context Graphs: What They Actually Solve?

A decision runtime that lives outside the prompt, outside the context window, outside the harness, survives every upgrade cycle. It becomes more valuable as the agent stack scales, not less. It is the layer that earns the right to be called infrastructure.

Models improve. Harnesses change. The decision problem remains.

No one runs a nuclear plant without a control room. No one runs a trading floor without a kill switch.

We are shipping agents without decision enforcement.

That is the gap.

Rippletide is the decision runtime for AI agents. The LLM proposes. Rippletide decides.

Want your agent to enforce rules without ever regressing?
