Draft in progress

What an AI Agent Harness Actually Is

The word "harness" is being captured by marketing faster than it's being defined carefully. This piece hands readers the engineering vocabulary before the vendors set it for them.

Section 1

The definition, and what it replaces

In June 2023, Lilian Weng proposed a formula that most readers now hold in their heads: Agent = LLM + Memory + Planning + Tools. That formula treated memory and planning as cognitive abstractions, which made sense for the GPT-4 era. In 2026, engineers shipping harnesses use a different formula: Agent = Model + Harness + Tools + Environment. This section describes both formulas, explains why the second replaces the first, and clarifies what the word "harness" does and does not mean: a harness is not a framework, not scaffolding, and not MCP. It is a role: the runtime between the model and the environment.
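
To pin down that role before the sections argue over it, here is a minimal sketch of a harness loop in Python. Every name in it (Harness, Turn, ToolCall) is a hypothetical illustration rather than any product's API; the only claim is structural: the harness owns the loop, and the model sees nothing except what the harness hands it.

```python
# Minimal sketch of the harness-as-runtime role. All names are
# hypothetical illustrations, not any shipping product's API.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ToolCall:
    name: str
    args: dict[str, Any]

@dataclass
class Turn:
    content: str
    tool_calls: list[ToolCall]

@dataclass
class Harness:
    model: Callable[[list[dict]], Turn]   # context in, one model turn out
    tools: dict[str, Callable[..., str]]  # tool registry: name -> callable
    events: list[dict] = field(default_factory=list)  # conversation history

    def run(self, task: str) -> str:
        self.events.append({"role": "user", "content": task})
        while True:
            turn = self.model(self.events)      # 1. show the model the history
            self.events.append({"role": "assistant", "content": turn.content,
                                "tool_calls": turn.tool_calls})
            if not turn.tool_calls:             # 2. no tool requests: finished
                return turn.content
            for call in turn.tool_calls:        # 3. dispatch into the environment
                result = self.tools[call.name](**call.args)
                self.events.append(
                    {"role": "tool", "name": call.name, "content": result})
```

Everything the later sections describe (compaction, budgeting, streaming, telemetry) lives inside this loop; the model and the tools stay outside it.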

Section 2

How a harness actually works

Memory is the event log; compaction is a mutation over it. This is event sourcing, the pattern Martin Fowler and Greg Young described in the 2000s, applied to LLM context windows. This section uses that framing to describe what actually eats engineering time in a harness: compaction, tool registry and dispatch, state persistence, error handling, and prompt construction. It also covers what most published component lists miss: the asymmetry between projection and ingestion, streaming state machines, context budgeting as distinct from compaction, event ordering and mutation semantics, and telemetry of the harness itself.
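
A sketch of that framing, under the same caveat that every name is illustrative: appends are the only routine write, the prompt is a projection over the log, and compaction is the one sanctioned mutation.

```python
# Sketch of memory-as-event-log. append() is the routine write path,
# project() derives the prompt (a read), and compact() is the single
# sanctioned mutation: it folds a prefix of old events into a summary.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EventLog:
    events: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)

    def project(self, budget: int, cost: Callable[[dict], int]) -> list[dict]:
        # Projection is cheap and lossy by design: walk backward until
        # the token budget is spent, so recent events survive verbatim.
        kept, spent = [], 0
        for event in reversed(self.events):
            spent += cost(event)
            if spent > budget:
                break
            kept.append(event)
        return list(reversed(kept))

    def compact(self, upto: int, summarize: Callable[[list[dict]], str]) -> None:
        # The mutation: rewrite events[:upto] as one summary event.
        summary = summarize(self.events[:upto])
        self.events[:upto] = [{"role": "summary", "content": summary}]
```

The projection-versus-ingestion asymmetry shows up directly here: project() runs on every model call and must stay cheap, while compact() runs rarely and can afford an LLM call inside summarize().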

Section 3

The specific problems that reveal the shape

This section works through three concrete engineering problems reported by a developer shipping a harness. Ruby's Net::HTTP batched server-sent events into single reads, which required dropping to raw TLS sockets to get per-chunk delivery. Gemini's SSE separator is "\r\n\r\n" in some versions and "\n\n" in others, and a parser that matches only one gets 200 OK responses with zero tokens. Tool-call IDs diverge across providers: Anthropic issues toolu_-prefixed IDs, OpenAI issues call_-prefixed IDs, and Gemini issues none at all, leaving the function name as the correlation key. The section also covers the concurrency problem when multiple tools run in parallel and return out of order, and the latency and cost tradeoffs of running compaction in the background versus blocking the main loop.
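
The separator bug, at least, is cheap to defend against once named. Below is a sketch of a tolerant SSE frame splitter and a correlation-key fallback; normalizing CRLF before splitting is one common tactic, not necessarily what the developer shipped, and call_key is a hypothetical helper.

```python
# Sketch: tolerate both "\r\n\r\n" and "\n\n" event separators by
# normalizing line endings before splitting. A parser pinned to one
# separator fails silently: HTTP says 200 OK, but no frame boundary
# ever matches, so zero tokens come out.
def iter_sse_data(chunks):
    buffer = ""
    for chunk in chunks:                       # chunks: decoded str pieces
        buffer += chunk.replace("\r\n", "\n")  # normalize CRLF to LF first
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            data = [line[5:].lstrip() for line in frame.split("\n")
                    if line.startswith("data:")]
            if data:
                yield "\n".join(data)

def call_key(call: dict) -> str:
    # Correlation key for matching tool results back to tool calls.
    # Anthropic IDs start with toolu_ and OpenAI's with call_; Gemini
    # sends none, so fall back to the function name, which assumes no
    # two parallel calls share a name.
    return call.get("id") or call["name"]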

Section 4

Three architectures

Claude Code is transcript-centric: it treats the conversation as a malleable artifact and runs a four-stage compaction pipeline (Snip, Microcompact, Context Collapse, Autocompact) with best-in-class prompt-cache management via deferred tool discovery. Codex CLI is state-replication-centric: it treats the conversation as an event-sourced ledger and handles compaction by spawning a new worker with a handoff summary rather than mutating the current one. AIOS is the frontier: it demonstrates GPU-level preemption via KV-cache checkpointing, taking the OS analogy seriously. Two axes organize the three: transcript versus ledger (philosophy) and app-layer versus OS-layer (ambition). This section closes on the hardest open problem: how to unit test a harness when the model driving it is stochastic.
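
On that closing problem, one standard mitigation, sketched here with no claim about how these three projects actually test: swap the stochastic model for a scripted stub, so the harness's own mechanics (dispatch, event ordering, compaction triggers) become deterministic and assertable. ScriptedModel below reuses the hypothetical Harness, Turn, and ToolCall names from the Section 1 sketch.

```python
# Sketch: a scripted stub in place of the model makes harness tests
# deterministic. Assertions then target the harness's mechanics, not
# model output. Reuses the hypothetical Harness/Turn/ToolCall above.
class ScriptedModel:
    def __init__(self, turns):
        self.turns = iter(turns)
        self.seen = []                 # every context the harness projected

    def __call__(self, context):
        self.seen.append(list(context))
        return next(self.turns)

model = ScriptedModel([
    Turn(content="", tool_calls=[ToolCall("read_file", {"path": "a.txt"})]),
    Turn(content="done", tool_calls=[]),
])
harness = Harness(model=model, tools={"read_file": lambda path: "contents"})
assert harness.run("summarize a.txt") == "done"
assert [e["role"] for e in harness.events] == \
    ["user", "assistant", "tool", "assistant"]
```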

This is a working outline, not the final piece. Research is done, input from a shipping harness developer is in, and the frame was stress-tested by Gemini 3.1 Pro. Draft is pending. Target length ~2,000 words.