Harness Basics

Summary — Key Takeaways from This Lesson

  • A harness is everything except the model (Claude itself): prompts, context, tools, feedback loops, and persistence.
  • For long-running agent tasks, success depends more on harness design than model capability.
  • A harness can be organized into four layers: the Instruction Layer, Context Layer, Tool Layer, and Persistence Layer.
  • By incorporating feedback loops (tests, linting, error output), agents can run self-correction cycles autonomously.
  • Anthropic's Engineering Blog has published best practices (primary source).

What Is a Harness?

When you run an agent task with Claude Code, the model (Claude itself) is only one piece of the whole. A harness is the entire framework around the model that makes it work effectively. It includes all of the following:

  • Instructions (system prompt, task description)
  • Context (conversation history, files to load, project structure)
  • Tools (code editing, command execution, search, browser operation)
  • Feedback loops (test results, lint errors, corrections from users)
  • Persistence (memory files, progress tracking, Git state)

Anthropic's Engineering Blog calls this "harness engineering" and positions it as an engineering discipline focused on designing, building, and improving the systems that increase the success rate of coding agents (source).

The Four-Layer Structure

When designing a harness, thinking in terms of the following four layers helps keep things organized.

1. Instruction Layer

The system prompt and task description define "what is allowed and what is not." In Claude Code, placing a CLAUDE.md file at the project root lets the agent consult project-specific conventions, prohibited operations, and naming rules throughout the task.

2. Context Layer

This layer controls the scope of information the model can use for its decisions. Should you pass the entire file, or extract only the relevant portions? Because the context window is finite, design the harness to pass in only the information the task needs, selected as precisely as possible.
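
As a rough illustration of "extract only the relevant portions", the sketch below keeps the lines around each keyword match and stops once a character budget is used up. The function name `extract_relevant`, the `radius` and `max_chars` parameters, and the simple substring match are all assumptions made for this example; a real harness would likely use a smarter relevance signal.

```python
def extract_relevant(text: str, keyword: str, radius: int = 2, max_chars: int = 2000) -> str:
    """Return only the lines near each keyword match, capped at max_chars.

    Hypothetical helper: the names and the budget rule are illustrative only.
    """
    lines = text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if keyword in line:
            # Keep the matching line plus `radius` lines on either side.
            keep.update(range(max(0, i - radius), min(len(lines), i + radius + 1)))

    out = []
    used = 0
    for i in sorted(keep):
        entry = f"{i + 1}: {lines[i]}"  # prefix with the 1-based line number
        if used + len(entry) + 1 > max_chars:
            break  # budget exhausted: stop rather than overflow the context
        out.append(entry)
        used += len(entry) + 1  # +1 accounts for the newline separator
    return "\n".join(out)
```

Keeping the line numbers in the excerpt gives the agent a way to refer back to the exact location in the full file when it later edits it.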

3. Tool Layer

This is the set of actions (tools) available to the agent: reading and writing files, executing commands, web search, and connecting to external services via MCP. More tools mean more flexibility but also a higher risk of misuse, so the basic principle is to limit the tool set to the minimum necessary.
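
One way to enforce a minimal tool set in a custom harness is an explicit allowlist. The sketch below is a minimal version of that idea; `ToolRegistry` and the tool names are hypothetical and are not part of Claude Code or MCP.

```python
from typing import Callable, Dict, Set


class ToolRegistry:
    """Holds only the tools explicitly allowed for one task (illustrative sketch)."""

    def __init__(self, allowed: Set[str]) -> None:
        self._allowed = set(allowed)
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, func: Callable[..., str]) -> None:
        # Fail loudly: silently skipping a tool would hide configuration mistakes.
        if name not in self._allowed:
            raise PermissionError(f"tool {name!r} is not on the allowlist")
        self._tools[name] = func

    def call(self, name: str, *args: str) -> str:
        # The agent can only invoke tools that were both allowed and registered.
        if name not in self._tools:
            raise PermissionError(f"tool {name!r} is not available to the agent")
        return self._tools[name](*args)


# Usage: a read-only task gets a single tool and nothing else.
registry = ToolRegistry(allowed={"read_file"})
registry.register("read_file", lambda path: f"contents of {path}")
```

Making the allowlist a per-task constructor argument means widening the tool set is a deliberate, reviewable change rather than a default.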

4. Persistence Layer

For long-running tasks, a mechanism to retain "how far we got last time" is essential. Progress files, Git commits, and memory files let a task continue across sessions. In Anthropic's Managed Agents architecture, session state is externalized as an event log, so that state is not lost even if a container fails (detailed article).
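
The progress-file idea can be sketched as a local append-only log that is replayed at the start of each session. This is only a small analogue of an event log, not a description of how Managed Agents is implemented; the JSON Lines format, the event types `step_done` and `next_step`, and the function names are assumptions for illustration.

```python
import json
from pathlib import Path


def append_event(log_path: Path, event: dict) -> None:
    """Append one event as a JSON line; earlier lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def replay(log_path: Path) -> dict:
    """Rebuild 'how far we got' by reading the log from the beginning."""
    state = {"done": [], "next": None}
    if not log_path.exists():
        return state  # first session: nothing recorded yet
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            event = json.loads(line)
            if event["type"] == "step_done":
                state["done"].append(event["step"])
            elif event["type"] == "next_step":
                state["next"] = event["step"]
    return state
```

Because state is derived from the log rather than held in memory, a new session (or a restarted process) can call `replay` and pick up where the previous one stopped.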

The Importance of Feedback Loops

One of the most impactful elements of harness design is the feedback loop. Returning test results, lint errors, and type-check output to the agent lets it run the "write → execute → check errors → fix" cycle autonomously.

In a harness with weak feedback, agents may repeat the same mistakes or incorrectly conclude that the work is "done" without actually running anything. Writing tests and setting up CI are worthwhile investments for agent-driven work too.
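
The "write → execute → check errors → fix" cycle can be sketched as a small loop that runs a check command and hands its output to a fixer until the check passes. In this sketch `apply_fix` stands in for the agent (in a real harness it would send the output to the model and apply the returned edit); the function names and the `max_fixes` limit are assumptions for illustration.

```python
import subprocess
from typing import Callable, List, Tuple


def run_check(cmd: List[str]) -> Tuple[bool, str]:
    """Run a test/lint command and return (passed, combined output)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def feedback_loop(cmd: List[str], apply_fix: Callable[[str], None], max_fixes: int = 3) -> bool:
    """Check, feed failures to apply_fix, and repeat; True only if a check passed."""
    for _ in range(max_fixes):
        ok, output = run_check(cmd)
        if ok:
            return True
        apply_fix(output)  # the error text is the feedback the agent works from
    # Verify the last fix instead of assuming it worked.
    ok, _ = run_check(cmd)
    return ok
```

The final `run_check` matters: the loop never reports success on the strength of a fix that was applied but not executed, which is exactly the false "done" that weak feedback allows.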

Practical Design with CLAUDE.md

A good first step in harness design is to include the following in CLAUDE.md:

  • An overview of the project and an explanation of the directory structure
  • Coding conventions and naming rules
  • Files and operations that must never be modified
  • Commands for running tests and how to verify the results
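
To make the checklist above concrete, the sketch below renders a starter CLAUDE.md with one section per item. CLAUDE.md is free-form Markdown, so the section headings, the function name `render_claude_md`, and the sample values are all hypothetical choices for this example rather than a required layout.

```python
from typing import List


def render_claude_md(overview: str, conventions: List[str],
                     protected: List[str], test_command: str) -> str:
    """Build starter CLAUDE.md text; headings are illustrative, not mandated."""

    def bullets(items: List[str]) -> str:
        return "\n".join(f"- {item}" for item in items)

    return (
        "# Project overview\n"
        f"{overview}\n\n"
        "# Conventions\n"
        f"{bullets(conventions)}\n\n"
        "# Do not modify\n"
        f"{bullets(protected)}\n\n"
        "# Testing\n"
        f"Run `{test_command}` and confirm it exits with status 0 "
        "before reporting the task as done.\n"
    )


# Usage with placeholder values; write the result to CLAUDE.md at the project root.
starter = render_claude_md(
    overview="Example service; source in src/, tests in tests/.",
    conventions=["Use snake_case for function names"],
    protected=["migrations/"],
    test_command="pytest -q",
)
```

Stating the verification rule next to the test command ties the instruction layer to the feedback loop: the agent is told both what to run and what counts as passing.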

For a comprehensive look at harness engineering, see "Introduction to Harness Engineering". For the design philosophy behind Managed Agents, also refer to "Separating Brain and Hands with Managed Agents".

Clauder Navi Editorial Team
@clauder_navi