Claude Managed Agents — Keys to Reducing TTFT and Decoupled Design

For developers exhausted by rewriting their harness every time a model is updated, this article breaks down the decoupled design of Managed Agents as published by Anthropic. In an order that maps directly onto implementation decisions, it covers how to carve out the brain (reasoning), hands (execution environment), and session (state) into independent abstraction layers; the mechanism that cuts TTFT by roughly 60% at p50 and over 90% at p95; and the thinking behind Many Brains / Many Hands scaling.

Conclusion

Managed Agents adopts a 3-layer separation of brain / hands / session. By applying the virtualization principles of OS design to AI agents and moving the harness outside the container, it minimizes the cost of rewriting the harness with every model update.

The core of the implementation is transparent connectivity via the execute(name, input) interface. Containers, custom tools, and MCP servers can all be treated as "hands" through the same function call. Decoupling the brain from the hands in this way is what yields the roughly 60% (p50) and over 90% (p95) TTFT reduction, and it lets both brains and hands scale out independently.

The key caveat is that sessions must be externalized as durable event logs. While this ensures state is not lost even when a container fails, the harness must guarantee log persistence and rollback design — meaning systems built on a stateless assumption will need to be redesigned.


Key to Decoupled Design 1: Why Separate the Execution Environment from Reasoning?

The Problem: Harnesses Were Designed to "Compensate for Model Shortcomings"

When building agents, developers must account not only for Claude itself but also for the surrounding harness (the framework that controls execution as a whole). The harness includes prompt control, context management, tool invocation, retry logic, and more.

Anthropic has identified a fundamental problem with this approach:

"Harnesses embed assumptions about what the model cannot do. But as models improve, those assumptions become outdated." Source

As a concrete example, Claude Sonnet 4.5 exhibited "context anxiety" near token limits, requiring workarounds implemented in the harness. However, Claude Opus 4.5 resolved this issue, making those harness workarounds unnecessary Source.

The more model assumptions are baked into the harness, the more the entire harness must be revisited with each model improvement. This is unsustainable from the perspectives of scalability, safety, and cost.

The Solution — Apply OS Virtualization Principles Across 3 Components

The solution Anthropic adopted is the virtualization principle proven in OS design. Just as an OS separates processes from hardware, Managed Agents virtualizes three components (brain, hands, and session) as independent, stable abstraction layers Source.

Key to Decoupled Design 2: The Architecture Is 3 Layers — Brain / Hands / Session

Separating the 3 Components — Brain (Reasoning) / Hands (Execution) / Session (State)

The diagram and table below show how the three components fit together.

┌─────────────────────────────────────────────────┐
│                 Managed Agents                  │
│                                                 │
│  ┌──────────────┐   ┌───────────────────────┐   │
│  │    Brain     │   │         Hands         │   │
│  │              │   │                       │   │
│  │  Claude      │   │  Sandbox              │   │
│  │  + Harness   │──▶│  (Container)          │   │
│  │              │   │  Custom Tools         │   │
│  │  Reasoning   │   │  MCP Servers          │   │
│  └──────────────┘   └───────────────────────┘   │
│          │                    │                 │
│          └────────┬───────────┘                 │
│                   │                             │
│          ┌────────▼───────┐                     │
│          │    Session     │                     │
│          │                │                     │
│          │ Durable Event  │                     │
│          │      Log       │                     │
│          └────────────────┘                     │
└─────────────────────────────────────────────────┘

Source: Architecture diagram based on Anthropic Engineering: Scaling Managed Agents

Component   Role                                   Technical Entity
Brain       Reasoning, planning, decision-making   Claude + harness logic
Hands       Execution, side effects, I/O           Containers, tools, MCP servers
Session     State persistence                      Durable event log

The Harness Position Has Changed — Separated Outside the Container, Transparently Connected via execute(name, input)

In the traditional design, Claude, the harness, and the sandbox all coexisted within a single container. In Managed Agents, the harness is moved outside the container and communicates with the sandbox via the execute() interface Source.

The hands interface is kept simple and unified:

execute(name, input) → string

Any implementation of this interface — containers, custom tools, MCP servers, or Anthropic-provided tools — can be treated transparently as "hands" Source.
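To make the uniform interface concrete, here is a minimal Python sketch of a dispatcher that routes execute(name, input) to whichever hand is registered under that name. The HandRegistry class, the hand names ("bash", "word_count"), and both stand-in functions are hypothetical and are not part of Anthropic's API; the only element taken from the article is the shape of the call, a name plus an input that returns a string.

```python
from typing import Callable, Dict

# A "hand" is anything that accepts an input dict and returns a string.
Hand = Callable[[dict], str]


class HandRegistry:
    """Hypothetical registry that routes execute(name, input) to a hand."""

    def __init__(self) -> None:
        self._hands: Dict[str, Hand] = {}

    def register(self, name: str, hand: Hand) -> None:
        self._hands[name] = hand

    def execute(self, name: str, input: dict) -> str:
        hand = self._hands.get(name)
        if hand is None:
            # Unknown names are reported as text so the brain can react.
            return f"error: unknown hand '{name}'"
        try:
            return hand(input)
        except Exception as exc:
            # A failing hand is reported the same way as any failed call.
            return f"error: {exc}"


def run_in_container(input: dict) -> str:
    # Stand-in for a sandboxed command; no real container is started.
    return f"ran: {input['command']}"


def word_count_tool(input: dict) -> str:
    # Stand-in for a custom tool.
    return str(len(input["text"].split()))


registry = HandRegistry()
registry.register("bash", run_in_container)
registry.register("word_count", word_count_tool)

print(registry.execute("bash", {"command": "ls"}))
print(registry.execute("word_count", {"text": "brain hands session"}))
print(registry.execute("missing", {}))
```

Because every hand returns a string, including on failure, the caller never needs to know whether a container, a custom tool, or a remote server sits behind a given name.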

Context Externalization via Session — Durable Log Without Compression, Enabling Rollback

The session functions as "a context object that exists outside the context window" Source. Via the getEvents() interface, Claude can:

  • Retrieve any slice of the event stream at an arbitrary position
  • Rewind to a specific point in time and reload from there
  • Re-examine relevant events before executing an action

Unlike traditional context compression (an irreversible operation that discards information), the session log retains all information durably. Because the harness, not the log, decides how events are fitted into the context window, that fitting strategy can be improved later without losing any data.
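The split between a durable log and a harness-side view of it can be sketched as follows. This is an illustration under stated assumptions: SessionLog, Event, and build_context are hypothetical names, get_events only loosely mirrors the getEvents() interface mentioned above, and the "last N events" policy is just one example of how a harness might choose what enters the context window.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    seq: int
    kind: str
    payload: Dict[str, Any]


@dataclass
class SessionLog:
    """Hypothetical append-only log; events are never rewritten or dropped."""

    events: List[Event] = field(default_factory=list)

    def append(self, kind: str, payload: Dict[str, Any]) -> Event:
        event = Event(seq=len(self.events), kind=kind, payload=payload)
        self.events.append(event)
        return event

    def get_events(self, start: int = 0, end: Optional[int] = None) -> List[Event]:
        # Return any slice of the stream without modifying it.
        return self.events[start:end]


def build_context(log: SessionLog, upto: Optional[int] = None, budget: int = 3) -> List[str]:
    """Harness-side policy: keep the last `budget` events before `upto`."""
    window = log.get_events(0, upto)[-budget:]
    return [f"{e.seq}:{e.kind}" for e in window]


log = SessionLog()
for kind in ["user_message", "tool_call", "tool_result", "assistant_message", "tool_call"]:
    log.append(kind, {})

print(build_context(log))          # view of the most recent events
print(build_context(log, upto=3))  # rewound view: only events 0 to 2
print(len(log.get_events()))       # the log itself still holds all 5 events
```

Changing budget or the selection rule alters only what the model sees; the log underneath stays complete, which is what makes rewinding possible.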

Key to Decoupled Design 3: Shifting the Failure Model from "Pets" to "Cattle"

The Vulnerability of Single-Container Design — Total Session Loss on Failure, Requiring Manual Recovery

In early implementations, all components were colocated in a single container. If the container went down, the entire session was lost and manual recovery of the unresponsive container was required Source.

Fault Tolerance Through Separation — Restart via wake(sessionId), State Restored from Log

By separating components, each element can fail and be replaced independently.

  • The harness becomes stateless → crashes can be recovered with wake(sessionId)
  • After restart, state is restored from the event log via getSession(id)
  • Container re-initialization is exposed as an ordinary tool call, so a failed container is replaced rather than repaired by hand

A system that once required manual "pet" management has transformed into one that can be automatically managed like "cattle" Source.
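The recovery path can be sketched in a few lines of Python. Everything here is a simplified assumption: EVENT_STORE is an in-memory dict standing in for durable storage, and emit_event, get_session, wake, and the Harness class are hypothetical analogues of the wake(sessionId) and getSession(id) calls named above, not their real signatures.

```python
from typing import Dict, List

# Hypothetical durable store: session id -> ordered list of events.
# A real system would persist this outside the harness process.
EVENT_STORE: Dict[str, List[dict]] = {}


def emit_event(session_id: str, event: dict) -> None:
    EVENT_STORE.setdefault(session_id, []).append(event)


def get_session(session_id: str) -> List[dict]:
    # Return a copy so callers cannot mutate the stored log.
    return list(EVENT_STORE.get(session_id, []))


class Harness:
    """Keeps only the session id; progress is always derived from the log."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id

    def next_step(self) -> int:
        done = [e for e in get_session(self.session_id) if e["type"] == "step_done"]
        return len(done) + 1

    def run_step(self) -> int:
        step = self.next_step()
        emit_event(self.session_id, {"type": "step_done", "step": step})
        return step


def wake(session_id: str) -> Harness:
    # Any fresh harness instance can pick the session up again.
    return Harness(session_id)


first = wake("session-1")
first.run_step()
first.run_step()
del first  # simulate a harness crash

second = wake("session-1")
print(second.next_step())  # resumes at step 3
```

Since the harness object carries nothing but the session id, discarding it loses no work, which is the property the "cattle" framing relies on.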

Key to TTFT Reduction — ~60% at p50, Over 90% at p95

The most concrete quantitative result of separating brain and hands is the reduction in Time to First Token (TTFT) Source.

Percentile   Improvement
p50 TTFT     ~60% reduction
p95 TTFT     Over 90% reduction

Source: Anthropic Engineering: Scaling Managed Agents

The reason for the improvement is straightforward. In the traditional design, reasoning could not begin until container initialization was complete. By separating the harness outside the container, reasoning can now begin immediately without waiting for container provisioning.
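The ordering effect can be illustrated with a small asyncio simulation. The sleep durations are arbitrary placeholders, not measured latencies, and the function names are hypothetical. The decoupled variant here starts provisioning in the background, which is only one possible arrangement; provisioning the container lazily on the first tool call would remove it from the first-token path in the same way.

```python
import asyncio
from typing import List


async def provision_container(log: List[str]) -> str:
    await asyncio.sleep(0.05)  # stand-in for slow sandbox start-up
    log.append("container_ready")
    return "container-1"


async def first_token(log: List[str]) -> None:
    await asyncio.sleep(0.01)  # stand-in for model latency
    log.append("first_token")


async def coupled() -> List[str]:
    # Harness lives inside the container: inference must wait for it.
    log: List[str] = []
    await provision_container(log)
    await first_token(log)
    return log


async def decoupled() -> List[str]:
    # Harness lives outside: inference starts while the container boots.
    log: List[str] = []
    container = asyncio.create_task(provision_container(log))
    await first_token(log)
    await container
    return log


coupled_log = asyncio.run(coupled())
decoupled_log = asyncio.run(decoupled())
print(coupled_log)
print(decoupled_log)
```

In the coupled run the first token can only appear after the container is ready; in the decoupled run it appears first, so container start-up no longer contributes to TTFT.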

Key to Decoupled Design 4: Two Patterns for Prompt Injection Defense — Resource Bundling and Vault

The logical separation of components also plays an important role in security. Anthropic uses two patterns to prevent credential leakage Source.

Pattern 1: Resource-Bundled Authentication — Token Used Only at Initialization, Then Accessed via Git Remote

Taking Git repository access as an example: the repository access token is used at sandbox initialization to clone the repository and is wired as a local git remote. Subsequent git operations from inside the sandbox can be executed without handling the token directly.

Pattern 2: Vault Authentication — Securely Obtain OAuth Tokens via a Dedicated Proxy

OAuth tokens for custom tools are stored in a secure vault outside the sandbox. A dedicated proxy receives a token tied to the session, fetches the matching credentials from the vault, and makes the call to the external service on the agent's behalf.

Through structural separation, even if a prompt injection attack succeeds inside the container, it cannot reach the credentials.
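A minimal sketch of the vault pattern follows, assuming a hypothetical Vault, ToolProxy, and external_service; none of these names come from Anthropic's implementation. The point it illustrates is that the caller inside the sandbox holds only a session token, while the OAuth credential is looked up and used on the proxy side.

```python
import secrets
from typing import Dict


class Vault:
    """Hypothetical credential store that lives outside the sandbox."""

    def __init__(self) -> None:
        self._oauth_tokens: Dict[str, str] = {}

    def store(self, session_token: str, oauth_token: str) -> None:
        self._oauth_tokens[session_token] = oauth_token

    def fetch(self, session_token: str) -> str:
        return self._oauth_tokens[session_token]


def external_service(oauth_token: str, request: str) -> str:
    # Stand-in for a real API; it only checks that a credential was sent.
    if not oauth_token:
        return "401 unauthorized"
    return f"200 ok: {request}"


class ToolProxy:
    """Receives a session token, resolves the credential, makes the call."""

    def __init__(self, vault: Vault) -> None:
        self._vault = vault

    def call(self, session_token: str, request: str) -> str:
        try:
            oauth_token = self._vault.fetch(session_token)
        except KeyError:
            return "403 unknown session"
        # The credential is used here and never returned to the caller.
        return external_service(oauth_token, request)


vault = Vault()
session_token = secrets.token_hex(8)
vault.store(session_token, "oauth-secret-value")
proxy = ToolProxy(vault)

response = proxy.call(session_token, "list issues")
print(response)
print("oauth-secret-value" in response)
```

Anything running in the sandbox can at most replay the session token through the proxy; in this sketch the OAuth secret never appears in a value returned to the caller.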

Scaling Patterns — Many Brains (Shared Resources) and Many Hands (Multiple Execution Targets)

The design philosophy Anthropic makes explicit is: "Have strong opinions around the interface, but make no assumptions about the number or location of brains and hands" Source.

Many Brains — Multiple Harnesses Share a Common Sandbox, Tools, and Session

Multiple stateless harnesses share common resources. Containers are provisioned only when actually needed, optimizing the cost of parallel execution.

Brain 1 ──┐
Brain 2 ──┼── Shared Sandbox / Tools / Session
Brain N ──┘
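As a rough illustration of the Many Brains pattern, the sketch below lets several stateless workers share one session list and one sandbox that is provisioned on first use. LazySandbox and Brain are hypothetical names, the sandbox only counts provisioning instead of starting anything, and the code is single-threaded, so it ignores the locking a real concurrent system would need.

```python
from typing import List


class LazySandbox:
    """Hypothetical shared sandbox that is provisioned on first use only."""

    def __init__(self) -> None:
        self.provision_count = 0
        self._ready = False

    def execute(self, name: str, input: dict) -> str:
        if not self._ready:
            # Stand-in for container start-up; runs at most once.
            self.provision_count += 1
            self._ready = True
        return f"{name} ok: {input.get('command', '')}"


class Brain:
    """Stateless worker: it holds references to shared resources only."""

    def __init__(self, brain_id: int, sandbox: LazySandbox, session: List[str]) -> None:
        self.brain_id = brain_id
        self.sandbox = sandbox
        self.session = session

    def handle(self, task: str, needs_tool: bool) -> None:
        self.session.append(f"brain-{self.brain_id} planned: {task}")
        if needs_tool:
            result = self.sandbox.execute("bash", {"command": task})
            self.session.append(f"brain-{self.brain_id} got: {result}")


sandbox = LazySandbox()
session: List[str] = []
brains = [Brain(i, sandbox, session) for i in range(1, 4)]

brains[0].handle("summarize notes", needs_tool=False)
provisioned_before_tool_use = sandbox.provision_count
brains[1].handle("run tests", needs_tool=True)
brains[2].handle("build docs", needs_tool=True)

print(provisioned_before_tool_use)  # 0: reasoning alone needs no container
print(sandbox.provision_count)      # 1: provisioned once, then shared
print(len(session))                 # 5 events written to the shared session
```

A brain that never calls a tool never triggers provisioning, which is where the cost saving in parallel runs comes from.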

Many Hands — A Single Brain Works Across Heterogeneous Execution Environments (Containers / External Services / MCP)

A single brain assigns work across heterogeneous execution environments. Claude reasons about and selects the appropriate execution target.

                      ┌── Container A
Brain (Claude) ──────┼── Container B
                      ├── External Service
                      └── MCP Server

Differences from Traditional Agent Implementations — Managed Agents Wins on 6 Dimensions

The table below compares the two approaches dimension by dimension.

Dimension                    Traditional Implementation              Managed Agents
Harness location             Colocated inside container              Separated outside container
Behavior on failure          Entire session is lost                  Recoverable from session log
TTFT                         Delayed by container initialization     Reasoning starts immediately
Scaling                      Manual management, tightly coupled      Independent scale-out
Credential management        Tends to be mixed into harness          Isolated via vault / resource bundling
Model update compatibility   Harness assumptions must be revisited   Minimal impact thanks to stable interface

Practical Application — When to Choose Managed Agents

4 Use Cases Where Managed Agents Excel — Long-Running / Parallel / High-Security / Model Update Tracking

The key points of this section are summarized below.

  • Long-running, multi-step agent tasks: Session externalization allows tasks spanning hours or days to continue without losing state
  • Large-scale parallel agent execution: The Many Brains pattern enables efficient parallel processing of many jobs
  • Environments with strict security requirements: Structural separation of credentials serves as a defense against prompt injection
  • Cases where model updates need to be tracked: Loose coupling between the harness and the model minimizes harness changes when the model is updated

Cases That Warrant Consideration — Short Tasks / Strong Dependency on Existing Harness

The key points of this section are summarized below.

  • Short-duration, simple tasks: The architectural complexity may introduce overhead
  • Cases with strong dependency on an existing custom harness: Adaptation work to fit the execute() interface will be required

Sources (Primary Information)

The primary sources directly referenced in writing this article are listed below. Always verify the latest accurate information at each link.

Clauder Navi Editorial Team
@clauder_navi

Covering Anthropic's Claude and Claude Code, we publish the latest developments and practical know-how daily for engineers in Japan. For our editorial policy, see About This Media.