Claude Managed Agents — Keys to Reducing TTFT and Decoupled Design
For developers tired of rewriting their harness every time a model is updated, this article breaks down the decoupled design of Managed Agents as published by Anthropic. In an order that directly informs implementation decisions, we cover how to carve out the brain (reasoning), hands (execution environment), and session (state) into independent abstraction layers; the mechanism that cuts TTFT by roughly 60% at p50 and over 90% at p95; and the thinking behind Many Brains / Many Hands scaling.
Managed Agents adopts a 3-layer separation of brain / hands / session. By applying the virtualization principles of OS design to AI agents and moving the harness outside the container, it minimizes the cost of rewriting the harness with every model update.
The core of the implementation is transparent connectivity via the execute(name, input) interface. Containers, custom tools, and MCP servers can all be treated as "hands" through the same function call. Decoupling the brain from the hands in this way cuts TTFT by roughly 60% at p50 and over 90% at p95, and lets brains and hands scale out independently of each other.
The key caveat is that sessions must be externalized as durable event logs. While this ensures state is not lost even when a container fails, the harness must guarantee log persistence and rollback design — meaning systems built on a stateless assumption will need to be redesigned.
Table of Contents (22)
- Key to Decoupled Design 1: Why Separate the Execution Environment from Reasoning?
- The Problem: Harnesses Were Designed to "Compensate for Model Shortcomings"
- The Solution — Apply OS Virtualization Principles Across 3 Components
- Key to Decoupled Design 2: The Architecture Is 3 Layers — Brain / Hands / Session
- Separating the 3 Components — Brain (Reasoning) / Hands (Execution) / Session (State)
- The Harness Position Has Changed — Separated Outside the Container, Transparently Connected via execute(name, input)
- Context Externalization via Session — Durable Log Without Compression, Enabling Rollback
- Key to Decoupled Design 3: Shifting the Failure Model from "Pets" to "Cattle"
- The Vulnerability of Single-Container Design — Total Session Loss on Failure, Requiring Manual Recovery
- Fault Tolerance Through Separation — Restart via wake(sessionId), State Restored from Log
- Key to TTFT Reduction — ~60% at p50, Over 90% at p95
- Key to Decoupled Design 4: Two Patterns for Prompt Injection Defense — Resource Bundling and Vault
- Pattern 1: Resource-Bundled Authentication — Token Used Only at Initialization, Then Accessed via Git Remote
- Pattern 2: Vault Authentication — Securely Obtain OAuth Tokens via a Dedicated Proxy
- Scaling Patterns — Many Brains (Shared Resources) and Many Hands (Multiple Execution Targets)
- Many Brains — Multiple Harnesses Share a Common Sandbox, Tools, and Session
- Many Hands — A Single Brain Works Across Heterogeneous Execution Environments (Containers / External Services / MCP)
- Differences from Traditional Agent Implementations — Managed Agents Wins on 6 Dimensions
- Practical Application — When to Choose Managed Agents
- 4 Use Cases Where Managed Agents Excel — Long-Running / Parallel / High-Security / Model Update Tracking
- Cases That Warrant Consideration — Short Tasks / Strong Dependency on Existing Harness
- Sources (Primary Information)
Key to Decoupled Design 1: Why Separate the Execution Environment from Reasoning?
The Problem: Harnesses Were Designed to "Compensate for Model Shortcomings"
When building agents, developers must design not only Claude itself but also the surrounding harness (the entire execution control framework). The harness includes prompt control, context management, tool invocation, retry logic, and more.
Anthropic has identified a fundamental problem with this approach:
"Harnesses embed assumptions about what the model cannot do. But as models improve, those assumptions become outdated." Source
As a concrete example, Claude Sonnet 4.5 exhibited "context anxiety" near token limits, requiring workarounds implemented in the harness. However, Claude Opus 4.5 resolved this issue, making those harness workarounds unnecessary Source.
The more model assumptions are baked into the harness, the more the entire harness must be revisited with each model improvement. This is unsustainable from the perspectives of scalability, safety, and cost.
The Solution — Apply OS Virtualization Principles Across 3 Components
The solution Anthropic adopted is the virtualization principle proven in OS design. Just as an OS separates processes from hardware, Managed Agents virtualizes 3 components as independent, stable abstraction layers Source.
Key to Decoupled Design 2: The Architecture Is 3 Layers — Brain / Hands / Session
Separating the 3 Components — Brain (Reasoning) / Hands (Execution) / Session (State)
The diagram and table below summarize the three components and how they relate.
┌─────────────────────────────────────────────────┐
│                 Managed Agents                  │
│                                                 │
│  ┌──────────────┐   ┌──────────────────────┐    │
│  │    Brain     │   │        Hands         │    │
│  │              │   │                      │    │
│  │  Claude      │   │  Sandbox             │    │
│  │  + Harness   │──▶│  (Container)         │    │
│  │              │   │  Custom Tools        │    │
│  │  Reasoning   │   │  MCP Servers         │    │
│  └──────────────┘   └──────────────────────┘    │
│         │                    │                  │
│         └────────┬───────────┘                  │
│                  │                              │
│         ┌────────▼───────┐                      │
│         │    Session     │                      │
│         │                │                      │
│         │ Durable Event  │                      │
│         │ Log            │                      │
│         └────────────────┘                      │
└─────────────────────────────────────────────────┘
Source: Architecture diagram based on Anthropic Engineering: Scaling Managed Agents
| Component | Role | Technical Entity |
|---|---|---|
| Brain | Reasoning, planning, decision-making | Claude + harness logic |
| Hands | Execution, side effects, I/O | Containers, tools, MCP servers |
| Session | State persistence | Durable event log |
The Harness Position Has Changed — Separated Outside the Container, Transparently Connected via execute(name, input)
In the traditional design, Claude, the harness, and the sandbox all coexisted within a single container. In Managed Agents, the harness is moved outside the container and communicates with the sandbox via the execute() interface Source.
The hands interface is kept simple and unified:
execute(name, input) → string
Any implementation of this interface — containers, custom tools, MCP servers, or Anthropic-provided tools — can be treated transparently as "hands" Source.
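As a minimal sketch of what this uniformity buys, the Python below models a "hand" as anything exposing execute(name, input) and returning a string. Only the execute(name, input) → string shape comes from the source; the class names ContainerHand and FunctionHand, the helper run_tool_call, and the tool names are hypothetical and stand in for real containers and tools.

```python
from typing import Callable, Dict, Protocol


class Hand(Protocol):
    """Anything that can act on the brain's behalf: execute(name, input) -> str."""

    def execute(self, name: str, input: dict) -> str: ...


class ContainerHand:
    """Hypothetical stand-in for a sandboxed container; it runs nothing real."""

    def execute(self, name: str, input: dict) -> str:
        return f"[container] {name}: {input.get('cmd', '')}"


class FunctionHand:
    """Hypothetical adapter that exposes plain Python callables as custom tools."""

    def __init__(self, tools: Dict[str, Callable[[dict], str]]) -> None:
        self._tools = tools

    def execute(self, name: str, input: dict) -> str:
        return self._tools[name](input)


def run_tool_call(hand: Hand, name: str, input: dict) -> str:
    """Harness side: it only sees the interface, never the concrete hand."""
    return hand.execute(name, input)


container = ContainerHand()
custom = FunctionHand({"add": lambda i: str(i["a"] + i["b"])})

container_result = run_tool_call(container, "bash", {"cmd": "ls"})
custom_result = run_tool_call(custom, "add", {"a": 2, "b": 3})
```

Because the harness depends only on the interface, swapping a container for an MCP-backed hand would, under these assumptions, require no change to run_tool_call.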
Context Externalization via Session — Durable Log Without Compression, Enabling Rollback
The session functions as "a context object that exists outside the context window" Source. Via the getEvents() interface, Claude can:
- Retrieve any slice of the event stream at an arbitrary position
- Rewind to a specific point in time and reload from there
- Re-examine relevant events before executing an action
Unlike traditional context compression (an irreversible operation that discards information), the session log retains all information durably. The harness controls how data is fitted into the context window, making future improvements straightforward.
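The difference from compression can be sketched with an append-only log whose reader picks slices by position. This is an illustrative in-memory model, not Anthropic's implementation: get_events mirrors the getEvents() interface named above, while Event, SessionLog, emit, and build_context are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    seq: int
    kind: str
    payload: Dict[str, Any]


class SessionLog:
    """Append-only, in-memory stand-in for a durable event log."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def emit(self, kind: str, payload: Dict[str, Any]) -> Event:
        event = Event(seq=len(self._events), kind=kind, payload=payload)
        self._events.append(event)
        return event

    def get_events(self, start: int = 0, end: Optional[int] = None) -> List[Event]:
        # Returns a copy of a positional slice; the log itself is never mutated.
        return list(self._events[start:end])


def build_context(log: SessionLog, upto: int, window: int) -> List[Event]:
    """Harness-side policy: the last `window` events before position `upto`."""
    return log.get_events(max(0, upto - window), upto)


log = SessionLog()
log.emit("user_message", {"text": "fix the failing test"})
log.emit("tool_call", {"name": "bash", "input": {"cmd": "pytest"}})
log.emit("tool_result", {"output": "1 failed"})
log.emit("tool_call", {"name": "edit", "input": {"file": "app.py"}})

recent = build_context(log, upto=4, window=2)   # latest two events
rewound = build_context(log, upto=2, window=2)  # rewind to an earlier point
```

Both views are derived from the same untouched log, so changing the windowing policy later does not require migrating stored data.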
Key to Decoupled Design 3: Shifting the Failure Model from "Pets" to "Cattle"
The Vulnerability of Single-Container Design — Total Session Loss on Failure, Requiring Manual Recovery
In early implementations, all components were colocated in a single container. If the container went down, the entire session was lost and manual recovery of the unresponsive container was required Source.
Fault Tolerance Through Separation — Restart via wake(sessionId), State Restored from Log
By separating components, each element can fail and be replaced independently.
- The harness becomes stateless → crashes can be recovered with wake(sessionId)
- After restart, state is restored from the event log via getSession(id)
- Container re-initialization functions as a standard tool
A system that once required manual "pet" management has transformed into one that can be automatically managed like "cattle" Source.
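The recovery path can be sketched as follows, assuming the session store is a simple mapping from session ID to an ordered event list. wake and get_session mirror the wake(sessionId) and getSession(id) calls named above; SESSION_STORE, emit_event, pending_tool_calls, and the event fields are hypothetical and exist only for this illustration.

```python
from typing import Dict, List

# Hypothetical durable store: session id -> ordered list of events.
SESSION_STORE: Dict[str, List[dict]] = {}


def emit_event(session_id: str, event: dict) -> None:
    SESSION_STORE.setdefault(session_id, []).append(event)


def get_session(session_id: str) -> List[dict]:
    return list(SESSION_STORE.get(session_id, []))


def pending_tool_calls(events: List[dict]) -> List[dict]:
    """Tool calls that were logged but never received a result."""
    done = {e["call_id"] for e in events if e["type"] == "tool_result"}
    return [e for e in events if e["type"] == "tool_call" and e["call_id"] not in done]


def wake(session_id: str) -> dict:
    """Start a fresh harness: no in-memory state, everything rebuilt from the log."""
    events = get_session(session_id)
    return {
        "session_id": session_id,
        "events_replayed": len(events),
        "pending": [e["call_id"] for e in pending_tool_calls(events)],
    }


# A harness logs some work, then crashes before call "c2" returns.
emit_event("s1", {"type": "user_message", "text": "run the migration"})
emit_event("s1", {"type": "tool_call", "call_id": "c1", "name": "bash"})
emit_event("s1", {"type": "tool_result", "call_id": "c1", "output": "ok"})
emit_event("s1", {"type": "tool_call", "call_id": "c2", "name": "bash"})

recovered = wake("s1")  # a replacement harness picks up where the old one stopped
```

Since the harness keeps nothing the log does not already hold, replacing a crashed harness reduces to calling wake again with the same ID.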
Key to TTFT Reduction — ~60% at p50, Over 90% at p95
The most concrete quantitative result of separating brain and hands is the reduction in Time to First Token (TTFT) Source.
| Percentile | Improvement |
|---|---|
| p50 TTFT | ~60% reduction |
| p95 TTFT | Over 90% reduction |
Source: Anthropic Engineering: Scaling Managed Agents
The reason for the improvement is straightforward. In the traditional design, reasoning could not begin until container initialization was complete. By separating the harness outside the container, reasoning can now begin immediately without waiting for container provisioning.
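The ordering change can be illustrated with a small asyncio simulation. The durations are arbitrary placeholders, not measurements, and the function names are hypothetical. The sketch also assumes provisioning is started in the background; a lazy variant would defer it until the first tool call.

```python
import asyncio
from typing import List

PROVISION_SECONDS = 0.05    # illustrative stand-in for container start-up
FIRST_TOKEN_SECONDS = 0.01  # illustrative stand-in for model latency


async def provision_container(timeline: List[str]) -> None:
    await asyncio.sleep(PROVISION_SECONDS)
    timeline.append("container_ready")


async def first_token(timeline: List[str]) -> None:
    await asyncio.sleep(FIRST_TOKEN_SECONDS)
    timeline.append("first_token")


async def coupled() -> List[str]:
    """Harness lives in the container: reasoning waits for provisioning."""
    timeline: List[str] = []
    await provision_container(timeline)
    await first_token(timeline)
    return timeline


async def decoupled() -> List[str]:
    """Harness lives outside: reasoning starts while the container boots."""
    timeline: List[str] = []
    provisioning = asyncio.create_task(provision_container(timeline))
    await first_token(timeline)
    await provisioning
    return timeline


coupled_timeline = asyncio.run(coupled())
decoupled_timeline = asyncio.run(decoupled())
```

In the coupled timeline the first token always follows container readiness; in the decoupled one it no longer depends on it, which is why the tail (p95) benefits most when provisioning is the slow step.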
Key to Decoupled Design 4: Two Patterns for Prompt Injection Defense — Resource Bundling and Vault
The logical separation of components also plays an important role in security. Anthropic uses two patterns to prevent credential leakage Source.
Pattern 1: Resource-Bundled Authentication — Token Used Only at Initialization, Then Accessed via Git Remote
Taking Git repository access as an example: the repository access token is used at sandbox initialization to clone the repository and is wired as a local git remote. Subsequent git operations from inside the sandbox can be executed without handling the token directly.
Pattern 2: Vault Authentication — Securely Obtain OAuth Tokens via a Dedicated Proxy
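OAuth tokens for custom tools are stored in an external secure vault. A dedicated proxy receives a token associated with the session, retrieves the corresponding credentials from the vault, and makes the call to the external service, so the credentials themselves never enter the sandbox.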
Through structural separation, even if a prompt injection attack succeeds inside the container, it cannot reach the credentials.
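A rough sketch of the vault pattern, under the assumption that the sandbox only ever holds a session token and the proxy performs the credential lookup. Vault, ToolProxy, Sandbox, and all method names are hypothetical; in a real deployment the proxy would sit across a network boundary rather than in the same process.

```python
from typing import Dict


class Vault:
    """Hypothetical secret store that lives outside the sandbox."""

    def __init__(self, secrets: Dict[str, str]) -> None:
        self._secrets = secrets

    def credential_for(self, session_token: str, service: str) -> str:
        return self._secrets[f"{session_token}:{service}"]


class ToolProxy:
    """Trusted side: the only component that ever sees the credential."""

    def __init__(self, vault: Vault) -> None:
        self._vault = vault

    def call(self, session_token: str, service: str, request: str) -> str:
        credential = self._vault.credential_for(session_token, service)
        # A real proxy would attach `credential` to an outbound request here.
        authorized = bool(credential)
        return f"{service} handled {request!r} (authorized={authorized})"


class Sandbox:
    """Untrusted side: holds a session token, never the credential itself."""

    def __init__(self, session_token: str, proxy: ToolProxy) -> None:
        self.session_token = session_token
        self._proxy = proxy

    def visible_state(self) -> Dict[str, str]:
        return {"session_token": self.session_token}

    def use_tool(self, service: str, request: str) -> str:
        return self._proxy.call(self.session_token, service, request)


secret = "oauth-secret-123"
proxy = ToolProxy(Vault({"sess-1:issues": secret}))
sandbox = Sandbox("sess-1", proxy)
response = sandbox.use_tool("issues", "list open")
```

Even if injected instructions dump everything in visible_state() or in the tool response, the OAuth secret is in neither.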
Scaling Patterns — Many Brains (Shared Resources) and Many Hands (Multiple Execution Targets)
The design philosophy Anthropic makes explicit is: "Have strong opinions around the interface, but make no assumptions about the number or location of brains and hands" Source.
Many Brains — Multiple Harnesses Share a Common Sandbox, Tools, and Session
Multiple stateless harnesses share common resources. Containers are provisioned only when actually needed, optimizing the cost of parallel execution.
Brain 1 ──┐
Brain 2 ──┼── Shared Sandbox / Tools / Session
Brain N ──┘
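The following sketch models the Many Brains arrangement with several stateless brains sharing one log and one lazily provisioned sandbox. LazySandbox, Brain, step, and provision_count are hypothetical names, and the "provisioning" is just a counter, so this shows the shape of the pattern rather than a real scheduler.

```python
from typing import List, Optional


class LazySandbox:
    """Hypothetical shared sandbox that is provisioned only on first use."""

    def __init__(self) -> None:
        self.provision_count = 0
        self._ready = False

    def execute(self, name: str, input: dict) -> str:
        if not self._ready:
            self.provision_count += 1  # stands in for starting a container
            self._ready = True
        return f"ran {name}"


class Brain:
    """Stateless harness: everything it knows lives in the shared log."""

    def __init__(self, brain_id: str, sandbox: LazySandbox, log: List[str]) -> None:
        self.brain_id = brain_id
        self.sandbox = sandbox
        self.log = log

    def step(self, thought: str, tool: Optional[str] = None) -> None:
        self.log.append(f"{self.brain_id}: {thought}")
        if tool is not None:
            self.log.append(f"{self.brain_id}: {self.sandbox.execute(tool, {})}")


shared_log: List[str] = []
sandbox = LazySandbox()
brains = [Brain(f"brain-{n}", sandbox, shared_log) for n in range(3)]

brains[0].step("plan the work")
brains[1].step("review the plan")
provisions_before_tools = sandbox.provision_count  # pure reasoning so far

brains[2].step("run tests", tool="pytest")
brains[0].step("lint", tool="ruff")
```

Reasoning-only steps never touch the sandbox, and the two tool calls share a single provisioning, which is the cost argument made above.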
Many Hands — A Single Brain Works Across Heterogeneous Execution Environments (Containers / External Services / MCP)
A single brain assigns work across heterogeneous execution environments. Claude reasons about and selects the appropriate execution target.
                     ┌── Container A
Brain (Claude) ──────┼── Container B
                     ├── External Service
                     └── MCP Server
Differences from Traditional Agent Implementations — Managed Agents Wins on 6 Dimensions
| Dimension | Traditional Implementation | Managed Agents |
|---|---|---|
| Harness location | Colocated inside container | Separated outside container |
| Behavior on failure | Entire session is lost | Recoverable from session log |
| TTFT | Delayed by container initialization | Reasoning starts immediately |
| Scaling | Manual management, tightly coupled | Independent scale-out |
| Credential management | Tends to be mixed into harness | Isolated via vault / resource bundling |
| Model update compatibility | Harness assumptions must be revisited | Minimal impact thanks to stable interface |
Practical Application — When to Choose Managed Agents
4 Use Cases Where Managed Agents Excel — Long-Running / Parallel / High-Security / Model Update Tracking
- Long-running, multi-step agent tasks: Session externalization allows tasks spanning hours or days to continue without losing state
- Large-scale parallel agent execution: The Many Brains pattern enables efficient parallel processing of many jobs
- Environments with strict security requirements: Structural separation of credentials serves as a defense against prompt injection
- Cases where model updates need to be tracked: Loose coupling between the harness and the model minimizes harness changes when the model is updated
Cases That Warrant Consideration — Short Tasks / Strong Dependency on Existing Harness
- Short-duration, simple tasks: The architectural complexity may introduce overhead
- Cases with strong dependency on an existing custom harness: Adaptation work to fit the execute() interface will be required
Sources (Primary Information)
The primary sources directly referenced in writing this article are listed below. Always verify the latest accurate information at each link.
- Anthropic Engineering: Scaling Managed Agents — Decoupling the Brain from the Hands — Authors: Lance Martin, Gabe Cemaj, Michael Cohen
- Source — Official documentation (Managed Agents-related pages published on a rolling basis)