What Is Claude Sonnet 4.6 | Specs, Capabilities, and Differences from Opus

Article Summary by AI Chatpowered by Claude
結論powered by Claude

Claude Sonnet 4.6 is Anthropic's core model, defined as "the best combination of speed and intelligence." The model ID is claude-sonnet-4-6, the context window is 1M tokens (approximately 3.4 million Unicode characters), and the maximum output is 64K tokens (up to 300K with the Message Batches API and a beta header). It supports both Extended Thinking and Adaptive Thinking.

API pricing is $3/MTok for input and $15/MTok for output — 60% of the cost of the higher-tier Opus 4.8 ($5/$25). It is available on four platforms: the Claude API, AWS Bedrock, Google Vertex AI, and Microsoft Foundry, making it the go-to model for production workloads.

Among the current standard three models (Opus 4.8 / Sonnet 4.6 / Haiku 4.5), Sonnet 4.6 is the only one that supports both Extended Thinking and Adaptive Thinking simultaneously.

目次 (8)

What Is Claude Sonnet 4.6 — Official Position and Model ID

Anthropic's official documentation defines Claude Sonnet 4.6 as "the best combination of speed and intelligence." While the Opus series handles complex reasoning and advanced autonomous coding, Sonnet is positioned to process production workloads quickly and with high precision.

The model ID to use in the API is claude-sonnet-4-6. Starting with the Claude 4.6 generation, alias-style IDs without date suffixes have been adopted, representing a fixed version snapshot. This is not a pointer that automatically switches to the next-generation version, so you can maintain consistent behavior without changing the model ID.

Source: Anthropic Models overview

Key Specs at a Glance

Below is a summary of the Sonnet 4.6 specifications listed in the official documentation.

Item Claude Sonnet 4.6
Model ID (Claude API) claude-sonnet-4-6
Context Window 1M tokens (approx. 750,000 words / approx. 3.4 million Unicode characters)
Max Output (standard) 64,000 tokens
Max Output (Batch API + beta header) 300,000 tokens
Reliable Knowledge Cutoff August 2025
Training Data Cutoff January 2026
Response Speed Fast
Extended Thinking Supported
Adaptive Thinking Supported
Priority Tier Supported

The 1M token context window equates to approximately 3.4 million Unicode characters. For Japanese text, where one character is roughly 1–1.5 tokens, this means you can practically process around 670,000 to 1,000,000 Japanese characters in a single prompt. Given that a typical paperback book runs about 100,000–150,000 characters, you can fit the equivalent of 7–10 novels into a single context.

The Difference Between Extended Thinking and Adaptive Thinking

Sonnet 4.6 supports two "thinking modes." Both relate to reasoning accuracy, but they work differently.

Extended Thinking is a mode in which the model explicitly outputs its reasoning process before generating an answer. It is activated via the API by enabling the thinking parameter. It is effective in situations that require transparency in the solution process — such as mathematical proofs, code debugging, and multi-step logical reasoning.

Adaptive Thinking is a feature that automatically adjusts the amount of internal deliberation based on the difficulty of the problem. It answers simple questions quickly, and automatically takes more time to think deeply about complex problems. No explicit activation is required from the user; it operates in an always-on state.

The feature support status for the current standard three models is as follows:

Model Extended Thinking Adaptive Thinking
Opus 4.8 No Yes
Sonnet 4.6 Yes Yes
Haiku 4.5 Yes No

As the table shows, Sonnet 4.6 is the only standard model that supports both features simultaneously. If you need Extended Thinking but want to keep costs lower than Opus 4.8, Sonnet 4.6 is the clear choice.

What Changes with a 1M Token Context Window

Previous generations up to Sonnet 4.5 had a context window of 200K tokens. Sonnet 4.6's 1M tokens is 5 times the size of the previous generation.

Practical scenarios where this expansion makes a direct difference include:

  1. Passing a large codebase (the combined total of multiple files) in a single prompt for bug investigation or specification review
  2. Including hundreds of pages of PDFs, legal documents, or technical specifications all at once in the context
  3. Continuing follow-up questions while retaining a long conversation history
  4. Testing implementations that include large volumes of documents directly in the context without using RAG
  5. Passing diffs from multiple files simultaneously for comprehensive code review

Note that Sonnet 4.6's 1M token support is separate from the beta header context-1m-2025-08-07 for Sonnet 4.5, which was deprecated on April 30, 2026. With Sonnet 4.6, 1M tokens is available as a standard feature without needing a beta header.

Comparison with Opus 4.8 and Haiku 4.5

Here is a side-by-side comparison of the three models' specs.

Comparison Item Opus 4.8 Sonnet 4.6 Haiku 4.5
Recommended Use Complex reasoning, advanced autonomous tasks General production workloads High-speed processing, cost-focused
Context Window 1M tokens 1M tokens 200K tokens
Max Output (standard) 128K tokens 64K tokens 64K tokens
Extended Thinking No Yes Yes
Adaptive Thinking Yes Yes No
Response Speed Moderate Fast Fastest
API Pricing (input) $5/MTok $3/MTok $1/MTok
API Pricing (output) $25/MTok $15/MTok $5/MTok

Sonnet 4.6's API pricing is 60% of Opus 4.8's. However, Opus 4.8's maximum output is 128K tokens (twice that of Sonnet 4.6). Opus 4.8 has an advantage for generating extremely long text in a single response or for complex tasks requiring frontier-level accuracy.

Haiku 4.5 is even cheaper and faster, but its context window is limited to 200K tokens and it does not support Adaptive Thinking. It is suited for simple classification, summarization, and short conversations where speed and cost are the top priorities.

Source: Anthropic Models overview

Batch API and the 300K Output Beta

With the standard Messages API, Sonnet 4.6's maximum output is 64K tokens. However, by specifying the beta header output-300k-2026-03-24 in the Message Batches API, you can achieve a maximum output of 300K tokens. This is useful for batch-processing large volumes of long reports or detailed code generation.

The Batch API also offers a pricing discount — 50% off the standard rate (input $1.5/MTok, output $7.5/MTok). For large-scale processing tasks where an immediate response is not required, the Batch API is the most cost-effective option.

Additionally, combining with prompt caching can reduce costs further. When a cache read hits, the price drops to $0.30/MTok (10% of the standard rate). Batch API and prompt caching discounts stack, so batch processing that repeatedly sends the same system prompt can result in a substantial reduction in effective costs.

Available Platforms

Sonnet 4.6 is available on the following four platforms.

Platform Model ID
Claude API claude-sonnet-4-6
AWS Bedrock anthropic.claude-sonnet-4-6
Google Vertex AI claude-sonnet-4-6
Microsoft Foundry claude-sonnet-4-6

On AWS Bedrock, you can choose between a global endpoint (dynamic routing for maximum availability) and a regional endpoint (guaranteed data routing to a specific region). If compliance requirements restrict where your data can travel, choose the regional endpoint.

On Google Vertex AI, three options are available: global, multi-region, and regional. For details on using it with Vertex AI, see "Using Claude 4.6 on Vertex AI."

On Microsoft Foundry, the same model ID as the Claude API is used. Note, however, that on Foundry, Opus 4.8's context window is limited to 200K tokens (no specific restriction is documented for Sonnet 4.6).

When to Choose Sonnet 4.6 and Estimated Pricing

Based on Anthropic's recommendations, here is a guide for choosing the right model.

  1. When Sonnet 4.6 is the best fit: Integrating into a production API, large-scale processing, tasks where balancing speed and accuracy matters, situations where you need a 1M token context and also want Extended Thinking
  2. When to choose Opus 4.8: Complex reasoning requiring frontier-level accuracy, generating text exceeding 128K tokens, advanced autonomous tasks
  3. When to choose Haiku 4.5: Simple classification, summarization, or short conversations; top priority on speed or cost; tasks that fit within a 200K context

The base API pricing is $3/MTok for input and $15/MTok for output (USD, excluding tax). It is also accessible through Claude.ai subscriptions (Pro at $20/month, Max from $100/month, Team at $25/seat/month). For pricing details and a comparison of each plan, see "How Much Does Claude Sonnet Cost."

参考になったら ♡
Clauder Navi 編集部
@clauder_navi

Anthropic の Claude / Claude Code を中心に、日本のエンジニア向けに最新動向と実務 を毎日発信。 運営方針 は メディアについて をご覧ください。