Bedrock Claude Pricing | Per-Token Rates and Cost Reduction for Opus, Sonnet, and Haiku
Claude pricing via Bedrock is now on par with direct Anthropic contracts for the latest generation of models. The base rates are: Sonnet 4.6 at $3 input / $15 output, Haiku 4.5 at $1 input / $5 output, and Opus 4.7 at $5 input / $25 output (all per 1M tokens). The older Claude 3.5 Sonnet is priced at $6 / $30 on Bedrock, meaning migrating to the latest models alone can cut costs in half.
Starting with Claude Sonnet 4.5 / Haiku 4.5 / Opus 4.5, Bedrock offers two endpoint types: a global endpoint and a regional endpoint — with regional priced 10% higher than global. If you have no data residency requirements, choosing the global endpoint is the default recommendation and saves roughly 10% compared to locking into the Tokyo region.
Prompt Caching (cache write at 1.25x, cache read at 0.1x) and the Batch API (50% discount on both input and output) are the two most effective cost-reduction tools available on Bedrock. For applications with long conversation histories or large system prompts, Prompt Caching is the go-to option; Batch API is ideal for overnight jobs. Using each where appropriate can dramatically lower your costs.
目次 (12)
- The Basic Structure of Bedrock Claude Pricing — Four Layers Determine Your Rate
- Per-Token Rates Differ Significantly Between Old and New Generations
- Per-Model Rate Reference — Opus 4.x / Sonnet 4.6 / Haiku 4.5
- Sonnet 4.6 as the Production Standard, Haiku 4.5 for Batch Workloads
- Global vs. Regional Endpoints — Do Not Overlook the 10% Rate Difference
- The 10% Regional Premium Applies to Both Input and Output
- Prompt Caching — 1.25x for Writes, 0.1x for Reads
- When to Use Automatic vs. Explicit Caching
- Batch API — 50% Off on Both Input and Output
- Differences from Claude Platform on AWS — A New CCU-Based Billing Model
- Cost Estimation Example — Approximately $37 for 10,000 Customer Support Tickets
- Four Practical Rules for Minimizing Bedrock Claude Costs
The Basic Structure of Bedrock Claude Pricing — Four Layers Determine Your Rate
Claude pricing on Amazon Bedrock is determined by four layers: (1) the per-token input rate by model, (2) the per-token output rate by model, (3) the endpoint type (global vs. regional), and (4) discounts from Prompt Caching and Batch processing. Your AWS bill will show a single line for "Bedrock — Anthropic Claude," but without breaking down the components, it is hard to identify where savings are possible.
The billing unit is 1M tokens (one million tokens), which corresponds to roughly 750,000 words in English or hundreds of thousands of characters in mixed Japanese text. Processing a single short query costs just a few cents, but an agent workflow that feeds in long documents every time can run into thousands of dollars per month — understanding the structure upfront prevents unpleasant surprises on your invoice.
Source: Amazon Bedrock Pricing (accessed: 2026-05-29), Anthropic Official Pricing (accessed: 2026-05-29)
Per-Token Rates Differ Significantly Between Old and New Generations
Claude 3.5 Sonnet is still available on Bedrock, but its rate — $6 input / $30 output per 1M tokens — is twice the cost of the latest Sonnet 4.6 ($3 / $15). Production systems still running on the 3.5 series can cut costs in half without any quality degradation simply by switching to Sonnet 4.6 or Haiku 4.5.
The tokenizer has also changed between generations, and it has been reported that models from Opus 4.7 onward tend to produce more tokens for the same text than older models. Even if the per-token rate is lower, total cost may not decrease as expected depending on output length, so it is recommended to benchmark real usage before migrating.
Per-Model Rate Reference — Opus 4.x / Sonnet 4.6 / Haiku 4.5
The key figures for this section are summarized below.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context | Primary Use Case |
|---|---|---|---|---|
| Claude Opus 4.7 | $5 | $25 | 1M tokens | Complex reasoning, long-form coding |
| Claude Opus 4.6 | $5 | $25 | 1M tokens | High-difficulty reasoning |
| Claude Sonnet 4.6 | $3 | $15 | 1M tokens | Standard production workloads |
| Claude Sonnet 4.5 | $3 | $15 | 200k tokens | Cost-efficiency focused |
| Claude Haiku 4.5 | $1 | $5 | 200k tokens | High-volume, low-latency, classification |
| Claude Haiku 3.5 (legacy) | $0.80 | $4 | 200k tokens | Cost-first priority |
| Claude Opus 4.1 (legacy) | $15 | $75 | 200k tokens | Previous generation, high rate |
| Claude 3.5 Sonnet (old gen) | $6 | $30 | 200k tokens | Legacy Bedrock pricing tier |
Source: Anthropic Official Pricing (accessed: 2026-05-29), Claude Models Overview (accessed: 2026-05-29)
Note (as of 2026-05-29): This table was created before Opus 4.8 (with 1M context support) became available on Bedrock. Please verify the Bedrock availability and pricing of the latest Opus 4.8 at Amazon Bedrock Pricing.
The latest 4.x series models available via Bedrock are priced at rates nearly identical to a direct Anthropic API contract. Opus 4.1 at $15 / $75 is priced at 3x the rate of Opus 4.6 / 4.7 and is considered legacy. For new development using the Opus series, choosing 4.6 or later fully captures the available pricing benefits.
Sonnet 4.6 as the Production Standard, Haiku 4.5 for Batch Workloads
Sonnet 4.6 at $3 input / $15 output sits in the middle: one-fifth the cost of Opus and three times the cost of Haiku. For production workloads such as code generation, summarization, and RAG responses, using Sonnet 4.6 as the baseline and selecting Opus or Haiku as needed is the most cost-efficient architecture.
Haiku 4.5 at $1 input / $5 output is one-third the price of Sonnet and is well-suited for classification, extraction, short response generation, and high-volume batch processing. According to official estimates, processing 10,000 customer support tickets with Haiku 4.5 costs approximately $37, meaning email triage and FAQ first-response systems can be handled almost entirely with Haiku.
Global vs. Regional Endpoints — Do Not Overlook the 10% Rate Difference
For models from Claude Sonnet 4.5 / Haiku 4.5 / Opus 4.5 onward, Bedrock now offers two endpoint types. The global endpoint is the standard option that dynamically routes across AWS regions worldwide for maximum availability. The regional endpoint keeps inference data within a specific geographic region, but is 10% more expensive than global.
First confirm whether locking to the Tokyo region is actually required. If your data residency requirements do not mandate staying within Japan, simply choosing the global endpoint saves roughly 10%. On the other hand, if you are in finance, healthcare, or the public sector and must contractually guarantee routing to a specific region, the 10% regional premium is a reasonable insurance cost.
The 10% Regional Premium Applies to Both Input and Output
The 10% premium applies to all token types: input tokens, output tokens, cache writes, and cache reads. The Sonnet 4.6 regional endpoint effectively works out to $3.30 input / $16.50 output. Estimating the impact based on your monthly token consumption will keep your AWS invoice aligned with expectations.
Older-generation models (before Sonnet 4 / Opus 4) do not have a global/regional distinction and are still billed at a single unified rate. In mixed environments, maintaining a ledger of which model uses which endpoint is essential — otherwise monthly unit-cost analysis will quickly become unmanageable.
Source: Anthropic Pricing — Cloud platform pricing (accessed: 2026-05-29)
Prompt Caching — 1.25x for Writes, 0.1x for Reads
Prompt Caching is a mechanism that significantly reduces input costs when reusing long system prompts or conversation histories. It is available via Bedrock as well, with the following multipliers applied to the base input rate.
| Operation | Multiplier vs. Base Input Rate | Valid Duration |
|---|---|---|
| 5-minute cache write | 1.25x | 5 minutes |
| 1-hour cache write | 2x | 1 hour |
| Cache read (hit) | 0.1x | Same as write duration |
For Sonnet 4.6, a 5-minute cache write costs $3.75 per 1M tokens for input, while a cache read costs $0.30 per 1M tokens. Even a single cache read makes a 5-minute cache write pay for itself, and just two reads make the 1-hour cache profitable. For applications that frequently reuse RAG reference documents, long instruction prompts, or conversation histories, enabling Prompt Caching is a must.
When to Use Automatic vs. Explicit Caching
There are two ways to configure caching.
- Automatic caching: Add a single
cache_controlat the top level of the request and let the system manage cache boundaries automatically. This is sufficient for most use cases. - Explicit cache boundaries: Attach
cache_controlto individual content blocks for fine-grained control over what gets cached. Use this when you need to separate a long system prompt from frequently changing user input.
Prompt Caching multipliers are compounded with Batch API discounts and data residency multipliers. Combining Batch API with Prompt Caching can compress costs to just a few percent of the base rate in some scenarios.
Source: Anthropic Pricing — Prompt caching (accessed: 2026-05-29)
Batch API — 50% Off on Both Input and Output
The Batch API processes requests asynchronously in bulk and applies a 50% discount to both input and output. Equivalent discounts are available on Bedrock, so any processing that does not require an immediate response — such as overnight jobs or bulk data preprocessing — can have its cost cut in half simply by routing through Batch.
| Model | Batch Input | Batch Output |
|---|---|---|
| Claude Opus 4.7 | $2.50 | $12.50 |
| Claude Sonnet 4.6 | $1.50 | $7.50 |
| Claude Haiku 4.5 | $0.50 | $2.50 |
(per 1M tokens; source: Anthropic Pricing — Batch processing, accessed: 2026-05-29)
Using Haiku 4.5 with Batch enables extremely low-cost bulk processing at $0.50 input / $2.50 output per 1M tokens. Routing non-real-time workloads — such as 100,000-email classification, summary generation for search indexes, or log structuring — to Batch is a core cost strategy.
Note, however, that the Batch API can take up to 24 hours to return responses, so it must be used in a design that combines it with the synchronous API. It cannot be used alongside Fast mode, so the practical approach is to route low-latency-required processing through the synchronous API and everything else through Batch.
Differences from Claude Platform on AWS — A New CCU-Based Billing Model
Separately from Bedrock, Claude Platform on AWS, provided directly by Anthropic, is also available via AWS Marketplace. This service bills using a proprietary unit called CCU (Claude Consumption Units) at a fixed rate of $0.01 per CCU. It works by converting your token consumption to USD at Anthropic direct-contract rates, translating that to CCUs, and metering them to AWS Marketplace on an hourly basis.
| Item | Amazon Bedrock | Claude Platform on AWS |
|---|---|---|
| Billing unit | Tokens (per 1M) | CCU ($0.01/CCU fixed) |
| Rate basis | Bedrock-specific table | Equivalent to Anthropic direct contract |
| Invoice appearance | Per-model line items | Single CCU line item |
| Available features | Bedrock features (Guardrails, Knowledge Bases, etc.) | Anthropic-native features (Memory, Files, Web Search) |
| Private offers | Bedrock contract | Claude Platform contract (individual) |
Source: Anthropic Pricing — Claude Platform on AWS pricing (accessed: 2026-05-29)
The general rule is: organizations that need to keep data under AWS control use Bedrock; organizations that want Anthropic-native features (Memory, Files API, Web Search, etc.) use Claude Platform on AWS. For organizations with existing Bedrock private offers that want to also use Claude Platform on AWS, retroactive application of discounts is not possible, so contract timing should be coordinated with your account representative.
Cost Estimation Example — Approximately $37 for 10,000 Customer Support Tickets
To get a concrete sense of costs, here is an example from the official documentation.
- 10,000 customer support tickets
- Average of ~3,700 tokens per ticket
- Claude Haiku 4.5 ($1 input / $5 output per 1M tokens)
- Total: approximately $37 (per 10,000 tickets)
That works out to roughly $0.004 per ticket. Running the same processing with Sonnet 4.6 would cost about 3x more, and with Opus 4.7 about 5x more — though depending on quality requirements, the trade-off can still be worthwhile.
When using Claude Managed Agents, a session runtime charge of $0.08 per session-hour is added on top of token costs. Only time spent in the running state is billed; idle or rescheduling periods are not charged. For workloads that require agents to be continuously active, session runtime can become the dominant component of cumulative cost — so designing in aggressive idling is essential.
Four Practical Rules for Minimizing Bedrock Claude Costs
Finally, here are four principles for effectively reducing Claude costs on Bedrock.
- Segment model tiers by use case — Use Haiku 4.5 for classification and extraction, Sonnet 4.6 for production workloads, and Opus only for high-difficulty reasoning. Trying to do everything with a single model is a guaranteed way to overspend.
- Always enable Prompt Caching — Caching system prompts and conversation histories pays for itself with even a single cache hit. The impact is especially significant for RAG systems, chatbots, and coding assistants.
- Route overnight processing to Batch API — Any processing that doesn't need an immediate response gets a 50% discount through Batch. At the scale of 100,000 items, this translates to thousands of dollars in monthly savings.
- Default to the global endpoint — If you have no data residency requirements, this saves 10%. Use regional only when contractually required.
Combining all of these — for example, a setup of "Sonnet 4.6 + Prompt Caching + global endpoint + overnight Batch" — can reduce total cost to less than one-quarter of running the 3.5 series on regional alone. If your bill is unexpectedly high, start by checking which of these four levers you are missing.
Source: Amazon Bedrock Pricing (accessed: 2026-05-29), Anthropic Official Pricing (accessed: 2026-05-29), Claude Models Overview (accessed: 2026-05-29)