Bedrock Claude Pricing | Per-Token Rates and Cost Reduction for Opus, Sonnet, and Haiku

Claude Amazon Bedrock AWS 料金コスト最適化 API

Clauder Navi 編集部 / 最終更新 2026-05-29

AI-generated article summarypowered by Claude

結論powered by Claude

Claude pricing via Bedrock is now on par with direct Anthropic contracts for the latest generation of models. The base rates are: Sonnet 4.6 at $3 input / $15 output, Haiku 4.5 at $1 input / $5 output, and Opus 4.7 at $5 input / $25 output (all per 1M tokens). The older Claude 3.5 Sonnet is priced at $6 / $30 on Bedrock, meaning migrating to the latest models alone can cut costs in half.

Starting with Claude Sonnet 4.5 / Haiku 4.5 / Opus 4.5, Bedrock offers two endpoint types: a global endpoint and a regional endpoint — with regional priced 10% higher than global. If you have no data residency requirements, choosing the global endpoint is the default recommendation and saves roughly 10% compared to locking into the Tokyo region.

Prompt Caching (cache write at 1.25x, cache read at 0.1x) and the Batch API (50% discount on both input and output) are the two most effective cost-reduction tools available on Bedrock. For applications with long conversation histories or large system prompts, Prompt Caching is the go-to option; Batch API is ideal for overnight jobs. Using each where appropriate can dramatically lower your costs.

目次 (12)

The Basic Structure of Bedrock Claude Pricing — Four Layers Determine Your Rate
Per-Token Rates Differ Significantly Between Old and New Generations
Per-Model Rate Reference — Opus 4.x / Sonnet 4.6 / Haiku 4.5
Sonnet 4.6 as the Production Standard, Haiku 4.5 for Batch Workloads
Global vs. Regional Endpoints — Do Not Overlook the 10% Rate Difference
The 10% Regional Premium Applies to Both Input and Output
Prompt Caching — 1.25x for Writes, 0.1x for Reads
When to Use Automatic vs. Explicit Caching
Batch API — 50% Off on Both Input and Output
Differences from Claude Platform on AWS — A New CCU-Based Billing Model
Cost Estimation Example — Approximately $37 for 10,000 Customer Support Tickets
Four Practical Rules for Minimizing Bedrock Claude Costs

The Basic Structure of Bedrock Claude Pricing — Four Layers Determine Your Rate

Claude pricing on Amazon Bedrock is determined by four layers: (1) the per-token input rate by model, (2) the per-token output rate by model, (3) the endpoint type (global vs. regional), and (4) discounts from Prompt Caching and Batch processing. Your AWS bill will show a single line for "Bedrock — Anthropic Claude," but without breaking down the components, it is hard to identify where savings are possible.

The billing unit is 1M tokens (one million tokens), which corresponds to roughly 750,000 words in English or hundreds of thousands of characters in mixed Japanese text. Processing a single short query costs just a few cents, but an agent workflow that feeds in long documents every time can run into thousands of dollars per month — understanding the structure upfront prevents unpleasant surprises on your invoice.

Source: Amazon Bedrock Pricing (accessed: 2026-05-29), Anthropic Official Pricing (accessed: 2026-05-29)

Per-Token Rates Differ Significantly Between Old and New Generations

Claude 3.5 Sonnet is still available on Bedrock, but its rate — $6 input / $30 output per 1M tokens — is twice the cost of the latest Sonnet 4.6 ($3 / $15). Production systems still running on the 3.5 series can cut costs in half without any quality degradation simply by switching to Sonnet 4.6 or Haiku 4.5.

The tokenizer has also changed between generations, and it has been reported that models from Opus 4.7 onward tend to produce more tokens for the same text than older models. Even if the per-token rate is lower, total cost may not decrease as expected depending on output length, so it is recommended to benchmark real usage before migrating.

Per-Model Rate Reference — Opus 4.x / Sonnet 4.6 / Haiku 4.5

The key figures for this section are summarized below.

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context	Primary Use Case
Claude Opus 4.7	$5	$25	1M tokens	Complex reasoning, long-form coding
Claude Opus 4.6	$5	$25	1M tokens	High-difficulty reasoning
Claude Sonnet 4.6	$3	$15	1M tokens	Standard production workloads
Claude Sonnet 4.5	$3	$15	200k tokens	Cost-efficiency focused
Claude Haiku 4.5	$1	$5	200k tokens	High-volume, low-latency, classification
Claude Haiku 3.5 (legacy)	$0.80	$4	200k tokens	Cost-first priority
Claude Opus 4.1 (legacy)	$15	$75	200k tokens	Previous generation, high rate
Claude 3.5 Sonnet (old gen)	$6	$30	200k tokens	Legacy Bedrock pricing tier

Source: Anthropic Official Pricing (accessed: 2026-05-29), Claude Models Overview (accessed: 2026-05-29)

Note (as of 2026-05-29): This table was created before Opus 4.8 (with 1M context support) became available on Bedrock. Please verify the Bedrock availability and pricing of the latest Opus 4.8 at Amazon Bedrock Pricing.

The latest 4.x series models available via Bedrock are priced at rates nearly identical to a direct Anthropic API contract. Opus 4.1 at $15 / $75 is priced at 3x the rate of Opus 4.6 / 4.7 and is considered legacy. For new development using the Opus series, choosing 4.6 or later fully captures the available pricing benefits.

Sonnet 4.6 as the Production Standard, Haiku 4.5 for Batch Workloads

Sonnet 4.6 at $3 input / $15 output sits in the middle: one-fifth the cost of Opus and three times the cost of Haiku. For production workloads such as code generation, summarization, and RAG responses, using Sonnet 4.6 as the baseline and selecting Opus or Haiku as needed is the most cost-efficient architecture.

Haiku 4.5 at $1 input / $5 output is one-third the price of Sonnet and is well-suited for classification, extraction, short response generation, and high-volume batch processing. According to official estimates, processing 10,000 customer support tickets with Haiku 4.5 costs approximately $37, meaning email triage and FAQ first-response systems can be handled almost entirely with Haiku.

Global vs. Regional Endpoints — Do Not Overlook the 10% Rate Difference

For models from Claude Sonnet 4.5 / Haiku 4.5 / Opus 4.5 onward, Bedrock now offers two endpoint types. The global endpoint is the standard option that dynamically routes across AWS regions worldwide for maximum availability. The regional endpoint keeps inference data within a specific geographic region, but is 10% more expensive than global.

First confirm whether locking to the Tokyo region is actually required. If your data residency requirements do not mandate staying within Japan, simply choosing the global endpoint saves roughly 10%. On the other hand, if you are in finance, healthcare, or the public sector and must contractually guarantee routing to a specific region, the 10% regional premium is a reasonable insurance cost.

The 10% Regional Premium Applies to Both Input and Output

The 10% premium applies to all token types: input tokens, output tokens, cache writes, and cache reads. The Sonnet 4.6 regional endpoint effectively works out to $3.30 input / $16.50 output. Estimating the impact based on your monthly token consumption will keep your AWS invoice aligned with expectations.

Older-generation models (before Sonnet 4 / Opus 4) do not have a global/regional distinction and are still billed at a single unified rate. In mixed environments, maintaining a ledger of which model uses which endpoint is essential — otherwise monthly unit-cost analysis will quickly become unmanageable.

Source: Anthropic Pricing — Cloud platform pricing (accessed: 2026-05-29)

Prompt Caching — 1.25x for Writes, 0.1x for Reads

Prompt Caching is a mechanism that significantly reduces input costs when reusing long system prompts or conversation histories. It is available via Bedrock as well, with the following multipliers applied to the base input rate.

Operation	Multiplier vs. Base Input Rate	Valid Duration
5-minute cache write	1.25x	5 minutes
1-hour cache write	2x	1 hour
Cache read (hit)	0.1x	Same as write duration

For Sonnet 4.6, a 5-minute cache write costs $3.75 per 1M tokens for input, while a cache read costs $0.30 per 1M tokens. Even a single cache read makes a 5-minute cache write pay for itself, and just two reads make the 1-hour cache profitable. For applications that frequently reuse RAG reference documents, long instruction prompts, or conversation histories, enabling Prompt Caching is a must.

When to Use Automatic vs. Explicit Caching

There are two ways to configure caching.

Automatic caching: Add a single cache_control at the top level of the request and let the system manage cache boundaries automatically. This is sufficient for most use cases.
Explicit cache boundaries: Attach cache_control to individual content blocks for fine-grained control over what gets cached. Use this when you need to separate a long system prompt from frequently changing user input.

Prompt Caching multipliers are compounded with Batch API discounts and data residency multipliers. Combining Batch API with Prompt Caching can compress costs to just a few percent of the base rate in some scenarios.

Source: Anthropic Pricing — Prompt caching (accessed: 2026-05-29)

Batch API — 50% Off on Both Input and Output

The Batch API processes requests asynchronously in bulk and applies a 50% discount to both input and output. Equivalent discounts are available on Bedrock, so any processing that does not require an immediate response — such as overnight jobs or bulk data preprocessing — can have its cost cut in half simply by routing through Batch.

Model	Batch Input	Batch Output
Claude Opus 4.7	$2.50	$12.50
Claude Sonnet 4.6	$1.50	$7.50
Claude Haiku 4.5	$0.50	$2.50

(per 1M tokens; source: Anthropic Pricing — Batch processing, accessed: 2026-05-29)

Using Haiku 4.5 with Batch enables extremely low-cost bulk processing at $0.50 input / $2.50 output per 1M tokens. Routing non-real-time workloads — such as 100,000-email classification, summary generation for search indexes, or log structuring — to Batch is a core cost strategy.

Note, however, that the Batch API can take up to 24 hours to return responses, so it must be used in a design that combines it with the synchronous API. It cannot be used alongside Fast mode, so the practical approach is to route low-latency-required processing through the synchronous API and everything else through Batch.

Differences from Claude Platform on AWS — A New CCU-Based Billing Model

Separately from Bedrock, Claude Platform on AWS, provided directly by Anthropic, is also available via AWS Marketplace. This service bills using a proprietary unit called CCU (Claude Consumption Units) at a fixed rate of $0.01 per CCU. It works by converting your token consumption to USD at Anthropic direct-contract rates, translating that to CCUs, and metering them to AWS Marketplace on an hourly basis.

Item	Amazon Bedrock	Claude Platform on AWS
Billing unit	Tokens (per 1M)	CCU ($0.01/CCU fixed)
Rate basis	Bedrock-specific table	Equivalent to Anthropic direct contract
Invoice appearance	Per-model line items	Single CCU line item
Available features	Bedrock features (Guardrails, Knowledge Bases, etc.)	Anthropic-native features (Memory, Files, Web Search)
Private offers	Bedrock contract	Claude Platform contract (individual)

Source: Anthropic Pricing — Claude Platform on AWS pricing (accessed: 2026-05-29)

The general rule is: organizations that need to keep data under AWS control use Bedrock; organizations that want Anthropic-native features (Memory, Files API, Web Search, etc.) use Claude Platform on AWS. For organizations with existing Bedrock private offers that want to also use Claude Platform on AWS, retroactive application of discounts is not possible, so contract timing should be coordinated with your account representative.

Cost Estimation Example — Approximately $37 for 10,000 Customer Support Tickets

To get a concrete sense of costs, here is an example from the official documentation.

10,000 customer support tickets
Average of ~3,700 tokens per ticket
Claude Haiku 4.5 ($1 input / $5 output per 1M tokens)
Total: approximately $37 (per 10,000 tickets)

That works out to roughly $0.004 per ticket. Running the same processing with Sonnet 4.6 would cost about 3x more, and with Opus 4.7 about 5x more — though depending on quality requirements, the trade-off can still be worthwhile.

When using Claude Managed Agents, a session runtime charge of $0.08 per session-hour is added on top of token costs. Only time spent in the running state is billed; idle or rescheduling periods are not charged. For workloads that require agents to be continuously active, session runtime can become the dominant component of cumulative cost — so designing in aggressive idling is essential.

Four Practical Rules for Minimizing Bedrock Claude Costs

Finally, here are four principles for effectively reducing Claude costs on Bedrock.

Segment model tiers by use case — Use Haiku 4.5 for classification and extraction, Sonnet 4.6 for production workloads, and Opus only for high-difficulty reasoning. Trying to do everything with a single model is a guaranteed way to overspend.
Always enable Prompt Caching — Caching system prompts and conversation histories pays for itself with even a single cache hit. The impact is especially significant for RAG systems, chatbots, and coding assistants.
Route overnight processing to Batch API — Any processing that doesn't need an immediate response gets a 50% discount through Batch. At the scale of 100,000 items, this translates to thousands of dollars in monthly savings.
Default to the global endpoint — If you have no data residency requirements, this saves 10%. Use regional only when contractually required.

Combining all of these — for example, a setup of "Sonnet 4.6 + Prompt Caching + global endpoint + overnight Batch" — can reduce total cost to less than one-quarter of running the 3.5 series on regional alone. If your bill is unexpectedly high, start by checking which of these four levers you are missing.

Source: Amazon Bedrock Pricing (accessed: 2026-05-29), Anthropic Official Pricing (accessed: 2026-05-29), Claude Models Overview (accessed: 2026-05-29)

参考になったら ♡

この記事は役立ちましたか?

ご注意: Clauder Navi は Anthropic 公式情報を直接参照し正確な内容に努めておりますが、本記事の内容に基づく投資判断・契約・利用結果による損害について責任を負いかねます。重要な意思決定の際は、必ず Anthropic 公式・ claude.com の一次情報をご自身でご確認ください。

Clauder Navi 編集部

@clauder_navi

Anthropic の Claude / Claude Code を中心に、日本のエンジニア向けに最新動向と実務を毎日発信。運営方針はメディアについてをご覧ください。

プロフィール → 副社長コラム → レッスン一覧 →