Vertex AI Claude Pricing | Opus/Sonnet Rates & Bedrock Comparison

"How much does it cost to use Claude on Vertex AI?" and "Can I consolidate billing through Google Cloud instead of contracting directly with Anthropic?" — these two questions sit at the heart of what people search for when they look up "vertex claude pricing."

This article draws on Anthropic's official pricing documentation and Google Cloud Vertex AI's official information to summarize per-model rates, the 10% premium for regional endpoints, and the pricing structure for Batch API and Prompt Caching.

Conclusion

Claude pricing on Vertex AI uses the same base rates as the Anthropic direct API. Per million tokens, Claude Opus 4.8 costs $5 input / $25 output, Sonnet 4.6 costs $3 / $15, and Haiku 4.5 costs $1 / $5.

However, choosing a regional endpoint (regional or multi-region) adds a 10% premium over the global endpoint. The two main cost-reduction levers are Batch API (50% discount) and Prompt Caching (cache hits are billed at 10% of the standard input rate).

Billing is consolidated into your GCP billing account, so organizations that want to use Claude under their existing Google Cloud budget management — without separately loading credits on the Anthropic side — will find this a suitable option.


Overview of Vertex AI Claude Pricing — Same Rate Structure as the Anthropic Direct API

As a strategic partner of Google Cloud, Anthropic makes Claude models available through the Vertex AI Model Garden, with per-token rates set at the same level as the Anthropic direct API. The differences lie in the billing channel and the availability of certain options.

When using Claude via Vertex AI, all charges flow into your GCP billing account. There is no need to purchase credits through the Anthropic Console — you can use your existing Google Cloud billing management, budget alerts, and cost allocation tags as-is. Fine-grained access control via IAM, network perimeters via VPC Service Controls, and audit logging via Cloud Logging are all available natively in GCP, making it a practical choice for enterprise deployments with strict compliance requirements.

On the other hand, the Anthropic direct API tends to receive new models first, and Vertex AI releases may lag by weeks or even months. Keep this in mind for projects where development speed is the top priority.

Full Model Pricing Table (as of June 2026)

Below are the base rates for Claude models available on Vertex AI (per 1M tokens, USD). Source: Anthropic Official Pricing Page

| Model | Input | Output | Status |
|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $25.00 | Latest Opus |
| Claude Opus 4.7 | $5.00 | $25.00 | Previous-gen Opus |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Standard recommended model |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Previous-gen Sonnet |
| Claude Haiku 4.5 | $1.00 | $5.00 | Fast & low-cost |
| Claude Haiku 3.5 | $0.80 | $4.00 | Older generation, still available on Vertex AI |

Notably, Haiku 3.5 has been deprecated on the Anthropic direct API (as of 2026 — see Anthropic model deprecation information for details), but remains available on Vertex AI. If you need to keep using Haiku 3.5 in existing batch processing pipelines, routing through Vertex AI is a practical way to continue doing so.

Anthropic keeps pricing uniform within a model family: the Opus family is consistently priced at $5/$25 from version 4.5 through 4.8, so upgrading from 4.7 to 4.8 incurs no additional per-token cost.
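To make the per-token rates concrete, the following minimal sketch converts the table above into a per-request cost. The dictionary keys are illustrative labels chosen for this example, not official Vertex AI model IDs, and the figures assume global-endpoint base rates with no discounts applied.

```python
# Base rates in USD per 1M tokens (input, output), copied from the table above.
# The keys are illustrative labels, not official Vertex AI model IDs.
BASE_RATES = {
    "opus-4.8": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
    "haiku-3.5": (0.80, 4.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at global-endpoint base rates."""
    input_rate, output_rate = BASE_RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Same request size (10,000 input / 2,000 output tokens) on each tier.
    for name in BASE_RATES:
        print(f"{name}: ${request_cost(name, 10_000, 2_000):.4f}")
```

For a request of that size, Sonnet 4.6 comes to about $0.06 and Opus 4.8 to about $0.10, which is why the choice of tier dominates the bill at high volume.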

Why Regional Endpoints Cost 10% More

Vertex AI Claude endpoints come in two types: global endpoints and regional endpoints (regional or multi-region). According to Anthropic's official documentation, regional endpoints carry a 10% premium over global pricing.

For example, if the global input rate for Sonnet 4.6 is $3.00, using a regional endpoint such as us-east5 or europe-west1 brings the input rate up to $3.30.
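The premium is a flat multiplier, so it can be sketched in a few lines. The function names and the monthly token volumes below are hypothetical and only illustrate the arithmetic; the 10% figure is the one quoted above.

```python
REGIONAL_PREMIUM = 0.10  # 10% surcharge on regional and multi-region endpoints


def endpoint_rate(global_rate: float, regional: bool) -> float:
    """Return the per-MTok rate for a global or a regional endpoint."""
    multiplier = 1 + REGIONAL_PREMIUM if regional else 1.0
    return round(global_rate * multiplier, 4)


def monthly_premium(input_mtok: float, output_mtok: float,
                    input_rate: float, output_rate: float) -> float:
    """Extra USD per month paid for choosing a regional endpoint."""
    global_cost = input_mtok * input_rate + output_mtok * output_rate
    return round(global_cost * REGIONAL_PREMIUM, 2)


if __name__ == "__main__":
    print(endpoint_rate(3.00, regional=True))         # Sonnet 4.6 input rate
    print(endpoint_rate(15.00, regional=True))        # Sonnet 4.6 output rate
    print(monthly_premium(500, 100, 3.00, 15.00))     # hypothetical monthly volume
```

With a hypothetical 500M input and 100M output tokens per month on Sonnet 4.6, the global bill is $3,000 and the regional endpoint adds $300.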

Common reasons to choose a regional endpoint include:

  • Regulatory requirements that mandate data stay within a specific region (e.g., GDPR, healthcare privacy laws)
  • A need to minimize latency by staying close to a particular region
  • Internal policies that require traffic to be routed to a specific region

If cost optimization is your goal, use the global endpoint unless regulatory requirements dictate otherwise.

As of June 2026, Claude-compatible regions include us-east5, europe-west1, and asia-southeast1, but Claude is not yet available in the Tokyo region (asia-northeast1). If low latency within Japan is a priority, AWS Bedrock's ap-northeast-1 (Tokyo) or the Anthropic direct API offers a geographic advantage.

Up to 50% Savings with Batch API

For workloads that don't require immediate responses, Batch API lets you process requests at half the standard price. Typical use cases include bulk document conversion, automated scoring of evaluation sets, and background data processing.

| Model | Batch Input | Batch Output |
|---|---|---|
| Claude Opus 4.8 / 4.7 | $2.50 / MTok | $12.50 / MTok |
| Claude Sonnet 4.6 / 4.5 | $1.50 / MTok | $7.50 / MTok |
| Claude Haiku 4.5 | $0.50 / MTok | $2.50 / MTok |

Batch API processes requests asynchronously, with completion times ranging from minutes to hours. In return, you run the same models at half the cost of real-time calls. Workloads with relaxed SLAs — such as monthly aggregation jobs or nightly report generation — are strong candidates for offloading to Batch API for significant cost benefits.
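As a rough way to size the benefit, the sketch below estimates a monthly bill when only part of the traffic can tolerate asynchronous processing. The function name and the `batch_share` parameter are hypothetical, and the sketch assumes the batched share applies evenly to input and output tokens.

```python
BATCH_DISCOUNT = 0.50  # Batch API is billed at half the standard rate


def blended_cost(input_mtok: float, output_mtok: float,
                 input_rate: float, output_rate: float,
                 batch_share: float) -> float:
    """Monthly USD cost when `batch_share` (0.0 to 1.0) of volume runs via Batch API."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0.0 and 1.0")
    standard = input_mtok * input_rate + output_mtok * output_rate
    # Only the batched fraction of the bill receives the discount.
    return standard * (1 - batch_share * BATCH_DISCOUNT)


if __name__ == "__main__":
    # Hypothetical Sonnet 4.6 workload: 100M input and 20M output tokens per month.
    for share in (0.0, 0.5, 1.0):
        print(f"batch share {share:.0%}: ${blended_cost(100, 20, 3.00, 15.00, share):.2f}")
```

Moving half of that hypothetical workload to Batch API lowers the bill from $600 to $450; moving all of it brings it to $300.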

Prompt Caching: How It Works and What It Costs

For workloads that repeatedly use the same system prompt or lengthy reference materials, Prompt Caching can dramatically reduce input costs. Writing to the cache incurs a higher rate, but repeated cache hits bring total costs down.

| Operation | Price Multiplier | Cache Duration |
|---|---|---|
| Cache write (5-minute retention) | Base input × 1.25 | 5 minutes |
| Cache write (1-hour retention) | Base input × 2.0 | 1 hour |
| Cache read (hit) | Base input × 0.1 | Same as write |

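As a concrete example, for Sonnet 4.6 (input $3.00 / MTok), a 5-minute cache write costs $3.75 / MTok, while a cache hit read costs $0.30 / MTok. The write adds $0.75 / MTok over the base rate and each hit saves $2.70 / MTok, so a single hit within the retention window already recoups the write premium. With 1-hour retention the write costs $6.00 / MTok, an extra $3.00 over the base rate, which is recovered after two hits.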

Prompt Caching is especially effective for chat applications that send a long system prompt every turn, or RAG pipelines that include the same document in the prompt dozens of times.
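The break-even point follows directly from the multipliers in the table. The sketch below compares one cache write plus a number of cache reads against sending the same prefix uncached each time. The function names are hypothetical, and the sketch assumes every hit lands inside the retention window.

```python
import math

READ_MULTIPLIER = 0.1                        # cache hit: 10% of base input rate
WRITE_MULTIPLIER = {"5m": 1.25, "1h": 2.0}   # cache write, by retention period


def cached_cost(base_rate: float, prefix_mtok: float, hits: int, retention: str) -> float:
    """Cost of one cache write followed by `hits` cache reads of the same prefix."""
    write = base_rate * WRITE_MULTIPLIER[retention] * prefix_mtok
    reads = base_rate * READ_MULTIPLIER * prefix_mtok * hits
    return write + reads


def uncached_cost(base_rate: float, prefix_mtok: float, hits: int) -> float:
    """Cost of sending the same prefix uncached for the same (hits + 1) requests."""
    return base_rate * prefix_mtok * (hits + 1)


def break_even_hits(retention: str) -> int:
    """Smallest number of hits at which caching costs no more than not caching."""
    extra_write = WRITE_MULTIPLIER[retention] - 1.0
    saving_per_hit = 1.0 - READ_MULTIPLIER
    return math.ceil(extra_write / saving_per_hit)


if __name__ == "__main__":
    print(break_even_hits("5m"))  # 1
    print(break_even_hits("1h"))  # 2
```

Because each hit saves 90% of the base input rate, the write premium is small by comparison; in practice the question is whether requests arrive often enough to hit the cache before it expires.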

Pricing Comparison with AWS Bedrock

Here is a summary of the key differences between Vertex AI and AWS Bedrock, the other major cloud platform offering Claude.

| Dimension | Vertex AI | AWS Bedrock |
|---|---|---|
| Base rates | Same as direct API | Same as direct API |
| Regional premium | +10% vs. global | Varies by region |
| Batch discount | 50% | 50% |
| Prompt Caching | Available | Available |
| Billing integration | GCP billing account | AWS billing account |
| Haiku 3.5 availability | Yes | Yes |
| Tokyo region | Not available (as of June 2026) | Available via ap-northeast-1 |
| IAM integration | Google Cloud IAM | AWS IAM |

Bedrock has an edge when it comes to Tokyo region support. On the other hand, projects where most infrastructure already runs on GCP can benefit from consolidating IAM, audit logs, and cost management by staying on Vertex AI. Factor in the cost of switching vendors before making your decision.

Managing Billing and Setting Budget Alerts on GCP

Vertex AI Claude usage costs appear alongside other GCP services in the Cloud Billing dashboard. Setting up budget alerts in advance is important to prevent cost overruns.

  1. Open Google Cloud Console → [Billing] → [Budgets & alerts]
  2. Click [Create budget] and specify "Vertex AI" as the service in the scope
  3. Enter a monthly spending limit and configure email notification recipients for 50%, 90%, and 100% thresholds
  4. Click [Save] to activate the alerts

Applying labels by project or team lets you see exactly which services or departments are driving Claude costs. Enabling billing data export to BigQuery also lets you analyze spending trends and per-model cost breakdowns freely with SQL.
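Once the export is enabled, a per-month, per-SKU breakdown could look roughly like the query built below. The fields `invoice.month`, `sku.description`, and `cost` come from the standard Cloud Billing export schema. The table name is a hypothetical placeholder, and filtering SKUs by the keyword "claude" is an assumption to verify against the SKU descriptions in your own export.

```python
def build_cost_query(table: str, sku_keyword: str = "claude") -> str:
    """Build a BigQuery Standard SQL query that sums cost per invoice month and SKU.

    `table` is the fully qualified billing export table (hypothetical placeholder).
    `sku_keyword` is an assumed filter; check how Claude SKUs are actually labeled.
    Values are interpolated for illustration only; prefer query parameters
    for any input that is not fully trusted.
    """
    return f"""
SELECT
  invoice.month AS invoice_month,
  sku.description AS sku,
  ROUND(SUM(cost), 2) AS total_cost
FROM `{table}`
WHERE LOWER(sku.description) LIKE '%{sku_keyword.lower()}%'
GROUP BY invoice_month, sku
ORDER BY invoice_month, total_cost DESC
""".strip()


if __name__ == "__main__":
    # Placeholder table name; replace with your own export table.
    print(build_cost_query("my-project.billing.gcp_billing_export_v1_EXAMPLE"))
```

The resulting string can be pasted into the BigQuery console or passed to a BigQuery client of your choice.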

If you're making a high volume of Claude calls, it's worth checking with Google Cloud sales to see whether Committed Use Discounts (CUDs) or Committed Use Contracts might apply.

4 Practical Rules for Cost Optimization

  1. Assign models deliberately by tier: Route relatively simple tasks like summarization, classification, and translation to Haiku 4.5 ($1/$5), and reserve Opus ($5/$25) for complex reasoning and long-form generation only. Using Sonnet ($3/$15) as the middle-ground default and selecting models per use case can substantially cut total costs.

  2. Aggressively offload to Batch API: Identify workloads that don't require real-time responses and run them as nightly or weekly batch jobs via Batch API. The 50% discount applies to both input and output tokens, which makes it the broadest cost-reduction lever for any given model.

  3. Use Prompt Caching where it fits: Enable Prompt Caching in applications that repeatedly send the same long system prompts or reference documents. Using 1-hour retention and designing for maximum cache hit rates across conversation turns is an effective approach.

  4. Use regional endpoints only when data residency is required: Stick with the global endpoint when there are no regulatory requirements, and avoid the 10% premium. The global endpoint offers equivalent SLA and reliability; the trade-off is that requests are not pinned to a specific region.
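The first rule can be sketched as a simple lookup from task type to model tier. The task names, tier labels, and workload figures below are hypothetical and only illustrate the routing idea; the rates are the base rates quoted in this article.

```python
# Hypothetical task-type labels mapped to model tiers, following rule 1.
TASK_TO_TIER = {
    "summarization": "haiku",
    "classification": "haiku",
    "translation": "haiku",
    "complex_reasoning": "opus",
    "long_form_generation": "opus",
}
DEFAULT_TIER = "sonnet"  # middle-ground default for everything else

# USD per 1M tokens (input, output) at global-endpoint base rates.
TIER_RATES = {"haiku": (1.00, 5.00), "sonnet": (3.00, 15.00), "opus": (5.00, 25.00)}


def pick_tier(task_type: str) -> str:
    """Return the model tier for a task type, falling back to the default."""
    return TASK_TO_TIER.get(task_type, DEFAULT_TIER)


def workload_cost(jobs) -> float:
    """Sum the USD cost of (task_type, input_mtok, output_mtok) jobs after routing."""
    total = 0.0
    for task_type, input_mtok, output_mtok in jobs:
        input_rate, output_rate = TIER_RATES[pick_tier(task_type)]
        total += input_mtok * input_rate + output_mtok * output_rate
    return total


if __name__ == "__main__":
    jobs = [("summarization", 10, 2), ("complex_reasoning", 1, 1)]
    print(workload_cost(jobs))
```

Routing the hypothetical summarization job to Haiku instead of Opus is what keeps this example workload at $50 rather than $130.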


For the latest pricing information, refer to the Anthropic Official Pricing Page and the Google Cloud Vertex AI Pricing Page. Prices are subject to change.

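Clauder Navi Editorial Team
@clauder_navi

Daily coverage of the latest developments and practical know-how around Anthropic's Claude and Claude Code, written for engineers in Japan. For our editorial policy, see the About This Media page.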