Vertex AI Claude Pricing | Opus/Sonnet Rates & Bedrock Comparison
"How much does it cost to use Claude on Vertex AI?" and "Can I consolidate billing through Google Cloud instead of contracting directly with Anthropic?" — these two questions sit at the heart of what people search for when they look up "vertex claude pricing."
This article draws on Anthropic's official pricing documentation and Google Cloud Vertex AI's official information to summarize per-model rates, the 10% premium for regional endpoints, and the pricing structure for Batch API and Prompt Caching.
Claude pricing on Vertex AI uses the same base rates as the Anthropic direct API. Claude Opus 4.8 is priced at $5 input / $25 output (per million tokens), Sonnet 4.6 at $3 input / $15 output, and Haiku 4.5 at $1 input / $5 output — the same structure applies on Vertex AI.
However, choosing a regional endpoint (regional or multi-region) adds a 10% premium over the global endpoint. The primary cost-reduction strategies are combining Batch API (50% discount) with Prompt Caching (which compresses input costs to as low as 10% of the standard rate on cache hits).
Billing is consolidated into your GCP billing account, so organizations that want to use Claude under their existing Google Cloud budget management — without separately loading credits on the Anthropic side — will find this a suitable option.
目次 (8)
- Overview of Vertex AI Claude Pricing — Same Rate Structure as the Anthropic Direct API
- Full Model Pricing Table (as of June 2026)
- Why Regional Endpoints Cost 10% More
- Up to 50% Savings with Batch API
- Prompt Caching: How It Works and What It Costs
- Pricing Comparison with AWS Bedrock
- Managing Billing and Setting Budget Alerts on GCP
- 4 Practical Rules for Cost Optimization
Overview of Vertex AI Claude Pricing — Same Rate Structure as the Anthropic Direct API
As a strategic partner of Google Cloud, Anthropic makes Claude models available through the Vertex AI Model Garden, with per-token rates set at the same level as the Anthropic direct API. The differences lie in the billing channel and the availability of certain options.
When using Claude via Vertex AI, all charges flow into your GCP billing account. There is no need to purchase credits through the Anthropic Console — you can use your existing Google Cloud billing management, budget alerts, and cost allocation tags as-is. Fine-grained access control via IAM, network perimeters via VPC Service Controls, and audit logging via Cloud Logging are all available natively in GCP, making it a practical choice for enterprise deployments with strict compliance requirements.
On the other hand, the Anthropic direct API tends to receive new models first, and Vertex AI releases may lag by weeks or even months. Keep this in mind for projects where development speed is the top priority.
Full Model Pricing Table (as of June 2026)
Below are the base rates for Claude models available on Vertex AI (per 1M tokens, USD). Source: Anthropic Official Pricing Page
| Model | Input | Output | Status |
|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $25.00 | Latest Opus |
| Claude Opus 4.7 | $5.00 | $25.00 | Previous-gen Opus |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Standard recommended model |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Previous-gen Sonnet |
| Claude Haiku 4.5 | $1.00 | $5.00 | Fast & low-cost |
| Claude Haiku 3.5 | $0.80 | $4.00 | Older generation — still available on Vertex AI |
Notably, Haiku 3.5 has been deprecated on the Anthropic direct API (as of 2026 — see Anthropic model deprecation information for details), but remains available on Vertex AI. If you need to keep using Haiku 3.5 in existing batch processing pipelines, routing through Vertex AI is a practical way to continue doing so.
Anthropic maintains a unified pricing policy within the same model family: the Opus family is consistently priced at $5/$25 from version 4.5 through 4.8. Upgrading from 4.7 to 4.8 incurs no additional cost, and there is no pricing difference between Opus generations.
Why Regional Endpoints Cost 10% More
Vertex AI Claude endpoints come in two types: global endpoints and regional endpoints (regional or multi-region). According to Anthropic's official documentation, regional endpoints carry a 10% premium over global pricing.
For example, if the global input rate for Sonnet 4.6 is $3.00, using a regional endpoint such as us-east5 or europe-west1 brings the input rate up to $3.30.
Common reasons to choose a regional endpoint include:
- Regulatory requirements that mandate data stay within a specific region (e.g., GDPR, healthcare privacy laws)
- A need to minimize latency by staying close to a particular region
- Internal policies that require traffic to be routed to a specific region
If cost optimization is your goal, use the global endpoint unless regulatory requirements dictate otherwise.
As of June 2026, Claude-compatible regions include us-east5, europe-west1, and asia-southeast1, but Claude is not yet available in the Tokyo region (asia-northeast1). If low latency within Japan is a priority, AWS Bedrock's ap-northeast-1 (Tokyo) or the Anthropic direct API offers a geographic advantage.
Up to 50% Savings with Batch API
For workloads that don't require immediate responses, Batch API lets you process requests at half the standard price. Typical use cases include bulk document conversion, automated scoring of evaluation sets, and background data processing.
| Model | Batch Input | Batch Output |
|---|---|---|
| Claude Opus 4.8 / 4.7 | $2.50 / MTok | $12.50 / MTok |
| Claude Sonnet 4.6 / 4.5 | $1.50 / MTok | $7.50 / MTok |
| Claude Haiku 4.5 | $0.50 / MTok | $2.50 / MTok |
Batch API processes requests asynchronously, with completion times ranging from minutes to hours. In return, you run the same models at half the cost of real-time calls. Workloads with relaxed SLAs — such as monthly aggregation jobs or nightly report generation — are strong candidates for offloading to Batch API for significant cost benefits.
Prompt Caching: How It Works and What It Costs
For workloads that repeatedly use the same system prompt or lengthy reference materials, Prompt Caching can dramatically reduce input costs. Writing to the cache incurs a higher rate, but repeated cache hits bring total costs down.
| Operation | Price Multiplier | Cache Duration |
|---|---|---|
| Cache write (5-minute retention) | Base input × 1.25 | 5 minutes |
| Cache write (1-hour retention) | Base input × 2.0 | 1 hour |
| Cache read (hit) | Base input × 0.1 | Same as write |
As a concrete example, for Sonnet 4.6 (input $3.00 / MTok), a 5-minute cache write costs $3.75 / MTok, while a cache hit read costs $0.30 / MTok. You need roughly 4–5 hits on the same cache to recoup the write cost.
Prompt Caching is especially effective for chat applications that send a long system prompt every turn, or RAG pipelines that embed the same document dozens of times.
Pricing Comparison with AWS Bedrock
Here is a summary of the key differences between Vertex AI and AWS Bedrock, the other major cloud platform offering Claude.
| Dimension | Vertex AI | AWS Bedrock |
|---|---|---|
| Base rates | Same as direct API | Same as direct API |
| Regional premium | +10% vs. global | Varies by region |
| Batch discount | 50% | 50% |
| Prompt Caching | Available | Available |
| Billing integration | GCP billing account | AWS billing account |
| Haiku 3.5 availability | Yes | Yes |
| Tokyo region | Not available (as of June 2026) | Available via ap-northeast-1 |
| IAM integration | Google Cloud IAM | AWS IAM |
Bedrock has an edge when it comes to Tokyo region support. On the other hand, projects where most infrastructure already runs on GCP can benefit from consolidating IAM, audit logs, and cost management by staying on Vertex AI. Factor in the cost of switching vendors before making your decision.
Managing Billing and Setting Budget Alerts on GCP
Vertex AI Claude usage costs appear alongside other GCP services in the Cloud Billing dashboard. Setting up budget alerts in advance is important to prevent cost overruns.
- Open Google Cloud Console → [Billing] → [Budgets & alerts]
- Click [Create budget] and specify "Vertex AI" as the service in the scope
- Enter a monthly spending limit and configure email notification recipients for 50%, 90%, and 100% thresholds
- Click [Save] to activate the alerts
Applying labels by project or team lets you see exactly which services or departments are driving Claude costs. Enabling billing data export to BigQuery also lets you analyze spending trends and per-model cost breakdowns freely with SQL.
If you're making a high volume of Claude calls, it's worth checking with Google Cloud sales to see whether Committed Use Discounts (CUDs) or Committed Use Contracts might apply.
4 Practical Rules for Cost Optimization
-
Assign models deliberately by tier — Route relatively simple tasks like summarization, classification, and translation to Haiku 4.5 ($1/$5), and reserve Opus ($5/$25) for complex reasoning and long-form generation only. Using Sonnet ($3/$15) as the middle-ground default and selecting models per use case can substantially cut total costs.
-
Aggressively offload to Batch API — Identify workloads that don't require real-time responses and run them as nightly or weekly batch jobs via Batch API. The 50% discount is the single most effective cost-reduction lever for any given model.
-
Use Prompt Caching where it fits — Enable Prompt Caching in applications that repeatedly send the same long system prompts or reference documents. Using 1-hour retention and designing for maximum cache hit rates across conversation turns is an effective approach.
-
Use regional endpoints only when data residency is required — Stick with the global endpoint when there are no regulatory requirements, and avoid the 10% premium. The global endpoint offers equivalent SLA and reliability — only the cost differs.
For the latest pricing information, refer to the Anthropic Official Pricing Page and the Google Cloud Vertex AI Pricing Page. Prices are subject to change.