Claude Fast Mode | How to Use 2.5x Speed, Pricing & Subscription Details


"Claude's output is too slow." "I wish it would respond faster." Anthropic's official answer to these frustrations is Fast Mode. It runs the same model in a speed-optimized configuration and raises output speed by up to 2.5x. The feature launched as a research preview in February 2026. This article covers supported models, how to use it, pricing, and how it relates to subscriptions, based solely on primary Anthropic sources.

Quick Summary — "Making Claude Faster" = Fast Mode for 2.5x Output

Here are the key points at a glance.

| Item | Details |
| --- | --- |
| What speeds up | Output tokens per second (OTPS), up to 2.5x |
| Supported models | Claude Opus 4.8 / 4.7 / 4.6 only |
| Quality | Same model weights and behavior. No degradation in intelligence or capability |
| Enable (Claude Code) | /fast command |
| Enable (API) | speed: "fast" parameter |
| Pricing (Opus 4.8) | $10 input / $50 output per 1M tokens |
| Subscriptions | Billed directly to usage credits, even on Pro/Max plans |
| Status | Research preview (specs and pricing subject to change) |

Two key points to remember: Fast Mode raises output speed (tokens per second), not the time until the first token, and it is not included in subscription plans but billed separately.

What Is Fast Mode — Running the Same Model at 2.5x Speed

Fast Mode is not a switch to a different, lighter model. It runs Claude Opus in an inference configuration that prioritizes speed over cost efficiency. The model weights and behavior are identical, so there is no change in intelligence or functionality — only the response speed increases. This is explicitly stated in the official documentation (API Docs).

The metric that improves is OTPS (output tokens per second), up to 2.5x the standard speed. However, TTFT (time to first token) is not affected by Fast Mode. The perceived time to finish a long response or code generation shrinks, but the initial response latency stays the same — keeping this in mind will help set the right expectations.
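To make the distinction concrete, total response time can be modeled roughly as TTFT plus output tokens divided by OTPS. The sketch below uses that simplified model. The 1.5-second TTFT and the 40 tokens/s baseline are illustrative assumptions, not measured or documented figures, and 2.5x is the documented upper bound rather than a guaranteed speedup.

```python
def estimated_response_seconds(output_tokens: int, ttft_s: float, otps: float) -> float:
    """Simplified model: time to first token plus time to stream the output."""
    return ttft_s + output_tokens / otps


# Illustrative assumptions only (not official figures).
TTFT_S = 1.5                      # unchanged by Fast Mode
STANDARD_OTPS = 40.0              # hypothetical baseline output speed
FAST_OTPS = STANDARD_OTPS * 2.5   # best case under Fast Mode

for tokens in (100, 4000):
    standard = estimated_response_seconds(tokens, TTFT_S, STANDARD_OTPS)
    fast = estimated_response_seconds(tokens, TTFT_S, FAST_OTPS)
    print(f"{tokens} tokens: standard {standard:.1f}s, fast {fast:.1f}s "
          f"({standard / fast:.2f}x overall)")
```

Under these assumptions, a 100-token reply improves by only about 1.6x overall because the fixed TTFT dominates, while a 4,000-token reply approaches the full 2.5x.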

Supported Models: Opus 4.8, 4.7, and 4.6 Only

Fast Mode is available only for these three Opus models.

  • Claude Opus 4.8 (claude-opus-4-8)
  • Claude Opus 4.7 (claude-opus-4-7)
  • Claude Opus 4.6 (claude-opus-4-6)

It is not available for Sonnet, Haiku, or other models — sending speed: "fast" to an unsupported model via the API will return an error. Note that Fast Mode for Opus 4.6 is deprecated as of the Opus 4.8 release and is scheduled for removal approximately 30 days later. After removal, sending speed: "fast" with claude-opus-4-6 will not produce an error but will fall back to standard speed and standard pricing. To maintain fast performance, migrate to Opus 4.8 or 4.7.
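As a sketch of how a client might guard against these cases before sending a request, the helper below classifies a model ID using only the three IDs listed above. The function name and status labels are hypothetical, and the deprecation set simply mirrors this article's description rather than any API lookup.

```python
# Model IDs taken from the list above; revise as the research preview evolves.
FAST_MODE_MODELS = {"claude-opus-4-8", "claude-opus-4-7", "claude-opus-4-6"}
# Fast Mode on Opus 4.6 is deprecated and scheduled for removal.
DEPRECATED_FAST_MODE_MODELS = {"claude-opus-4-6"}


def fast_mode_status(model: str) -> str:
    """Return "supported", "deprecated", or "unsupported" (hypothetical labels)."""
    if model not in FAST_MODE_MODELS:
        return "unsupported"  # the API would reject speed: "fast"
    if model in DEPRECATED_FAST_MODE_MODELS:
        return "deprecated"   # works for now, later falls back to standard speed
    return "supported"


for model_id in ("claude-opus-4-8", "claude-opus-4-6", "some-other-model"):
    print(model_id, "->", fast_mode_status(model_id))
```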

Using Fast Mode in Claude Code — Toggle with /fast

In Claude Code, switching is a single command. The steps are as follows (Claude Code Docs).

  1. Type /fast in the terminal and press Tab to toggle on/off (or set "fastMode": true in the config file).
  2. If you were using a different model, it automatically switches to Opus, and a "Fast mode ON" confirmation appears.
  3. While active, a small icon is displayed next to the prompt.
  4. Run /fast again to disable it (the model stays on Opus; use /model to switch back to your previous model).

Claude Code v2.1.36 or later is required, and the VS Code extension is not supported. The default model is Opus 4.8 from v2.1.154 onward; versions 2.1.142 through 2.1.153 default to Opus 4.7. For cost efficiency, enable Fast Mode at the start of a session rather than mid-conversation: switching speeds causes a prompt cache miss, so the existing conversation context is processed again as uncached input at Fast Mode rates.

Using Fast Mode via the API — The speed: "fast" Parameter

To use it via the API, include the beta header anthropic-beta: fast-mode-2026-02-01 and add speed: "fast" to your request.

curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "anthropic-beta: fast-mode-2026-02-01" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-opus-4-8",
    "max_tokens": 4096,
    "speed": "fast",
    "messages": [{"role": "user", "content": "Refactor this module"}]
  }'

You can verify which speed was actually used by checking the usage.speed field in the response ("fast" or "standard"). Note that Fast Mode cannot be used with the Batch API or Priority Tier, and it is not offered on Claude Platform on AWS or on third-party platforms such as Vertex AI, Amazon Bedrock, or Microsoft Foundry. It is limited to the Claude API research preview.
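The same request can be sketched in Python using only the standard library. The header names, beta value, and body fields are copied from the curl example above. The helper names (build_fast_request, send_request, speed_used) are hypothetical, and the response shape is assumed to expose usage.speed as described in this section.

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_fast_request(prompt: str, model: str = "claude-opus-4-8",
                       max_tokens: int = 4096) -> urllib.request.Request:
    """Build (but do not send) a Messages API request with Fast Mode enabled."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "speed": "fast",
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "fast-mode-2026-02-01",
        "content-type": "application/json",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(body).encode("utf-8"),
        headers=headers, method="POST",
    )


def send_request(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def speed_used(response: dict) -> str:
    """Return usage.speed from a parsed response, or "unknown" if absent."""
    return response.get("usage", {}).get("speed", "unknown")
```

With a valid key in ANTHROPIC_API_KEY, calling speed_used(send_request(build_fast_request("Refactor this module"))) would report whether the request actually ran at "fast" or "standard" speed.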

Pricing — Opus 4.8 at $10 Input / $50 Output per 1M Tokens

Fast Mode is priced as a multiplier on standard rates, and the same rate applies across the entire 1M-token context window.

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| Opus 4.8 | $10 | $50 |
| Opus 4.7 / 4.6 | $30 | $150 |

At $10 / $50, the Fast Mode price for Opus 4.8 is one-third that of the previous generation (4.7/4.6, at $30 / $150), making speed more accessible. Prompt caching and data residency multipliers are applied on top of these Fast Mode prices.
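As a worked example of these rates, the sketch below estimates the cost of a single request from the prices in the table above. The function name and the token counts are illustrative assumptions, and the calculation deliberately ignores prompt caching and data residency multipliers.

```python
# USD per 1M tokens (input, output), copied from the pricing table above.
FAST_MODE_PRICES = {
    "claude-opus-4-8": (10.0, 50.0),
    "claude-opus-4-7": (30.0, 150.0),
    "claude-opus-4-6": (30.0, 150.0),
}


def fast_mode_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Base Fast Mode cost, before caching or data residency multipliers."""
    input_price, output_price = FAST_MODE_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request: 200,000 input tokens and 20,000 output tokens.
for model_id in ("claude-opus-4-8", "claude-opus-4-7"):
    cost = fast_mode_cost_usd(model_id, 200_000, 20_000)
    print(f"{model_id}: ${cost:.2f}")
```

For this hypothetical request the estimate comes to $3.00 on Opus 4.8 and $9.00 on Opus 4.7, which reflects the one-third ratio between the two price tiers.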

Usage Credits Required Even on Subscriptions

This is the most important caveat. Fast Mode in Claude Code is available to Pro, Max, Team, and Enterprise plan users as well as Console users, but it does not draw on the usage included in a subscription; it is billed directly to usage credits. Even if you have remaining subscription quota, Fast Mode tokens are billed at Fast Mode rates from the very first token, as the documentation explicitly states.

To use it, you need the following:

  1. Enable usage credits on your account (individual users can do this from billing settings in the Console).
  2. For Team / Enterprise accounts, Fast Mode is disabled by default — an administrator must explicitly enable it.

Organizations looking to control costs can add fastModePerSessionOptIn: true to their managed settings so that each session starts with Fast Mode off and users must explicitly enable it with /fast when needed. To disable it entirely, set the environment variable CLAUDE_CODE_DISABLE_FAST_MODE=1.

Fast Mode vs. Effort Level — Understanding the Difference

There is another way to speed up Claude: adjusting the effort level. The two approaches work differently.

| Setting | Effect |
| --- | --- |
| Fast Mode | Faster output at the same quality, but higher cost |
| Lower effort level | Speeds up by reducing thinking time. May reduce quality on complex tasks |

If you want speed without sacrificing quality, use Fast Mode. If you're dealing with simple tasks where reduced thinking is acceptable, use a lower effort level. You can also combine both to target "simple tasks at maximum speed."

Rate Limits and Fallback Behavior

Fast Mode has its own dedicated rate limits, separate from standard Opus, and Opus 4.8, 4.7, and 4.6 share the same pool. In Claude Code, when the limit is exceeded or usage credits run out, the following occurs:

  1. Fast Mode automatically falls back to standard speed.
  2. The icon turns gray to indicate a cooldown.
  3. Work continues at standard speed and standard pricing.
  4. Once the cooldown ends, Fast Mode is automatically re-enabled.

Via the API, exceeding the rate limit returns a 429 response with a retry-after header, and the SDK automatically retries up to 2 times by default. To switch to standard speed immediately, catch the 429 and resend the request without the speed parameter. Be aware, however, that switching speeds causes a prompt cache miss, because requests at different speeds do not share cached prefixes.
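The catch-and-resend pattern can be sketched as follows. To keep the example self-contained, the transport is passed in as a function and RateLimited is a hypothetical stand-in for whatever exception your HTTP client or SDK raises on a 429. It also assumes any automatic retries have already been exhausted or disabled.

```python
from typing import Callable


class RateLimited(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""


def send_with_fallback(send: Callable[[dict], dict], request: dict) -> dict:
    """Try the request as given; on a 429, resend it without the speed field."""
    try:
        return send(request)
    except RateLimited:
        # Dropping "speed" means standard speed and standard pricing.
        standard_request = {k: v for k, v in request.items() if k != "speed"}
        return send(standard_request)


def fake_send(request: dict) -> dict:
    """Test double: rejects Fast Mode requests, accepts standard ones."""
    if request.get("speed") == "fast":
        raise RateLimited()
    return {"usage": {"speed": "standard"}}


result = send_with_fallback(fake_send, {"model": "claude-opus-4-8", "speed": "fast"})
print(result["usage"]["speed"])  # standard
```

Because the fallback request runs at a different speed, it will not reuse any cached prefix from earlier Fast Mode requests, so this trade is best reserved for cases where waiting out the retry-after interval is not acceptable.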

Summary — Know When Speed Actually Matters

Fast Mode is a clear, official way to speed up Claude, but it is not a universal solution. It works best for rapid iteration on code changes, live debugging, and time-sensitive work where latency matters more than cost — interactive use cases where speed is worth the price. Conversely, for long-running automated tasks, batch/CI jobs, and cost-sensitive workloads where speed is less critical, standard mode is the better fit. Keep in mind that it is billed separately from your subscription, and enable it with /fast at the start of a session — that is the most efficient way to get the speed boost when you need it.

Sources: Fast Mode (Research Preview) - Claude API Docs / Speed up responses with fast mode - Claude Code Docs

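Clauder Navi Editorial Team
@clauder_navi

Daily coverage of the latest developments and practical know-how around Anthropic's Claude and Claude Code, written for engineers in Japan. For the editorial policy, see About This Media.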