What Are Claude Tokens? Limits, Japanese Character Conversion, and Ways to Reduce Usage

Tags: Claude, tokens, context window, usage limits, pricing, optimization

Clauder Navi Editorial Team / Last updated 2026-06-17

When using Claude, you may encounter notifications such as "You have reached the token limit" or "Your usage is approaching the limit." This article, based on official Anthropic information, explains what tokens are, why Japanese users in particular need to pay attention to them, and how to reduce consumption.

Conclusion

Tokens are the smallest unit Claude uses to process text. As a rough guide, about 4 English characters ≈ 1 token, and 1 Japanese character ≈ 1–2 tokens (about 1.5 on average). Because Japanese consumes more tokens than English for the same content, Japanese users feel the limits sooner, both in how long a conversation can run and in cost. Anthropic does not publish plan-specific token limits, but usage caps expand in the order Free → Pro → Max, with Max offering 5x or 20x the usage of Pro. Resetting conversations, keeping prompts concise, and matching the model to the task are all effective ways to reduce token consumption.

Table of Contents

What Are Claude Tokens?
Why Does Japanese Consume More Tokens?
The Relationship Between Context Window and Tokens
Context Window by Model as of 2026
Token Usage Limits by Plan
How the 5-Hour Reset Works
Practical Ways to Reduce Token Consumption
1. Start a New Conversation
2. Keep Prompts Concise
3. Quote Only the Relevant Parts of Large Files
4. Use Lighter Models for Appropriate Tasks
5. Use .claudeignore with Claude Code
6. The "Summarize and Reset" Technique for Long Conversations
Counting Tokens in Advance via API (For Developers)
Frequently Asked Questions
What happens when I reach the token limit?
Does attaching images or PDFs consume tokens?
What is prompt caching?
Can you save tokens by writing prompts in English instead of Japanese?

What Are Claude Tokens?

A token is the smallest unit of processing that Claude uses when reading and writing text. Just as humans read text in units of "characters" or "words," Claude recognizes and generates text in units called "tokens."

Technically, tokens are generated using an algorithm called BPE (Byte Pair Encoding). Rather than using words or characters directly, this method splits text into "subword units" by grouping frequently occurring patterns together. For example, in English, "running" might be split into "runn" + "ing" — two tokens.
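To make the idea of "grouping frequently occurring patterns" concrete, here is a minimal sketch of textbook BPE training on a toy word list. It is an illustration only: Claude's actual tokenizer, vocabulary, and merge rules are not reproduced here, and the function names (`learn_bpe`, `merge_pair`, `most_frequent_pair`) are hypothetical.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words; return the most common one."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(corpus, num_merges):
    """Start from single characters and apply `num_merges` greedy merges."""
    words = dict(Counter(tuple(word) for word in corpus))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words

# Toy corpus: "ing" is frequent, so it ends up as a single subword unit.
merges, vocab = learn_bpe(["running", "jumping", "singing"], num_merges=2)
print(merges)
print(list(vocab))
```

After two merges the frequent ending "ing" has become one symbol, while the rarer letters remain separate. Real tokenizers run many thousands of such merges over very large corpora.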

Claude models have generally shared a common tokenizer, but the tokenizer is not fixed across generations. As of 2026, the new tokenizer introduced with Claude Opus 4.7 has been adopted in the latest models (reference: Anthropic official model overview).

Why Does Japanese Consume More Tokens?

The tokenization efficiency differs significantly between English and Japanese.

Language	Rough guide
English	~4 characters ≈ 1 token
Japanese	1 character ≈ 1–2 tokens (average ~1.5 tokens)

English consists of combinations of 26 alphabet characters, making it easy to group frequent patterns and achieve high efficiency. Japanese, on the other hand, mixes hiragana, katakana, kanji, and symbols, and the tokenizer often splits a single character into multiple tokens.

Concrete examples:

Hello → approximately 1–2 tokens
こんにちは → approximately 7–8 tokens (5 characters)

This difference means that writing the same content in Japanese may consume 1.5 to 2 times more tokens than in English. If the usage limit for a monthly subscription plan is managed based on token count, Japanese users will reach the limit sooner.

Note that the new tokenizer adopted from Claude Opus 4.7 onward generates approximately 30% more tokens for the same text compared to older models. API users need to account for this change when migrating to newer models (reference: official token counting documentation).
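The rules of thumb in this section can be turned into a quick back-of-the-envelope estimator. The sketch below assumes the article's ratios (about 4 characters per token for English, about 1.5 tokens per character for Japanese) and a simplified Unicode-range check for Japanese; the names `estimate_tokens` and `is_japanese` are hypothetical, and the result is only a rough guess, not what the real tokenizer returns.

```python
import math

EN_CHARS_PER_TOKEN = 4.0   # assumption: the article's rough guide for English
JA_TOKENS_PER_CHAR = 1.5   # assumption: midpoint of the article's 1–2 range

def is_japanese(ch):
    """Simplified check: hiragana, katakana, or a common CJK ideograph."""
    code = ord(ch)
    return 0x3040 <= code <= 0x30FF or 0x4E00 <= code <= 0x9FFF

def estimate_tokens(text):
    """Rough token estimate from character counts; not a real tokenizer."""
    ja_chars = sum(1 for ch in text if is_japanese(ch))
    other_chars = len(text) - ja_chars
    return math.ceil(ja_chars * JA_TOKENS_PER_CHAR + other_chars / EN_CHARS_PER_TOKEN)

print(estimate_tokens("Hello"))        # 2
print(estimate_tokens("こんにちは"))    # 8
```

Both results fall inside the ranges quoted in the concrete examples. For anything that affects billing, an exact count from the API is the safer choice.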

The Relationship Between Context Window and Tokens

The "context window" refers to the total number of tokens Claude can reference in a single conversation. All of the following elements within a conversation consume the context window:

System prompt (instruction text)
Past conversation history (user messages + Claude's responses)
The current input message
Claude's output

The longer the conversation continues, the more tokens accumulate, and the context window fills up. When the limit is reached, Claude either "forgets" earlier parts of the conversation or becomes unable to respond further.
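The accounting described above can be sketched as a small bookkeeping class. This is a simplified model under stated assumptions: token counts are supplied by the caller, and `ContextBudget` and its methods are hypothetical names, not part of any Anthropic SDK.

```python
from dataclasses import dataclass, field

@dataclass
class ContextBudget:
    window: int                                   # total context window, in tokens
    system_tokens: int = 0                        # tokens used by the system prompt
    history: list = field(default_factory=list)   # token count of each past message

    def used(self):
        """Tokens already occupied by the system prompt and past messages."""
        return self.system_tokens + sum(self.history)

    def remaining(self):
        """Tokens still available for the next input and its output."""
        return self.window - self.used()

    def add_turn(self, user_tokens, reply_tokens):
        """Record one completed exchange (user message plus Claude's reply)."""
        self.history.extend([user_tokens, reply_tokens])

    def fits(self, next_input, reserved_output):
        """True if the next message plus the space reserved for output still fits."""
        return next_input + reserved_output <= self.remaining()

budget = ContextBudget(window=200_000, system_tokens=500)
budget.add_turn(user_tokens=1_000, reply_tokens=2_000)
print(budget.used(), budget.remaining())   # 3500 196500
```

Every turn adds to `history`, so `remaining()` only shrinks until the conversation is reset.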

Context Window by Model as of 2026

Model	Context Window	Max Output
Claude Opus 4.8	1 million tokens	128,000 tokens
Claude Sonnet 4.6	1 million tokens	64,000 tokens
Claude Haiku 4.5	200,000 tokens	64,000 tokens
Claude Fable 5	1 million tokens	128,000 tokens

One million tokens is equivalent to approximately 750,000 words (in English), or roughly 500,000–700,000 Japanese characters. While this is large enough to handle several books' worth of text at once, it is gradually consumed as long conversations accumulate (reference: Anthropic model overview).

Token Usage Limits by Plan

Claude.ai offers multiple plans, and usage limits vary by plan. While Anthropic does not officially disclose specific token counts, the relationships between plans are clearly defined.

Plan	Monthly price (annual billing)	Usage estimate
Free	Free	Basic usage (with limits)
Pro	$17–$20	Significantly more than Free
Max	$100+	5x or 20x that of Pro
Team	$20–$100 / seat	For teams (Pro-equivalent or higher)

The Max plan comes in two variants — "5x" and "20x" — and you choose based on your needs (reference: claude.com/pricing).

How the 5-Hour Reset Works

Claude has a mechanism known as the "5-hour session limit," where usage is temporarily restricted if it exceeds the cap within a certain time period. The restriction resets automatically after 5 hours have passed. Users who use Claude Code intensively have reported hitting the limit within 5 hours even on the Max plan.

Practical Ways to Reduce Token Consumption

By reducing token consumption, you can work longer and accomplish more within the same plan.

1. Start a New Conversation

The most effective method is resetting the conversation. Because each new reply is generated with the entire past exchange in context, every additional turn in a long conversation costs more tokens than the one before. When moving to a new topic, use "Start new conversation" to discard unnecessary history.

2. Keep Prompts Concise

Long preambles and repeated explanations are a direct waste of tokens. Simply developing the habit of conveying your request in a single, clear sentence can reduce token consumption by dozens to hundreds of tokens.

3. Quote Only the Relevant Parts of Large Files

Instead of pasting entire code files or documents, extract only the sections relevant to your question. It's more efficient to say "I have a question about lines 100–150 of this file" and then quote only those lines.
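For local files, the excerpt can be produced with a few lines of code instead of copying by hand. The helper below is a minimal sketch; `quote_lines` is a hypothetical name, and prefixing each line with its number is just one reasonable way to keep the quoted lines easy to refer to.

```python
from itertools import islice

def quote_lines(path, start, end):
    """Return lines start..end (1-indexed, inclusive), each prefixed with its number."""
    with open(path, encoding="utf-8") as f:
        selected = islice(f, start - 1, end)
        return "".join(f"{n}: {line}" for n, line in enumerate(selected, start))
```

Calling `quote_lines("app.py", 100, 150)` (with `app.py` standing in for your own file) yields only those 51 lines, which can be pasted after the question instead of the whole file.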

4. Use Lighter Models for Appropriate Tasks

For simple questions or routine tasks, using Haiku 4.5 (lightweight, fast, low-cost) is effective, while Sonnet or Opus is better suited for complex reasoning or long-form tasks. You can select the model in Claude.ai's conversation settings.

5. Use .claudeignore with Claude Code

When using Claude Code (the coding assistant tool), you can significantly reduce the number of files loaded into the context by using a .claudeignore file to exclude unnecessary directories (such as node_modules or dist).

A technical blog from the GMO Group reports that effective token management strategies for Claude Code include "distributing work to Codex," "splitting into smaller tasks," and "adjusting the effort parameter," and that consumption can be managed even on the $100/month Max plan (reference: Claude Code Token Management).

6. The "Summarize and Reset" Technique for Long Conversations

When a conversation starts getting long, first ask Claude to "summarize the discussion so far in 200 characters," then start a new conversation and paste only that summary at the top (API users can place it in the system prompt). This carries the essential context forward while discarding most of the accumulated tokens.
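For developers, the same technique can be sketched as two small helpers that build request payloads. The field names `system` and `messages` follow the shape of the request shown in the API section of this article; the helper names and the exact wording of the summary instruction are assumptions, and nothing here calls the API.

```python
SUMMARY_REQUEST = "Summarize the discussion so far in 200 characters."

def summary_request(history):
    """Old conversation plus one final user turn asking for a summary."""
    return history + [{"role": "user", "content": SUMMARY_REQUEST}]

def restart_with_summary(summary, next_question):
    """Fresh conversation that carries only the summary forward."""
    return {
        "system": f"Summary of the previous conversation: {summary}",
        "messages": [{"role": "user", "content": next_question}],
    }

old_history = [
    {"role": "user", "content": "Help me design a cache layer."},
    {"role": "assistant", "content": "Sure. What are the access patterns?"},
]
print(len(summary_request(old_history)))   # 3
print(restart_with_summary("Designing a read-heavy cache layer.", "Which eviction policy fits?"))
```

The new payload contains one short summary and one question, no matter how long the old history was.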

Counting Tokens in Advance via API (For Developers)

Developers using the Claude API can check token counts before actually sending messages.

Anthropic provides a count_tokens endpoint for this purpose. Prepare the message you intend to send, query the endpoint, and it will return an estimated value of input_tokens.

curl https://api.anthropic.com/v1/messages/count_tokens \
    --header "x-api-key: $ANTHROPIC_API_KEY" \
    --header "content-type: application/json" \
    --header "anthropic-version: 2023-06-01" \
    --data '{
      "model": "claude-opus-4-8",
      "system": "You are a helpful assistant",
      "messages": [{"role": "user", "content": "こんにちは、Claudeです"}]
    }'

Example response:

{ "input_tokens": 18 }

This endpoint is free to use (subject to rate limits). By using it, you can estimate costs and the impact on the context window before actually sending a message (reference: Anthropic official token counting documentation).
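The same request can be issued from Python. The sketch below uses only the standard library and mirrors the URL, headers, and body of the curl command above; the function names are hypothetical, and the response is assumed to have the `input_tokens` field shown in the example response. Building the request is separated from sending it, so the payload can be inspected without network access.

```python
import json
import urllib.request

COUNT_TOKENS_URL = "https://api.anthropic.com/v1/messages/count_tokens"

def build_count_tokens_request(api_key, model, system, user_text):
    """Build (but do not send) a POST request mirroring the curl example."""
    body = {
        "model": model,
        "system": system,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        COUNT_TOKENS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "content-type": "application/json",
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )

def count_tokens(request):
    """Send the request and return input_tokens (needs network and a valid key)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["input_tokens"]
```

A typical call would read the key from the `ANTHROPIC_API_KEY` environment variable, pass it to `build_count_tokens_request`, and hand the result to `count_tokens`.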

In addition to text, you can also pre-count tokens for images, PDFs, and tool definitions. Since token consumption per image can range from hundreds to thousands of tokens depending on content and resolution, this is especially useful for applications that make heavy use of images.

Frequently Asked Questions

What happens when I reach the token limit?

In Claude.ai, when you reach the 5-hour usage limit, you are temporarily restricted and a notification such as "You have exceeded the limit" is displayed. The system automatically resets after 5 hours. The options are to upgrade your plan or wait and try again later. For details on error types and how to handle them, see the article "Claude Usage Limit Errors."

Does attaching images or PDFs consume tokens?

Yes, images, PDFs, and document files are all counted as input tokens. High-resolution images tend to consume more tokens than low-resolution ones. You can use the count_tokens endpoint to check the token count before attaching files.

What is prompt caching?

Prompt caching is a feature for API users: when the same prefix, such as a long system prompt, is sent repeatedly, the cached portion is billed at a reduced rate, which can cut input costs by up to 90%. Users of the Claude.ai apps do not need to think about it, but it is an important optimization for developers looking to reduce API costs.
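For developers, a request that opts a system prompt into caching might look like the sketch below. It assumes the Messages API convention of marking a content block with `cache_control` of type `ephemeral`; the helper name is hypothetical, and minimum cacheable lengths, cache lifetime, and pricing should be checked against the official documentation.

```python
def cached_system_request(model, system_text, user_text, max_tokens=1024):
    """Build a Messages API payload whose system prompt is marked as cacheable."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": system_text,
                # Assumption: this marker asks the API to cache the prefix up to here.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }

payload = cached_system_request(
    "claude-opus-4-8", "You are a helpful assistant", "こんにちは"
)
print(payload["system"][0]["cache_control"])   # {'type': 'ephemeral'}
```

Only the repeated prefix benefits, so caching pays off when a long, unchanging system prompt is followed by many short, varying user messages.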

Can you save tokens by writing prompts in English instead of Japanese?

In theory, English is more token-efficient. However, once you factor in the effort of translation and the cost of retries caused by misinterpretation, native Japanese speakers gain little by forcing themselves to write in English. A practical balance is to use English only for important projects or for frequently reused template instructions (system prompts).

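Note: Clauder Navi refers directly to official Anthropic information and strives for accuracy, but accepts no responsibility for damages arising from investment decisions, contracts, or usage outcomes based on this article. For important decisions, always check the primary sources from Anthropic and claude.com yourself.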

Clauder Navi Editorial Team (@clauder_navi)

Daily coverage of the latest developments and practical use of Anthropic's Claude and Claude Code, written for engineers in Japan.
