What is Claude Opus 4.1 | SWE-bench, Pricing, and Changes Explained

Clauder Navi 編集部 / 最終更新 2026-05-13

For developers wondering what sets Claude Opus 4.1 apart from Opus 4, this article breaks down the improvements, pricing, and how to choose the right API identifier. We cover what a 74.5% score on SWE-bench Verified means in practice, how multi-file refactoring and long-document search behavior has changed, and when it makes sense to pin Opus 4.1 in a production environment alongside the latest-generation Opus models in 2026.

Article Summary by AI Chatpowered by Claude

Article Summary via AI Imagepowered by ChatGPT

結論powered by Claude

Claude Opus 4.1 is a minor update to Opus released on August 5, 2025. It maintains the same pricing as Opus 4 ($15 input / $75 output per 1M tokens) while raising the bar for code editing and research quality. By specifying the API identifier claude-opus-4-1-20250805 directly, you can pin behavior in a reproducible way for production use.

The core improvements focus on two areas: multi-file refactoring and source tracing across long documents. SWE-bench Verified reaches 74.5% (approximately +2 points over Opus 4), and it also outperforms its predecessor on Terminal-Bench. General reasoning and instruction-following have also improved, with fewer premise breakdowns during long conversations — a difference that becomes noticeable in real use.

As of this article's publication (May 2026), Opus 4.7 (with 1M token context) was the latest version, and as of June 2026, the successor Claude Opus 4.8 is now the latest (4.7 being its predecessor). Even so, pinning the Opus 4.1 snapshot remains a valid choice when you need stable production behavior or cost optimization. The practical answer in 2026 is: go with the latest for cutting-edge performance, and stick with 4.1 for stability.

目次 (11)

What is Claude Opus 4.1 — A Practical Upgrade, Not a Placeholder
What Changed from Opus 4 — 3 Key Improvements
1. Improved Multi-File Refactoring Accuracy
2. Agentic Search and Detail Tracking
3. General Reasoning and Instruction-Following Improvements
Benchmark Performance — What 74.5% on SWE-bench Means
Positioning vs. Opus 4.7 — Why Choose Opus 4.1 in 2026
Pricing — Maintained at Opus 4 Levels
How to Use It — 4 Major Platforms
Real-World Applications — Where It Makes the Most Impact
Summary — A Minor Version Number That Punches Above Its Weight

What is Claude Opus 4.1 — A Practical Upgrade, Not a Placeholder

Claude Opus 4.1 was released on August 5, 2025, as the second model in Anthropic's flagship "Opus" 4.x series. While the version number moves only one decimal place, the implementation builds on Opus 4's architecture with additional post-training refinements — positioning it as a version that raises practical quality by a meaningful notch.

Anthropic highlights improvements in three areas: first, autonomous task execution (so-called agentic workflows); second, editing accuracy within real codebases; and third, the ability to trace evidence across long documents and derive conclusions — what might be called "detail tracking" and retrieval quality. For a minor version, the adjustments are precisely targeted at pain points that matter in real-world use.

The API model identifier is claude-opus-4-1-20250805. As the date suffix indicates, it is fixed as a snapshot according to Anthropic's model naming convention. In production, specifying this identifier directly allows you to pin behavior in a fully reproducible way.

What Changed from Opus 4 — 3 Key Improvements

1. Improved Multi-File Refactoring Accuracy

The most significant change is in code modification tasks spanning multiple files. For example, replacing an interface with a different signature and updating all usages, or rewriting type definitions across 30 files to match a new schema — in these scenarios, Opus 4.1 noticeably reduces missed dependencies and overlooked breaking changes. Anthropic describes this as an area of "particularly significant performance gains."

2. Agentic Search and Detail Tracking

Behavior is more stable for Retrieval-Augmented Generation tasks that require sourcing answers from long documents, as well as multi-step agentic research workflows. The ability to "return to the source" — noting which page of a document something appeared on, or what the source URL was — has improved, making it especially useful for research automation and legal or financial review work.

3. General Reasoning and Instruction-Following Improvements

Beyond benchmark numbers, the practical difference is that complex instructions are less likely to be missed, and premises are less likely to break down during long conversations. Since pricing and speed remain on par with Opus 4 while quality improves, API users can benefit simply by switching the identifier.

Benchmark Performance — What 74.5% on SWE-bench Means

The representative scores at the time of release are as follows.

Benchmark	Claude Opus 4.1	Claude Opus 4 (reference)
SWE-bench Verified	74.5%	~72%
Terminal-Bench	Exceeds prior model	Official score not published

SWE-bench Verified is a coding evaluation that presents issues extracted from real OSS repositories on GitHub — close to an industry standard. The Verified subset is narrowed to problems where tests pass strictly, and a score of 74.5% means the model generated a "patch that passes tests" for roughly three out of four issues presented. Opus 4's score on the same metric was in the low 72% range, representing an improvement of approximately 2 points.

This may not look like a dramatic jump, but SWE-bench problems become exponentially harder at the upper end of the difficulty scale. A 1–2 point gain in that range means "some of the hard problems that were previously missed are now solvable." In practice, this translates to something like: a class of issues that used to require reviewer feedback no longer appearing in the first draft.

Terminal-Bench measures performance on CLI operation tasks — specifically the ability to chain commands in a shell to accomplish a goal — and has attracted growing attention since agentic workflows became more common in 2025 and beyond. Anthropic officially describes Opus 4.1's Terminal-Bench result only qualitatively as "exceeding prior models" without publishing a specific number. Accordingly, this article does not fill in any unverified figures, and the table above reflects that with "Official score not published."

Positioning vs. Opus 4.7 — Why Choose Opus 4.1 in 2026

At the time of this article's publication (May 2026), the latest Opus model was Claude Opus 4.7 (with 1M token context support). Note that as of June 2026, the successor Claude Opus 4.8 is now the latest, with 4.7 being its predecessor. In any case, the latest generation surpasses 4.1 on both benchmarks and context length — yet there are realistic reasons to still choose Opus 4.1:

Pinning existing pipelines: When you want to keep a production system fixed at claude-opus-4-1-20250805 without incurring validation costs for a migration.
Reproducibility for evaluation experiments: When a paper or internal benchmark requires "comparison using the same model."
Pricing continuity: Since Opus 4.1 is offered at the same price as Opus 4, projects that want quality improvements without migration cost overhead can take advantage of this.

For new projects seeking maximum performance, choosing the latest version is straightforward. But for scenarios where you want a "light upgrade within an existing system," Opus 4.1 remains a rational choice.

Pricing — Maintained at Opus 4 Levels

Anthropic did not update its pricing table with the release of Opus 4.1, keeping the same unit price as Opus 4. This is an important factor for API users: switching the identifier alone yields a quality improvement with no increase in cost.

Users on Claude's paid plans (Pro / Max) can access Opus 4.1 at no additional charge, and it is also available through Claude Code, the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI. Organizations operating across multiple clouds can coordinate a simultaneous switch once the Opus 4.1 identifier becomes available in each environment.

How to Use It — 4 Major Platforms

The main ways to call Opus 4.1 are as follows.

Anthropic API: Use claude-opus-4-1-20250805 as the model identifier. Official SDK implementations are available for Python and TypeScript.
Claude Code: An interactive development tool in the terminal. Opus 4.1 can be used with full functionality as an editor assistant.
Amazon Bedrock: Fully managed usage on AWS. Suitable for organizations requiring VPC-contained operation or CloudTrail audit logging.
Google Cloud Vertex AI: Invocation from GCP environments. Ideal when integrating with BigQuery or Vertex Pipelines.

Regardless of which path you use, for agentic tasks that run for extended periods, it's worth designing operational parameters such as per-request timeouts, retry logic, and tool call parallelism in advance.

Real-World Applications — Where It Makes the Most Impact

Working backward from the areas where Opus 4.1 has improved, the tasks with the greatest return on investment are concentrated in the following:

Large-scale refactoring across an existing codebase, or framework migrations
Clause extraction and diff comparison across long contracts or specification documents
Autonomous research involving sequential calls to multiple external APIs (research automation)
Bug fixes involving simultaneous edits to multiple files, beyond what IDE assistance can handle

Conversely, for simple summarization or single-turn templated output, Sonnet/Haiku models are often more rational in terms of both cost and speed. Building a routing step into your pipeline that first determines "does this actually need Opus?" can meaningfully simplify month-to-month cost management.

Summary — A Minor Version Number That Punches Above Its Weight

Claude Opus 4.1 may look like a modest "4 to 4.1" update on the surface, but it delivers meaningful improvements precisely in the kinds of tasks you'd turn to Opus for: multi-file editing, long-document tracing, and agentic workflows. Since quality improves at the same price point, for organizations already running Opus 4 in production, this is essentially a no-risk upgrade.

For new projects that need the latest and greatest, the current-generation Opus (Opus 4.8 as of June 2026) is the right call — but for systems in stable operation, Opus 4.1 remains "more than strong enough for real-world use."

Sources:

Anthropic — Claude Opus 4.1 (Primary source for SWE-bench Verified 74.5% and other benchmarks, model identifier, and pricing)
Anthropic — Model Overview (identifiers and specs)

参考になったら ♡

この記事は役立ちましたか?

ご注意: Clauder Navi は Anthropic 公式情報を直接参照し正確な内容に努めておりますが、本記事の内容に基づく投資判断・契約・利用結果による損害について責任を負いかねます。重要な意思決定の際は、必ず Anthropic 公式・ claude.com の一次情報をご自身でご確認ください。

Clauder Navi 編集部

@clauder_navi

Anthropic の Claude / Claude Code を中心に、日本のエンジニア向けに最新動向と実務を毎日発信。運営方針はメディアについてをご覧ください。

プロフィール → 副社長コラム → レッスン一覧 →