What is Claude Opus 4.1 | SWE-bench, Pricing, and Changes Explained
For developers wondering what sets Claude Opus 4.1 apart from Opus 4, this article breaks down the improvements, pricing, and how to choose the right API identifier. We cover what a 74.5% score on SWE-bench Verified means in practice, how multi-file refactoring and long-document search behavior has changed, and when it makes sense to pin Opus 4.1 in a production environment alongside the latest-generation Opus models in 2026.
Claude Opus 4.1 is a minor update to Opus released on August 5, 2025. It maintains the same pricing as Opus 4 ($15 input / $75 output per 1M tokens) while raising the bar for code editing and research quality. By specifying the API identifier claude-opus-4-1-20250805 directly, you can pin behavior in a reproducible way for production use.
The core improvements focus on two areas: multi-file refactoring and source tracing across long documents. SWE-bench Verified reaches 74.5% (approximately +2 points over Opus 4), and it also outperforms its predecessor on Terminal-Bench. General reasoning and instruction-following have also improved, with fewer premise breakdowns during long conversations — a difference that becomes noticeable in real use.
As of this article's publication (May 2026), Opus 4.7 (with 1M token context) was the latest version, and as of June 2026, the successor Claude Opus 4.8 is now the latest (4.7 being its predecessor). Even so, pinning the Opus 4.1 snapshot remains a valid choice when you need stable production behavior or cost optimization. The practical answer in 2026 is: go with the latest for cutting-edge performance, and stick with 4.1 for stability.
目次 (11)
- What is Claude Opus 4.1 — A Practical Upgrade, Not a Placeholder
- What Changed from Opus 4 — 3 Key Improvements
- 1. Improved Multi-File Refactoring Accuracy
- 2. Agentic Search and Detail Tracking
- 3. General Reasoning and Instruction-Following Improvements
- Benchmark Performance — What 74.5% on SWE-bench Means
- Positioning vs. Opus 4.7 — Why Choose Opus 4.1 in 2026
- Pricing — Maintained at Opus 4 Levels
- How to Use It — 4 Major Platforms
- Real-World Applications — Where It Makes the Most Impact
- Summary — A Minor Version Number That Punches Above Its Weight
What is Claude Opus 4.1 — A Practical Upgrade, Not a Placeholder
Claude Opus 4.1 was released on August 5, 2025, as the second model in Anthropic's flagship "Opus" 4.x series. While the version number moves only one decimal place, the implementation builds on Opus 4's architecture with additional post-training refinements — positioning it as a version that raises practical quality by a meaningful notch.
Anthropic highlights improvements in three areas: first, autonomous task execution (so-called agentic workflows); second, editing accuracy within real codebases; and third, the ability to trace evidence across long documents and derive conclusions — what might be called "detail tracking" and retrieval quality. For a minor version, the adjustments are precisely targeted at pain points that matter in real-world use.
The API model identifier is claude-opus-4-1-20250805. As the date suffix indicates, it is fixed as a snapshot according to Anthropic's model naming convention. In production, specifying this identifier directly allows you to pin behavior in a fully reproducible way.
What Changed from Opus 4 — 3 Key Improvements
1. Improved Multi-File Refactoring Accuracy
The most significant change is in code modification tasks spanning multiple files. For example, replacing an interface with a different signature and updating all usages, or rewriting type definitions across 30 files to match a new schema — in these scenarios, Opus 4.1 noticeably reduces missed dependencies and overlooked breaking changes. Anthropic describes this as an area of "particularly significant performance gains."
2. Agentic Search and Detail Tracking
Behavior is more stable for Retrieval-Augmented Generation tasks that require sourcing answers from long documents, as well as multi-step agentic research workflows. The ability to "return to the source" — noting which page of a document something appeared on, or what the source URL was — has improved, making it especially useful for research automation and legal or financial review work.
3. General Reasoning and Instruction-Following Improvements
Beyond benchmark numbers, the practical difference is that complex instructions are less likely to be missed, and premises are less likely to break down during long conversations. Since pricing and speed remain on par with Opus 4 while quality improves, API users can benefit simply by switching the identifier.
Benchmark Performance — What 74.5% on SWE-bench Means
The representative scores at the time of release are as follows.
| Benchmark | Claude Opus 4.1 | Claude Opus 4 (reference) |
|---|---|---|
| SWE-bench Verified | 74.5% | ~72% |
| Terminal-Bench | Exceeds prior model | Official score not published |
SWE-bench Verified is a coding evaluation that presents issues extracted from real OSS repositories on GitHub — close to an industry standard. The Verified subset is narrowed to problems where tests pass strictly, and a score of 74.5% means the model generated a "patch that passes tests" for roughly three out of four issues presented. Opus 4's score on the same metric was in the low 72% range, representing an improvement of approximately 2 points.
This may not look like a dramatic jump, but SWE-bench problems become exponentially harder at the upper end of the difficulty scale. A 1–2 point gain in that range means "some of the hard problems that were previously missed are now solvable." In practice, this translates to something like: a class of issues that used to require reviewer feedback no longer appearing in the first draft.
Terminal-Bench measures performance on CLI operation tasks — specifically the ability to chain commands in a shell to accomplish a goal — and has attracted growing attention since agentic workflows became more common in 2025 and beyond. Anthropic officially describes Opus 4.1's Terminal-Bench result only qualitatively as "exceeding prior models" without publishing a specific number. Accordingly, this article does not fill in any unverified figures, and the table above reflects that with "Official score not published."
Positioning vs. Opus 4.7 — Why Choose Opus 4.1 in 2026
At the time of this article's publication (May 2026), the latest Opus model was Claude Opus 4.7 (with 1M token context support). Note that as of June 2026, the successor Claude Opus 4.8 is now the latest, with 4.7 being its predecessor. In any case, the latest generation surpasses 4.1 on both benchmarks and context length — yet there are realistic reasons to still choose Opus 4.1:
- Pinning existing pipelines: When you want to keep a production system fixed at
claude-opus-4-1-20250805without incurring validation costs for a migration. - Reproducibility for evaluation experiments: When a paper or internal benchmark requires "comparison using the same model."
- Pricing continuity: Since Opus 4.1 is offered at the same price as Opus 4, projects that want quality improvements without migration cost overhead can take advantage of this.
For new projects seeking maximum performance, choosing the latest version is straightforward. But for scenarios where you want a "light upgrade within an existing system," Opus 4.1 remains a rational choice.
Pricing — Maintained at Opus 4 Levels
Anthropic did not update its pricing table with the release of Opus 4.1, keeping the same unit price as Opus 4. This is an important factor for API users: switching the identifier alone yields a quality improvement with no increase in cost.
Users on Claude's paid plans (Pro / Max) can access Opus 4.1 at no additional charge, and it is also available through Claude Code, the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI. Organizations operating across multiple clouds can coordinate a simultaneous switch once the Opus 4.1 identifier becomes available in each environment.
How to Use It — 4 Major Platforms
The main ways to call Opus 4.1 are as follows.
- Anthropic API: Use
claude-opus-4-1-20250805as the model identifier. Official SDK implementations are available for Python and TypeScript. - Claude Code: An interactive development tool in the terminal. Opus 4.1 can be used with full functionality as an editor assistant.
- Amazon Bedrock: Fully managed usage on AWS. Suitable for organizations requiring VPC-contained operation or CloudTrail audit logging.
- Google Cloud Vertex AI: Invocation from GCP environments. Ideal when integrating with BigQuery or Vertex Pipelines.
Regardless of which path you use, for agentic tasks that run for extended periods, it's worth designing operational parameters such as per-request timeouts, retry logic, and tool call parallelism in advance.
Real-World Applications — Where It Makes the Most Impact
Working backward from the areas where Opus 4.1 has improved, the tasks with the greatest return on investment are concentrated in the following:
- Large-scale refactoring across an existing codebase, or framework migrations
- Clause extraction and diff comparison across long contracts or specification documents
- Autonomous research involving sequential calls to multiple external APIs (research automation)
- Bug fixes involving simultaneous edits to multiple files, beyond what IDE assistance can handle
Conversely, for simple summarization or single-turn templated output, Sonnet/Haiku models are often more rational in terms of both cost and speed. Building a routing step into your pipeline that first determines "does this actually need Opus?" can meaningfully simplify month-to-month cost management.
Summary — A Minor Version Number That Punches Above Its Weight
Claude Opus 4.1 may look like a modest "4 to 4.1" update on the surface, but it delivers meaningful improvements precisely in the kinds of tasks you'd turn to Opus for: multi-file editing, long-document tracing, and agentic workflows. Since quality improves at the same price point, for organizations already running Opus 4 in production, this is essentially a no-risk upgrade.
For new projects that need the latest and greatest, the current-generation Opus (Opus 4.8 as of June 2026) is the right call — but for systems in stable operation, Opus 4.1 remains "more than strong enough for real-world use."
Sources:
- Anthropic — Claude Opus 4.1 (Primary source for SWE-bench Verified 74.5% and other benchmarks, model identifier, and pricing)
- Anthropic — Model Overview (identifiers and specs)