Articles tagged #swe-bench
3 articles
-
Claude
What Is Claude Opus 4.5 | 76% Token Reduction Compared to Sonnet 4.5 Explained
A breakdown of Claude Opus 4.5's capabilities and pricing. At $5 input / $25 output per 1M tokens (Sonnet-level pricing), it costs roughly 67% less than Opus 4.1, and at medium Effort it matches Sonnet 4.5's accuracy with 76% fewer tokens. Also covers how to use the Effort parameter and how to take advantage of prompt caching and batch inference.
-
Claude
Claude Opus 4.1 / 4.5 / 4.6 / 4.7 | Performance & Pricing Comparison
A four-axis comparison of Claude Opus 4.1 / 4.5 / 4.6 / 4.7 across SWE-bench score, pricing, speed, and migration risk. For autonomous agents, Opus 4.7 is the top choice with 87.6% on SWE-bench; for cost savings, Opus 4.5 at $5 input is the best fit; to avoid breaking changes, staying on Opus 4.6 is the practical option.
-
Claude
What Is Claude Opus 4.1 | SWE-bench, Pricing, and Changes Explained
A breakdown of Claude Opus 4.1's changes, pricing, and API identifier usage. At the same price as Opus 4 ($15 input / $75 output per 1M tokens), it enhances multi-file refactoring and long-document search quality, achieving 74.5% on SWE-bench Verified. Includes guidance on when to pin a snapshot for stable production behavior.