Implement with Claude, Review with Codex | A Division-of-Labor Approach to Higher Code Quality
More and more developers are realizing: "Having the same Claude that implemented the code turn around and review it doesn't make much sense." It's structurally difficult for the same model to objectively evaluate its own output — it tends to treat the design patterns it chose during implementation as inherently correct. That's why the "implement with Claude Code, review with Codex" two-AI division-of-labor workflow is gaining traction.
This article draws on official documentation and Japanese tech blogs to explain the background of this workflow, how to set it up, the concrete steps involved, and examples of automation.
Table of Contents
- Why "Implement with Claude, Review with Codex"?
- Overview of the Division-of-Labor Workflow
- Setup — Preparing Claude Code and Codex CLI
- Installing Claude Code
- Installing Codex CLI
- Launching in Read-Only Mode for Review-Only Use
- The Basic Cross-Review Workflow
- Automating the Back-and-Forth with a codex-review Skill
- Seamless Integration with tmux-sender
- Real-World Cases — Critical Oversights Caught by Codex
- Cost and Practical Operation Tips
- Summary
Why "Implement with Claude, Review with Codex"?
The biggest challenge when AI reviews code is the "self-evaluation blind spot." When the same model that generated the code tries to self-check it, it tends to assume that the design decisions it made are correct. Fundamental architectural issues or bugs that could have been avoided with a different approach are structurally hard to catch through self-review.
The solution is to combine AI systems from different companies with different architectures. Anthropic's Claude Code and OpenAI's Codex differ in training data and reasoning tendencies, making it easier to create a complementary relationship where one catches what the other misses.
A technical blog from Classmethod reports practical cases such as: "Codex caught an authentication token log-exposure bug that Claude Code had missed," and "Codex detected a regression in advance that passed tests but broke in production." (Source: dev.classmethod.jp)
Overview of the Division-of-Labor Workflow
This workflow consists of three phases.
- Claude Code implements — Generates and edits code based on the spec, handling file operations, running tests, and committing
- Codex cross-reviews — A separate AI system validates Claude's output and provides feedback from an external perspective that questions the assumptions behind the design
- A human makes the final call and merges — Integrates the review results from both AIs and makes the final decision. The AIs provide material for judgment; the human makes the decision
The key is to treat Codex as a "colleague," not an "authority." Rather than blindly accepting Codex's review, prioritize issues flagged by both Claude and Codex, and let a human read the context and decide on issues flagged by only one of them. (Source: dev.classmethod.jp)
Setup — Preparing Claude Code and Codex CLI
Installing Claude Code
On macOS / Linux / WSL, installation is a single command.
curl -fsSL https://claude.ai/install.sh | bash
Windows also supports WinGet.
winget install Anthropic.ClaudeCode
A Claude Pro subscription or higher is required. For the latest plans, see Claude's official pricing page. (Source: code.claude.com)
Installing Codex CLI
npm install -g @openai/codex
codex login
If you have a ChatGPT Plus, Pro, Business, Edu, or Enterprise subscription, Codex is included at no additional charge. After installation, sign in with your ChatGPT account or authenticate with an API key. (Source: developers.openai.com)
Launching in Read-Only Mode for Review-Only Use
When using Codex exclusively for review, it is recommended to launch it with the --sandbox read-only option to deny file write permissions.
codex exec --sandbox read-only "Point out the problems in this code"
This option allows you to receive reviews while completely eliminating the risk of Codex accidentally modifying your code.
The Basic Cross-Review Workflow
For manual operation, the basic steps are as follows.
- Complete implementation and testing in Claude Code and finalize the changed files
- Retrieve the paths of the changed files and their diffs (e.g., git diff)
- Launch Codex in --sandbox read-only mode in a separate terminal and pass it the changes
- Receive Codex's review results (issues found, severity)
- Prioritize fixing issues flagged by both Claude Code and Codex
- For issues flagged by only one of them, have a human read the context and decide
- After fixes, ask Codex for another review and confirm ok: true before merging
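The manual steps above can be sketched as a small Python helper. This is a minimal sketch under stated assumptions: it assumes git and codex are on your PATH, and the function names and the prompt wording are hypothetical, not part of either tool. Only git diff, git diff --name-only, and codex exec --sandbox read-only are taken as given.

```python
import subprocess


def build_review_prompt(changed_files: list[str], diff_text: str) -> str:
    """Compose the review request passed to Codex (wording is illustrative)."""
    file_list = "\n".join(f"- {path}" for path in changed_files)
    return (
        "Review the following changes and list issues with their severity.\n"
        f"Changed files:\n{file_list}\n\n"
        f"Diff:\n{diff_text}"
    )


def build_codex_command(prompt: str) -> list[str]:
    """Argument list for a read-only Codex run, as used in this article."""
    return ["codex", "exec", "--sandbox", "read-only", prompt]


def review_working_tree() -> str:
    """Collect the current diff with git and ask Codex to review it."""
    names = subprocess.run(
        ["git", "diff", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    diff = subprocess.run(
        ["git", "diff"], capture_output=True, text=True, check=True,
    ).stdout
    result = subprocess.run(
        build_codex_command(build_review_prompt(names, diff)),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling review_working_tree() from the repository root returns whatever Codex prints. Passing the whole diff as a single argument is fine for small changes. For very large diffs you would likely need another way to hand over the content.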
The crux of this workflow is the priority judgment of "shared findings > unique findings." Issues that both Claude and Codex flag are highly reliable; issues raised by only one may reflect model-specific biases. (Source: dev.classmethod.jp)
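The "shared findings > unique findings" rule can be expressed as a set operation. The sketch below assumes each finding has already been normalized to a file and a category. The Finding type and its fields are hypothetical, and in practice two models rarely phrase an issue identically, so matching findings would need human or fuzzy comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One review comment; the fields are hypothetical, for illustration."""
    file: str
    category: str  # e.g. "security", "regression"


def triage(claude: set[Finding], codex: set[Finding]) -> dict[str, set[Finding]]:
    """Split findings into shared ones (fix first) and unique ones (human decides)."""
    return {
        "fix_first": claude & codex,      # flagged by both reviewers
        "human_decides": claude ^ codex,  # flagged by exactly one reviewer
    }
```

The intersection holds the issues to fix first. The symmetric difference holds the ones a human should read in context before deciding.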
Automating the Back-and-Forth with a codex-review Skill
To eliminate the hassle of switching between two terminals manually, automation using Claude Code's Skills feature has become popular.
The "codex-review skill" introduced in a note.com article works by placing a configuration file at .claude/skills/codex-review/SKILL.md, which causes Claude Code to automatically invoke Codex after completing its work and return the results. (Source: note.com)
The skill's internal logic consists of three stages.
- Scale assessment: Determines the size of the change (small / medium / large) and adjusts the review strategy accordingly
- Codex read-only review execution: Calls codex exec --sandbox read-only and receives results in JSON format
- Fix → re-review iteration: Repeats up to 5 times until ok: true is returned
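The fix → re-review stage is essentially a bounded loop. The following is a minimal sketch of that control flow, not the skill's actual implementation: review and fix are hypothetical callables standing in for "run Codex" and "let Claude Code apply fixes", and the only details taken from the article are the ok field and the limit of 5 rounds.

```python
from typing import Callable

MAX_ROUNDS = 5  # the article's stated upper bound on review attempts


def review_until_ok(
    review: Callable[[], dict],
    fix: Callable[[dict], None],
    max_rounds: int = MAX_ROUNDS,
) -> tuple[bool, int]:
    """Review, fix, and re-review until ok is true or the rounds run out.

    Returns (passed, rounds_used).
    """
    for round_no in range(1, max_rounds + 1):
        result = review()
        if result.get("ok") is True:
            return True, round_no
        # Skip the fix on the final round so no change is left unreviewed.
        if round_no < max_rounds:
            fix(result)
    return False, max_rounds
```

If the loop ends with passed set to False, the sensible fallback is to hand the remaining findings to a human rather than to keep iterating.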
The output JSON schema includes ok (pass/fail), severity (blocking / advisory), and category classifications, making it easy to integrate into downstream automation workflows.
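As an illustration of downstream use, the sketch below filters a review result down to its blocking issues. The field names ok, severity, and category come from the description above. The exact layout is an assumption: the issues array and the message field are hypothetical, so adjust them to the schema your skill actually emits.

```python
import json

# Hypothetical sample; only ok, severity, and category are named in the article.
SAMPLE = """
{
  "ok": false,
  "issues": [
    {"severity": "blocking", "category": "security",
     "message": "token written to debug log"},
    {"severity": "advisory", "category": "style",
     "message": "function is long"}
  ]
}
"""


def blocking_issues(raw: str) -> list[dict]:
    """Return issues marked blocking; an ok result has nothing to block on."""
    result = json.loads(raw)
    if result.get("ok") is True:
        return []
    return [
        issue for issue in result.get("issues", [])
        if issue.get("severity") == "blocking"
    ]
```

A CI step could, for example, fail the build when blocking_issues(raw) is non-empty and merely log the advisory items.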
By explicitly writing "always run codex-review after completing each phase" in your implementation plan file (PLANS.md), you can design Claude Code to autonomously pass through this gate.
Seamless Integration with tmux-sender
For developers who manage multiple terminal panes with tmux, integration via tmux-sender is effective.
A Zenn article describes defining a command like the following as a Claude Code custom setting, which sends commands directly to a Codex pane in another tmux window. (Source: zenn.dev)
# Send a command from Claude Code's custom settings to another tmux pane
tmux send-keys -t codex-pane "codex exec --sandbox read-only 'Review this diff'" Enter
With this approach, you can launch Codex's review directly from within Claude Code's workflow, and the flow from launch until the results appear in the pane is almost fully automated. Furthermore, by combining this with MCP (Model Context Protocol) to load Jira ticket content into Claude Code before implementation starts, you can run the entire pipeline (spec → implementation → cross-review) in sequence.
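If you generate the tmux command from a script instead of typing it by hand, quoting the prompt is the main pitfall. Here is a minimal sketch, assuming a pane addressable as codex-pane as in the command above. The function name is hypothetical, and shlex.quote is used so that prompts containing spaces or quotes survive the shell in the target pane.

```python
import shlex


def build_send_keys(target_pane: str, prompt: str) -> list[str]:
    """Build a tmux send-keys call that types a read-only Codex review command."""
    codex_cmd = f"codex exec --sandbox read-only {shlex.quote(prompt)}"
    # "Enter" is sent as a key name so the command runs in the target pane.
    return ["tmux", "send-keys", "-t", target_pane, codex_cmd, "Enter"]
```

Passing the returned list to subprocess.run(..., check=True) sends the command without going through a shell on the sending side.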
Real-World Cases — Critical Oversights Caught by Codex
Here are two specific cases from Classmethod's report.
Case 1: Authentication Token Exposed in Logs
In an API integration implemented by Claude Code, authentication tokens were being written out to debug logs. Claude's self-review evaluated the code as "no functional issues," but Codex flagged "writing tokens to logs is a security risk," allowing the issue to be fixed before the production release.
Case 2: A Regression That Passes Tests but Breaks in Production
All unit tests were green, but Codex scrutinized the mock preconditions and flagged that "the production environment and mock assumptions differ, so this will break in production." The risk of shipping without real-environment validation was caught in advance.
In both cases, the oversight occurred because Claude was checking the code "within the assumptions of the design it had created." Codex functioned as an external perspective that questioned whether those assumptions were correct in the first place — this is the common thread. (Source: dev.classmethod.jp)
Cost and Practical Operation Tips
Here are key points to keep in mind when actually operating this workflow.
Cost: You need subscriptions to two services: Claude Pro or higher (around $20/month or more) and ChatGPT Plus or higher (around $20/month or more), so budget roughly $40/month at minimum. Check Claude's official pricing and ChatGPT's official pricing for the latest prices.
Scope: Applying this to all code will increase costs, so it's practical to start by limiting it to "important PRs" and "merges involving design changes." The Classmethod article also notes that "focusing on design documents and PRs yields the best cost-effectiveness."
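One way to enforce this scope is a small gate that decides whether a PR goes to cross-review at all. The sketch below is only an illustration: the label names and the design-document directory are hypothetical placeholders, and the criteria should be replaced with whatever "important" means in your repository.

```python
def needs_cross_review(labels: set[str], changed_paths: list[str]) -> bool:
    """Decide whether a PR warrants a second-model review (criteria are illustrative)."""
    important_labels = {"important", "design-change"}  # hypothetical label names
    design_dirs = ("docs/design/",)  # hypothetical location of design documents
    if labels & important_labels:
        return True
    return any(path.startswith(design_dirs) for path in changed_paths)
```

PRs that fail this check would go through the usual single-model flow and human review only.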
Current limitations: The tmux-sender skill and codex-review skill are community implementations, not officially supported features. They may break with updates to Claude Code or Codex, so periodic maintenance is required. (Source: zenn.dev)
Humans remain ultimately responsible: Even with two AI systems, the final merge decision must always be made by a human. Rather than shipping code just because "Codex said ok," treat AI feedback as input material and have humans read the context and make the call.
Summary
Here are the key points of the "implement with Claude Code, review with Codex" workflow.
- Self-review by the same AI is structurally prone to a "self-evaluation blind spot"
- Combining different AI lineages — Claude (Anthropic) and Codex (OpenAI) — enables complementary code review
- Start with two terminals running in parallel, then automate with a codex-review skill or tmux-sender once you're comfortable
- Prioritize issues flagged by both AIs; let a human judge issues flagged by only one
- Limiting the approach to important PRs makes it easier to balance cost and effectiveness
Both Claude Code and Codex continue to receive new features, and the integration workflow is still evolving. The lowest-cost first step is to try applying this to a single important PR in your local repository.