The February 2026 release of Codex CLI 0.98.0 and the integration of gpt-5.3-codex introduce a major speedup: parallel tool calling. But there’s a hidden cost for anyone who relies on “approve/deny” prompts: a tool-call race condition that can desynchronize approvals from the commands that actually execute.
If you’re a security researcher or a developer using the CLI, this isn’t just a minor UX bug. It’s an authorization integrity failure: the UI says you approved one thing, but a different action runs.
TL;DR: treat this like a TOCTOU (time-of-check vs time-of-use) problem. Until it’s fixed, avoid approving multi-step tool sequences without carefully watching the actual terminal output and queued prompts.
The issue: approval swap & desynchronization
Users report cases where the CLI asks for approval on one command (for example, a build), but executes a different queued command (for example, a test suite). Then, a second approval prompt appears later—and the two approvals effectively “swap” the commands they gate.
Environment (as reported)
- Codex CLI: 0.98.0
- Model: gpt-5.3-codex
- Platform: Linux 6.8.0-88-generic (Terminator)
Observed behavior (example)
- The prompt: the UI asks, “Approve running make build?”
- The action: the user clicks Yes.
- The result: the terminal shows pytest running instead.
- The rebound: moments later, a second prompt for pytest appears, but make build executes when it is approved.
Technical root cause: parallel execution loops
The key change is that newer models can emit multiple tool calls in a single response. If the CLI harness decides two tool calls are “independent,” it may attempt to process them concurrently.
That concurrency is where a race can happen:
- One tool call requires a manual approval (a UI gate).
- Another tool call is auto-approved (or simply starts faster).
- When the queue and UI prompts aren’t bound to specific tool-call IDs, execution order can flip—and the approval dialog no longer corresponds to the command that runs.
Even if nothing malicious is happening, concurrency plus weak binding between “what you approved” and “what runs” can produce an approval swap.
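The race can be reproduced in miniature. The Python sketch below models a hypothetical harness, not Codex CLI's actual code. The names tool_call and user and the setup delays are assumptions chosen to mirror the reported sequence. Both tool calls post their prompts to the UI in order. The user's answers, however, go into one shared queue that carries no tool-call ID, so the first Yes goes to whichever call is already waiting.

```python
import asyncio

# Hypothetical harness for illustration only; not Codex CLI's implementation.


async def tool_call(name: str, prompts: asyncio.Queue, approvals: asyncio.Queue,
                    executed: list[str], setup_delay: float) -> None:
    """One tool call from a parallel batch."""
    await prompts.put(name)            # UI will show "Approve running {name}?"
    await asyncio.sleep(setup_delay)   # e.g. environment setup before waiting
    await approvals.get()              # BUG: any "Yes" unblocks whichever call waits first
    executed.append(name)              # the command that actually runs


async def user(prompts: asyncio.Queue, approvals: asyncio.Queue, shown: list[str]) -> None:
    """Simulated user who answers prompts in the order the UI displays them."""
    for _ in range(2):
        name = await prompts.get()
        shown.append(name)             # what the user believes they are approving
        await asyncio.sleep(0.01)      # human reaction time
        await approvals.put("yes")     # the answer is not tied to `name`


async def main() -> tuple[list[str], list[str]]:
    prompts: asyncio.Queue = asyncio.Queue()
    approvals: asyncio.Queue = asyncio.Queue()
    shown: list[str] = []
    executed: list[str] = []
    await asyncio.gather(
        tool_call("make build", prompts, approvals, executed, setup_delay=0.05),
        tool_call("pytest", prompts, approvals, executed, setup_delay=0.0),
        user(prompts, approvals, shown),
    )
    return shown, executed


shown, executed = asyncio.run(main())
print("approved:", shown)     # approved: ['make build', 'pytest']
print("executed:", executed)  # executed: ['pytest', 'make build']
```

No step here is malicious. A slower setup step for make build is enough to hand the first approval to pytest and the second to make build, which mirrors the swap described above.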
Security risk: TOCTOU and authorization confusion
In security terms, this resembles a TOCTOU vulnerability:
- Time of check: the user approves a specific command in the UI.
- Time of use: a different command executes in the terminal.
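The usual way to close a TOCTOU gap is to make the object that is checked and the object that is used one and the same. The following is a minimal Python sketch of hard binding. It assumes a harness that represents each tool call as an immutable record with a unique ID. ToolCall, ApprovalGate, and new_call are hypothetical names, not Codex CLI APIs.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Immutable record of one requested command (hypothetical harness type)."""
    call_id: str
    argv: tuple[str, ...]


class NotApproved(Exception):
    """Raised when execution is attempted for anything other than what was approved."""


class ApprovalGate:
    def __init__(self) -> None:
        self._approved: dict[str, ToolCall] = {}

    def approve(self, call: ToolCall) -> None:
        # Time of check: store the exact immutable record the user saw.
        self._approved[call.call_id] = call

    def consume(self, call: ToolCall) -> tuple[str, ...]:
        # Time of use: the record about to run must equal the approved one.
        # pop() makes each approval single-use.
        approved = self._approved.pop(call.call_id, None)
        if approved != call:
            raise NotApproved(f"call {call.call_id} was not approved as submitted")
        return call.argv  # a real harness would hand this argv to its process spawner


def new_call(*argv: str) -> ToolCall:
    return ToolCall(call_id=uuid.uuid4().hex, argv=argv)


gate = ApprovalGate()
build = new_call("make", "build")
test = new_call("pytest")

gate.approve(build)
print(gate.consume(build))  # ('make', 'build')

try:
    gate.consume(test)  # never approved, so it cannot ride on the build's "Yes"
except NotApproved as err:
    print("blocked:", err)
```

Each approval is keyed by ID, compared against the full record, and consumed on use. As a result, a Yes for one call can neither start a different call nor be reused later.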
Why that’s dangerous:
- Trust break: a user may inadvertently authorize a destructive command (for example, deleting files) while believing they are gating a harmless one (for example, a status check).
- Dependency failure: logically linked steps (build → test) get treated as independent; running tests before a build completes causes flakiness and can “poison” results.
Mitigation & next steps
Until a patch lands, be cautious when the model proposes multi-step actions or when you see more than one approval prompt in quick succession.
What users can do now
- Slow down approvals: approve one prompt, then wait to confirm the expected command started before approving the next.
- Watch the terminal, not just the UI: treat the UI prompt as untrusted unless the executed command matches.
- Avoid risky commands in agent mode: don’t run “cleanup” or destructive operations through the assistant until the binding is fixed.
Suggested fixes for maintainers
- Strict serialization: enforce a global execution lock while any manual approval is pending.
- Hard binding: approvals must be tied to an immutable tool-call ID, and the terminal execution must reference the same ID.
- UI transparency: show all pending tool calls in the queue (with IDs) so users understand the true execution state.
- Dependency awareness: allow the model (or the harness) to declare prerequisites so “build → test” cannot parallelize incorrectly.
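Strict serialization and dependency awareness can be combined. The sketch below assumes the harness receives a prerequisite map alongside the tool calls. CALLS, PREREQS, prompt_user, and approve_and_run are hypothetical names. The sketch uses Python's standard graphlib.TopologicalSorter to release a call only after its prerequisites are done. It also holds a single asyncio.Lock across prompt-and-run, so no other call can prompt or start while an approval is pending. The prompt text includes the call ID, in the spirit of the UI transparency point.

```python
import asyncio
from graphlib import TopologicalSorter

# Hypothetical tool calls from one model response: call ID -> command.
CALLS = {"call_1": "make build", "call_2": "pytest", "call_3": "git status"}
# Hypothetical prerequisite declaration: tests must wait for the build.
PREREQS = {"call_2": {"call_1"}}


async def prompt_user(call_id: str, command: str, log: list[str]) -> bool:
    """Stand-in for the UI: the prompt names the call ID, and the user says Yes."""
    log.append(f"prompt {call_id}: {command}")
    await asyncio.sleep(0.01)  # human reaction time
    return True


async def approve_and_run(call_id: str, lock: asyncio.Lock, log: list[str]) -> None:
    command = CALLS[call_id]
    async with lock:  # strict serialization: nothing else prompts or runs meanwhile
        if await prompt_user(call_id, command, log):
            log.append(f"run {call_id}: {command}")  # a real harness would exec here


async def main() -> list[str]:
    lock = asyncio.Lock()
    log: list[str] = []
    sorter = TopologicalSorter({cid: PREREQS.get(cid, set()) for cid in CALLS})
    sorter.prepare()  # raises graphlib.CycleError if prerequisites form a loop
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # calls whose prerequisites are all done
        await asyncio.gather(*(approve_and_run(cid, lock, log) for cid in ready))
        sorter.done(*ready)
    return log


for line in asyncio.run(main()):
    print(line)
```

In this run, call_1 (make build) and call_3 (git status) form the first batch. Only after make build has run is call_2 (pytest) prompted. Holding the lock across the prompt gives up concurrency for approval-gated calls, which is the intent. Calls that need no approval could still run outside the lock.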
Reference issue
GitHub: [cli security] Different command is run compared to what is approved #11112
FAQ
Is this a vulnerability or “just” a bug?
If approvals can be misapplied to the wrong command, it’s an authorization integrity problem. Even if it’s triggered by races (not attackers), it creates a path for accidental or coerced execution.
What’s the safest workflow until it’s fixed?
Keep agent sessions read-only where possible, avoid batch tool execution, and treat approvals like you would a privileged action: verify the exact command that starts running before proceeding.