Codex CLI 0.98.0 Approval Swap Security Bug: Parallel Tool Calls, Race Conditions, and TOCTOU Risk

The February 2026 release of Codex CLI 0.98.0 and the integration of gpt-5.3-codex introduce a major speedup: parallel tool calling. But there’s a hidden cost that matters for anyone who relies on “approve/deny” prompts: a tool-call race condition that can desynchronize approvals from the commands that actually execute.

If you’re a security researcher or a developer using the CLI, this isn’t just a minor UX bug. It’s an authorization integrity failure: the UI says you approved one thing, but a different action runs.

TL;DR: treat this like a TOCTOU (time-of-check to time-of-use) problem. Until it’s fixed, don’t approve multi-step tool sequences without carefully watching the actual terminal output and the queued prompts.

The issue: approval swap & desynchronization

Users report cases where the CLI asks for approval on one command (for example, a build) but executes a different queued command (for example, a test suite). A second approval prompt then appears, and the two approvals effectively “swap” the commands they gate.

Environment (as reported)

Codex CLI: 0.98.0
Model: gpt-5.3-codex
Platform: Linux 6.8.0-88-generic (Terminator)

Observed behavior (example)

The prompt: the UI asks, “Approve running make build?”
The action: user clicks Yes.
The result: terminal shows pytest running instead.
The rebound: moments later, a second prompt for pytest appears, but make build executes upon its approval.

Technical root cause: parallel execution loops

The key change is that newer models can emit multiple tool calls in a single response. If the CLI harness decides two tool calls are “independent,” it may attempt to process them concurrently.

That concurrency is where a race can happen:

One tool call requires a manual approval (a UI gate).
Another tool call is auto-approved (or simply starts faster).
When the queue and UI prompts aren’t bound to specific tool-call IDs, execution order can flip—and the approval dialog no longer corresponds to the command that runs.

Even if nothing malicious is happening, concurrency plus weak binding between “what you approved” and “what runs” can produce an approval swap.
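
To make the failure mode concrete, here is a minimal sketch of how a swap can arise when approvals are matched to commands by queue position rather than by ID. This is not Codex CLI's actual code: the ToolCall type, the call IDs, and both run functions are hypothetical, and the "ready order" stands in for whatever order the concurrent workers happen to finish preparing their commands.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    call_id: str   # hypothetical identifier, for illustration only
    command: str


def run_unbound(prompt_order, ready_order, answers):
    """Buggy model: each approval releases whatever is first in the ready queue."""
    ready = deque(ready_order)
    log = []
    for prompted, approved in zip(prompt_order, answers):
        executed = ready.popleft()  # positional match, ignores the prompted ID
        if approved:
            log.append((prompted.command, executed.command))
    return log


def run_bound(prompt_order, ready_order, answers):
    """Bound model: each approval releases only the call whose ID was shown."""
    by_id = {call.call_id: call for call in ready_order}
    log = []
    for prompted, approved in zip(prompt_order, answers):
        if approved:
            executed = by_id[prompted.call_id]  # lookup by the prompted ID
            log.append((prompted.command, executed.command))
    return log


build = ToolCall("call_1", "make build")
test = ToolCall("call_2", "pytest")

# The UI prompts in emission order, but pytest became ready first.
prompts = [build, test]
ready = [test, build]

swapped = run_unbound(prompts, ready, [True, True])
matched = run_bound(prompts, ready, [True, True])

# Each pair is (command shown in the prompt, command that actually ran).
print(swapped)
print(matched)
```

In the unbound version, approving "make build" runs pytest and approving pytest runs "make build", which mirrors the reported behavior. Looking the command up by the prompted ID removes the dependence on arrival order.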

Security risk: TOCTOU and authorization confusion

In security terms, this resembles a TOCTOU vulnerability:

Time of check: the user approves a specific command in the UI.
Time of use: a different command executes in the terminal.
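
One common way to close a check/use gap is to record a digest of exactly what was approved and to recompute it immediately before execution. The sketch below illustrates that idea under stated assumptions: the Approval record, the call IDs, and the function names are hypothetical and do not describe how Codex CLI stores approvals.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    call_id: str   # hypothetical tool-call identifier
    digest: str    # digest of what the user saw at approval time


def command_digest(call_id: str, argv: list[str]) -> str:
    """Hash the call ID together with the exact argument vector."""
    h = hashlib.sha256()
    h.update(call_id.encode())
    for arg in argv:
        h.update(b"\x00")  # NUL separator; argv entries cannot contain NUL
        h.update(arg.encode())
    return h.hexdigest()


def approve(call_id: str, argv: list[str]) -> Approval:
    """Time of check: remember what was approved."""
    return Approval(call_id, command_digest(call_id, argv))


def check_before_run(approval: Approval, call_id: str, argv: list[str]) -> bool:
    """Time of use: refuse to run unless the command is the approved one."""
    return hmac.compare_digest(approval.digest, command_digest(call_id, argv))


approval = approve("call_1", ["make", "build"])
same_command = check_before_run(approval, "call_1", ["make", "build"])
other_call = check_before_run(approval, "call_2", ["pytest"])
changed_args = check_before_run(approval, "call_1", ["make", "clean"])
print(same_command, other_call, changed_args)
```

With a check like this at the point of execution, a swapped or altered command fails verification instead of running under someone else's approval.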

Why that’s dangerous:

Trust break: a user may inadvertently authorize a destructive command (for example, deleting files) while believing they are gating a harmless one (for example, a status check).
Dependency failure: logically linked steps (build → test) get treated as independent; running tests before a build completes causes flakiness and can “poison” results.

Mitigation & next steps

Until a patch lands, be cautious when the model proposes multi-step actions or when you see more than one approval prompt in quick succession.

What users can do now

Slow down approvals: approve one prompt, then wait to confirm the expected command started before approving the next.
Watch the terminal, not just the UI: treat the UI prompt as untrusted unless the executed command matches.
Avoid risky commands in agent mode: don’t run “cleanup” or destructive operations through the assistant until the binding is fixed.

Suggested fixes for maintainers

Strict serialization: enforce a global execution lock while any manual approval is pending.
Hard binding: approvals must be tied to an immutable tool-call ID, and the terminal execution must reference the same ID.
UI transparency: show all pending tool calls in the queue (with IDs) so users understand the true execution state.
Dependency awareness: allow the model (or the harness) to declare prerequisites so “build → test” cannot parallelize incorrectly.
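
As a rough illustration of how serialization, ID binding, and dependency awareness could fit together, here is a minimal sketch. It assumes a hypothetical plan format (a dict of call IDs to commands, plus a dict of prerequisites that only refers to IDs in the plan) and hypothetical ask/execute callbacks. It is one possible shape for such a gate, not the maintainers' design.

```python
from graphlib import TopologicalSorter


def run_plan(calls, depends_on, ask, execute):
    """Run tool calls one at a time, in dependency order, each gated by its own ID.

    calls:      dict mapping call ID -> command string (hypothetical format)
    depends_on: dict mapping call ID -> set of prerequisite call IDs
    ask:        callback(call_id, command) -> bool, the approval prompt
    execute:    callback(call_id, command), the runner
    """
    graph = {cid: depends_on.get(cid, set()) for cid in calls}
    skipped = set()
    ran = []
    # static_order() yields prerequisites before the calls that need them.
    for call_id in TopologicalSorter(graph).static_order():
        command = calls[call_id]
        # Skip anything whose prerequisite was denied or skipped.
        if graph[call_id] & skipped:
            skipped.add(call_id)
            continue
        # The prompt and the runner receive the same (ID, command) pair,
        # and nothing else executes while the prompt is pending.
        if ask(call_id, command):
            execute(call_id, command)
            ran.append(call_id)
        else:
            skipped.add(call_id)
    return ran


calls = {"call_1": "make build", "call_2": "pytest"}
deps = {"call_2": {"call_1"}}

executed = []
ran_all = run_plan(calls, deps,
                   ask=lambda cid, cmd: True,
                   execute=lambda cid, cmd: executed.append((cid, cmd)))

# Denying the build also prevents the dependent test run.
ran_denied = run_plan(calls, deps,
                      ask=lambda cid, cmd: cid != "call_1",
                      execute=lambda cid, cmd: None)
print(ran_all, executed, ran_denied)
```

The sequential loop trades away the parallel speedup whenever approvals are involved. A real harness might instead run only provably independent, auto-approved calls concurrently and fall back to this path whenever a manual gate is pending.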

Reference issue

GitHub: [cli security] Different command is run compared to what is approved #11112

FAQ

Is this a vulnerability or “just” a bug?

If approvals can be misapplied to the wrong command, it’s an authorization integrity problem. Even if it’s triggered by races (not attackers), it creates a path for accidental or coerced execution.

What’s the safest workflow until it’s fixed?

Keep agent sessions read-only where possible, avoid batch tool execution, and treat approvals like you would a privileged action: verify the exact command that starts running before proceeding.

codefix.dev