What Is Codex CLI?
Codex CLI is OpenAI’s open-source coding agent that runs in your terminal. Built in Rust for low memory footprint and fast startup, the CLI is granted permission to read your project’s files, modify them, and execute shell commands on your local machine. It pairs with OpenAI’s frontier models (GPT-5.5 is the recommended default in 2026) to autonomously execute multi-step development workflows from a single natural-language instruction.
A useful mental model: Codex CLI is the terminal-resident equivalent of a junior pair programmer. You can say “fix this failing test,” “add a Docker setup,” or “refactor this module to use async/await,” and the agent will read the relevant files, plan the change, write the code, and run the test suite to verify it. Keep in mind that Codex CLI sits squarely in the same product category as Anthropic’s Claude Code and Cursor’s Composer feature — collectively the “terminal-native AI coding agents” wave that has reshaped how engineers spend their days.
How to Pronounce Codex CLI
Codex CLI (/ˈkoʊ.dɛks siː.ɛlˈaɪ/)
Codex (short form when context is clear)
How Codex CLI Works
Codex CLI shipped its first stable release in 2025 and has gone through several major versions by mid-2026. The core architecture splits cleanly into two layers: a local agent runtime written in Rust that handles file I/O, shell execution, and tool orchestration, and a cloud model layer hosted by OpenAI that performs the actual reasoning and code generation. Note that this division is the source of both the agent’s local responsiveness and its dependency on network connectivity.
Component breakdown
(Diagram: Codex CLI architecture)
The 2026 release added significant multi-agent capabilities. Sub-agents are now addressed using readable path-style identifiers like /root/agent_a, with structured inter-agent messaging, agent listing, and persisted /goal workflows that survive across sessions. This makes long-running tasks distributable across multiple specialized sub-agents, which is important for keeping context windows manageable on multi-hour engineering work.
Key Subcommands
| Command | Purpose |
|---|---|
| codex | Launch interactive TUI session |
| codex exec (alias codex e) | Run non-interactively from a script |
| codex exec --json | Stream JSONL with reasoning-token usage |
| /model | Switch models (gpt-5.5, gpt-5.4, gpt-5.3-codex) |
| /review | Run a dedicated reviewer over a diff |
| /goal | Create or resume a persisted goal |
| codex mcp | Manage MCP servers (list/add/remove/auth) |
Codex CLI Usage and Examples
Quick Start
```shell
# Install via Homebrew
brew install --cask codex

# Set your API key
export OPENAI_API_KEY="sk-..."

# Launch in your project directory
cd ~/myproject
codex

# Then chat in natural language:
> rewrite README.md based on the actual project
> fix the failing tests
> parallelize this CSV processing script
```
Non-Interactive Mode for CI
```shell
# Useful in GitHub Actions or any CI pipeline
codex exec --json "fix all lint warnings" > result.jsonl

# The JSONL output contains reasoning tokens, runtime, and the list of changed files
```
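The JSONL stream can be post-processed in the same pipeline step. A minimal sketch, assuming a hypothetical event schema (the "type" and "path" field names below are illustrative, not the documented format; inspect your own result.jsonl for the real shape):

```shell
# Sketch: pull changed-file paths out of Codex CLI JSONL output.
# ASSUMPTION: the event schema below is made up for illustration;
# the real `codex exec --json` field names may differ.
extract_changed_files() {
  grep '"type":"file_change"' "$1" \
    | sed 's/.*"path":"\([^"]*\)".*/\1/'
}

# Simulated stand-in for the result.jsonl produced above
cat > result.jsonl <<'EOF'
{"type":"reasoning","tokens":412}
{"type":"file_change","path":"src/lint.ts"}
{"type":"file_change","path":"src/util.ts"}
{"type":"done","runtime_ms":90210}
EOF

extract_changed_files result.jsonl
```

A follow-up CI step could feed the extracted paths into a PR description or a review comment.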
Common Implementation Patterns
Pattern A: Interactive local development
```shell
cd ~/work/api-server
codex
> write tests for this file
> /review
> /model gpt-5.5
> refactor it
```
Good fit: Local exploratory work where you iterate based on the agent’s response. The feedback loop is natural and you can correct the agent quickly.
Bad fit: Periodic background jobs. The interactive TUI is built for human attention, not unattended scheduling.
Pattern B: CI auto-fix pipeline
```yaml
# .github/workflows/auto-fix.yml
- name: Auto-fix lint issues
  run: codex exec --json --auto-approve "fix all lint errors" > fixes.jsonl
- name: Open pull request
  run: gh pr create --title "Auto-fix lint issues"
```
Good fit: Mechanical fixes (lint, formatting, dependency bumps) that you want shipped as a PR for human review.
Bad fit: Anything that involves architectural judgment. Keep a human reviewer in the loop for non-trivial changes.
Anti-pattern: Granting production database credentials
```shell
# DO NOT DO THIS
export DATABASE_URL="postgres://prod-master/..."
codex
> delete invalid records from the users table
```
An LLM agent operates probabilistically. Giving it production database credentials means an unrecoverable DROP or DELETE is one bad inference away. Always run agents against staging or read-only replicas, and require explicit approval for any destructive command.
Advantages and Disadvantages of Codex CLI
Advantages
- Open source under Apache 2.0. The CLI binary’s source is public, making integration into corporate CI pipelines and forks straightforward.
- Lightweight Rust implementation. Startup is near-instantaneous and memory consumption is small — much smaller footprint than Electron-based IDEs.
- Multi-agent orchestration. The 2026 release introduced path-addressed sub-agents and persisted goals, making it suitable for very long-running tasks.
- MCP server integration. Adopting Anthropic’s Model Context Protocol means cross-vendor tool ecosystem reuse is now possible.
- Direct access to OpenAI’s frontier models, including GPT-5.5 with adjustable reasoning levels and image generation.
Disadvantages
- Cloud dependency. Despite running locally, all reasoning happens in OpenAI’s data centers. There is no offline mode.
- Token-based billing. Heavy daily use can run into the hundreds of dollars per month for an active engineer.
- Younger ecosystem than Claude Code. Third-party plugins and templates are still maturing.
- Shell-execution risk. Auto-approve mode can run destructive commands, so configure permissions carefully.
Codex CLI vs Claude Code
Both tools are “terminal-native AI coding agents,” but the vendor, model, and ecosystem differ. The table below clarifies the practical differences a team should consider when picking one.
| Aspect | Codex CLI | Claude Code |
|---|---|---|
| Vendor | OpenAI | Anthropic |
| Primary models | GPT-5.5 / GPT-5.4 | Claude Opus 4.6 / Sonnet 4.6 |
| Implementation | Rust | TypeScript on Node.js |
| License | Apache 2.0 (full OSS) | Proprietary (selective OSS) |
| Extension model | MCP servers | Skills + Plugins + MCP |
| Sub-agents | Path-style addressed (multi-level) | Task tool (flat) |
| Image generation | Built into CLI | Not built-in |
| Best fit | OpenAI ecosystem users wanting a lightweight CLI | Anthropic ecosystem users leaning on Skills |
Remember that these are competing products solving the same problem with different design choices. Because both adopt MCP for tool integration, an MCP server you build can be wired into both — that compatibility wedge is one of the most consequential industry developments of the past year.
Common Misconceptions
Misconception 1: “Codex CLI is the successor to GitHub Copilot.”
Why people get confused: OpenAI’s original 2021 Codex model powered the first version of GitHub Copilot. The shared “Codex” brand name across years stems from product-line continuity at OpenAI, but the surface-level name match leads many to assume it’s the same product evolving.
The reality: GitHub Copilot is an editor-integrated completion tool produced by Microsoft/GitHub, while Codex CLI is an OpenAI-owned terminal agent. Different organizations, different architectures, different categories.
Misconception 2: “Codex CLI runs locally, so it works offline.”
Why people get confused: The phrase “runs locally” combined with the fact that the binary is shipped as a native Rust executable confuses many readers into expecting offline capability. The historical confusion between “native binary” and “no internet required” is the underlying reason.
The reality: Only the agent runtime is local. Every model inference is a network call to OpenAI’s API. Without an internet connection the agent cannot generate a single token of output.
Misconception 3: “It’s safe to give Codex CLI production credentials.”
Why people get confused: Trust in the OpenAI brand, plus the agent’s competent-seeming behavior on simple tasks, leads users to overestimate reliability. The cognitive bias of “this looks smart, so it must be careful” is the reason for this dangerous misconception.
The reality: An LLM agent is probabilistic. A confident-sounding plan can include a destructive command. Production credentials should never reach an agent’s environment without strict approval gating, and ideally never at all — staging is the right environment for agent-driven changes.
Real-World Use Cases
Local development repetition reduction
Adding tests, refactoring, updating documentation, and tidying dependencies are mechanical-but-judgment-required tasks where Codex CLI shines. Engineers report 30–50% reductions in time-on-task for this kind of work, which compounds significantly across an organization.
Legacy code understanding and migration
Porting old Python codebases to TypeScript, decoding cryptic SQL stored procedures into modern equivalents, or untangling multi-decade-old shell scripts. Note that the agent’s read-then-explain step is often more valuable than its write step, because it produces documentation as a byproduct.
CI/CD automation pipelines
Using codex exec, you can wire the agent into GitHub Actions or any CI runner to perform routine fixes (lint, formatting, dependency bumps), generate README updates, or post automated review comments on pull requests. Gate these flows with explicit approval steps for non-trivial changes.
Learning and education
When picking up a new framework or library, asking Codex CLI for a minimal working example and then asking it to explain the code line-by-line is a fast learning loop. Note that the human risk here is intellectual passivity — the value is in the explanation dialog, not the generated code.
Cross-language code review
Engineers fluent in one stack often need to review changes in another. The agent is excellent at translating across languages, summarizing the intent of a diff, and flagging idiomatic concerns. Remember that the reviewer is still the human; the agent is a translator.
Documentation and runbook generation
Pointing the agent at a codebase and asking for an architectural summary, an onboarding doc, or an incident runbook produces useful first drafts that humans then refine. Note that the resulting documents need editorial review, but starting from a draft is much faster than starting from a blank page.
Operational Best Practices
Running Codex CLI at organizational scale requires more than just installing the binary. The following practices are what experienced engineering teams converge on after their first few months of agent-driven development. AI coding agents are powerful tools, but the operational discipline around them matters as much as the agent’s raw capability — perhaps even more, because the failure modes are often silent and only surface in production.
Sandboxed execution by default
Run Codex CLI inside a container or virtual machine whenever possible, especially when granting auto-approve permissions. This ensures that even if the agent generates a destructive command, the blast radius is contained. Many teams use a dedicated dev-container with read-only mounts for source-of-truth data and writable scratch directories for the agent to operate in. This pattern is cheap to set up and prevents almost every catastrophic agent error.
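One cheap way to get this containment is a read-only bind mount for the repository and a writable scratch mount. A sketch assuming Docker; the image name my-codex-devcontainer is a placeholder for your own dev-container:

```shell
# Sketch: build a docker command that sandboxes the agent.
# ASSUMPTION: "my-codex-devcontainer" is a hypothetical image name.
build_sandbox_cmd() {
  src_dir="$1"      # source of truth, mounted read-only
  scratch_dir="$2"  # writable workspace for the agent
  printf '%s\n' "docker run --rm -it -v ${src_dir}:/repo:ro -v ${scratch_dir}:/scratch -e OPENAI_API_KEY my-codex-devcontainer codex"
}

# Print the command for inspection before running it
build_sandbox_cmd "$HOME/work/api-server" /tmp/agent-scratch
```

Printing the command before executing it doubles as a cheap dry-run check in CI.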
Approval gates for destructive operations
Configure the CLI to require explicit human approval for commands that delete files, drop database tables, push to remote repositories, or modify infrastructure. The small friction of approving each operation is worth the safety guarantee, especially in the first few weeks of using the agent on a new codebase. You should also keep an audit log of approved actions for postmortem analysis when something does go wrong.
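The gate-plus-audit-log idea can be sketched as a small wrapper. This is not Codex CLI's own permission system (configure that first); the destructive-pattern list is illustrative only:

```shell
# Sketch: approval gate for destructive commands, with an audit log.
# ASSUMPTION: the pattern list is illustrative, not exhaustive.
AUDIT_LOG="${AUDIT_LOG:-agent-audit.log}"

needs_approval() {
  case "$1" in
    *"rm -rf"*|*"DROP TABLE"*|*"git push"*|*"DELETE FROM"*) return 0 ;;
    *) return 1 ;;
  esac
}

gate() {
  cmd="$1"
  if needs_approval "$cmd"; then
    printf 'APPROVE? %s [y/N] ' "$cmd"
    read -r answer
    [ "$answer" = "y" ] || { echo "denied: $cmd" >> "$AUDIT_LOG"; return 1; }
  fi
  echo "approved: $cmd" >> "$AUDIT_LOG"
  eval "$cmd"
}
```

Usage: `gate "rm -rf build/"` prompts for approval and records the decision; `gate "ls"` runs straight through but is still logged.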
Cost monitoring and budgets
Token usage can spiral quickly with multi-step agent workflows. Set up daily and monthly budget alerts in the OpenAI dashboard, and establish per-developer or per-team caps. The most cost-effective pattern is to use cheaper models (gpt-5.4 or gpt-5.3-codex) for routine tasks and reserve gpt-5.5 for complex reasoning; the /model command makes switching easy mid-conversation.
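A back-of-the-envelope projection helps when setting those caps. A sketch with placeholder per-token prices (not real OpenAI list prices; substitute the current ones):

```shell
# Sketch: project monthly spend from average daily token counts.
# ASSUMPTION: the per-token prices below are placeholders.
estimate_monthly_cost() {
  awk -v in_t="$1" -v out_t="$2" 'BEGIN {
    in_price  = 2.00 / 1000000   # $/input token  (placeholder)
    out_price = 8.00 / 1000000   # $/output token (placeholder)
    # ~22 working days per month
    printf "%.2f\n", 22 * (in_t * in_price + out_t * out_price)
  }'
}

estimate_monthly_cost 2000000 500000   # → 176.00
```

At 2M input and 500K output tokens per day, this sample pricing lands in the "hundreds of dollars per month" range the Disadvantages section mentions.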
Version control discipline
Always commit before invoking the agent on a new task, so that you can quickly revert if the agent’s changes are wrong or destructive. A clean working tree before each agent invocation is the equivalent of a save point in a video game — it lets you experiment fearlessly. Conversely, an agent working on already-modified files can accidentally entangle its changes with yours, making cleanup painful.
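The save-point rule is easy to enforce mechanically. A sketch; the string check is split out so it can be exercised without a git repository:

```shell
# Sketch: refuse to launch the agent on a dirty working tree.
tree_is_clean() {
  # `git status --porcelain` prints nothing for a clean tree
  [ -z "$1" ]
}

require_clean_tree() {
  status="$(git status --porcelain 2>/dev/null)"
  if tree_is_clean "$status"; then
    echo "clean tree: safe to invoke codex"
  else
    echo "uncommitted changes found: commit or stash first" >&2
    return 1
  fi
}
```

Usage: run `require_clean_tree && codex` instead of bare `codex`, or wire the check into a shell alias.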
Prompt and context discipline
Be specific in your instructions. “Fix the bug” is much weaker than “Fix the off-by-one error in src/parser.ts:42 that causes the last row to be dropped from CSV exports.” Include the file path, the symptom, the expected behavior, and any relevant context. This is also the practice that pays the most dividends — engineers who develop strong prompt-writing habits get materially better results from any AI coding agent, not just Codex CLI.
Test before trusting
Always run the existing test suite after the agent finishes a change, to verify that it did not silently break something elsewhere. The most common failure mode is not “the agent broke the obvious thing” — it is “the agent broke a related thing two files away that you forgot to look at.” A fast test suite is your first line of defense, and CI is the second. Many teams configure their agents to run tests automatically before declaring a task complete, which catches the majority of these silent regressions.
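The run-tests-then-decide loop can be captured in a wrapper. A sketch with the agent, test, and revert commands passed in as parameters (stand-ins for e.g. a codex exec invocation, your test runner, and a git checkout):

```shell
# Sketch: run an agent task, then the test suite; revert on failure.
# The three commands are parameters so the control flow is the focus.
agent_change_with_guard() {
  agent_cmd="$1"   # e.g. a codex exec invocation
  test_cmd="$2"    # e.g. your test runner
  revert_cmd="$3"  # e.g. a git revert of the working tree
  if ! $agent_cmd; then
    echo "agent failed"
    return 1
  fi
  if $test_cmd; then
    echo "tests pass: keep the change"
  else
    echo "tests fail: reverting"
    $revert_cmd
    return 1
  fi
}
```

Because the revert only fires after a failing test run, a green suite leaves the agent's change untouched for human review.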
Document agent-driven changes
When the agent makes a non-trivial change, write a short note in the commit message explaining what you asked it to do and why, so that future engineers (including you) understand the reasoning behind the diff. AI-generated code often looks subtly different from human-written code in style, so future maintainers benefit from knowing the provenance. This becomes especially important during code review, when the reviewer may want to ask about choices the agent made on its own.
Prefer focused tasks over open-ended ones
The agent performs much better on well-scoped tasks (“add a unit test for the parse_csv function covering empty input”) than on vague ones (“improve the codebase”). Scope creep is the enemy of agent reliability. Breaking a large goal into a sequence of small, verifiable agent invocations almost always produces higher-quality results than asking for the entire goal in a single prompt.
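That decomposition can be driven from a plain loop that stops at the first failure. A sketch; CODEX is overridable so the loop can be dry-run with an echo stub instead of the real CLI:

```shell
# Sketch: run a large goal as a sequence of small, focused tasks,
# stopping at the first failure.
CODEX="${CODEX:-codex exec}"

run_task_sequence() {
  for task in "$@"; do
    echo "running: $task"
    if ! $CODEX "$task"; then
      echo "stopped at: $task" >&2
      return 1
    fi
  done
  echo "all tasks done"
}

# Dry run with an echo stub standing in for the real agent
CODEX=echo run_task_sequence \
  "add a unit test for parse_csv covering empty input" \
  "add a unit test for parse_csv covering quoted fields"
```

Stopping at the first failure keeps a broken intermediate state from cascading into the later tasks.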
Pair with version control workflows
Treat the agent as another contributor to your version control workflow: use feature branches, descriptive commits, and PR-based reviews even for agent-driven changes. The agent makes mistakes that look very different from human mistakes — sometimes confidently wrong code, sometimes overly defensive code, sometimes weirdly verbose comments. A reviewer who is paying attention catches these; a reviewer who skims because “the AI did it” does not. Maintain the same review rigor regardless of whether the code came from an engineer or an agent.
Build organizational knowledge about agent strengths
Different agents have different strengths. GPT-5.5 in Codex CLI is particularly strong at TypeScript and Rust as of mid-2026, and somewhat weaker in niche language ecosystems. Maintain an internal wiki of “what the agent does well” and “where it struggles” so the team can route work appropriately. This knowledge changes with each model release, so the wiki needs maintenance, but the payoff is that the agent’s failure modes become predictable and manageable rather than surprising and damaging.
Frequently Asked Questions (FAQ)
Q1. Is Codex CLI free to use?
The CLI binary itself is open source under Apache 2.0 and free to install. However, the OpenAI API calls it makes are billed by token usage, so in practice it is a paid service. ChatGPT Plus and Pro subscribers can link their account and share the same usage allowance.
Q2. How is Codex CLI different from ChatGPT Canvas?
Canvas is an in-browser editor surface for code, while Codex CLI is an agent that runs on your local terminal. The decisive difference is that Codex CLI can read and write files on your machine and execute shell commands directly, which Canvas cannot.
Q3. Which model should I use?
OpenAI recommends gpt-5.5 for complex coding tasks as of 2026. Inside the TUI you can switch between gpt-5.5, gpt-5.4, gpt-5.3-codex, and adjust reasoning levels using the /model command.
Q4. Can Codex CLI coexist with Claude Code or Cursor?
Yes, technically. Codex CLI is from OpenAI, Claude Code is from Anthropic, and Cursor is an editor IDE — each occupies a different layer of the stack. Many engineering teams report using two or three of them on the same codebase.
Q5. Does Codex CLI support MCP servers?
Yes. The 2026 release ships with built-in management for Model Context Protocol servers — listing, adding, removing, and authenticating them through the codex mcp subcommand. OpenAI adopted Anthropic’s open MCP standard, making cross-vendor tool reuse possible.
Conclusion
- Codex CLI is OpenAI’s open-source, Rust-based terminal-native AI coding agent.
- It pairs frontier models (GPT-5.5 recommended in 2026) with local file I/O, shell execution, and MCP tool calls.
- The 2026 release adds multi-agent orchestration, persisted /goal workflows, and image generation.
- It is the direct competitor to Anthropic’s Claude Code and shares MCP compatibility, easing cross-vendor tooling.
- Never grant production credentials. Use staging or read-only environments for any destructive workflows.
- Best fits: local repetitive work, legacy code migration, CI auto-fix pipelines, and learning workflows.
References
- OpenAI Developers, “Codex CLI,” https://developers.openai.com/codex/cli
- OpenAI Developers, “Codex CLI Features,” https://developers.openai.com/codex/cli/features
- OpenAI Developers, “Codex Changelog,” https://developers.openai.com/codex/changelog
- GitHub, “openai/codex,” https://github.com/openai/codex