What Is the Code Execution Tool?
The Code Execution Tool is an official Anthropic Claude API tool that allows Claude to execute Python code inside a managed, isolated sandbox container and incorporate the output into its next response. Identified by the type `code_execution_20250825`, the tool went GA in August 2025 and remains a core capability for agentic Claude applications throughout 2026.
The mental model is simple: instead of giving Claude a calculator, you give it a real Jupyter-like environment. When a user uploads a CSV and asks for monthly aggregates, Claude writes pandas code, executes it, then summarizes the result in natural language — all inside a single API call. Without the Code Execution Tool, the developer would have to capture model output, run it externally, and feed results back into the next prompt manually.
How to Pronounce Code Execution Tool
kohd ek-suh-KYOO-shun tool (/koʊd ˌɛk.səˈkju.ʃən tuːl/)
code exec tool (/koʊd ˈɛk.zɛk tuːl/) — common shorthand among developers
How the Code Execution Tool Works
Internally, the Code Execution Tool runs in four stages: Claude decides the prompt warrants code and writes the Python; the managed sandbox executes it; stdout, stderr, and any produced artifacts are captured; and Claude folds those results into its next response. Understanding this flow makes debugging dramatically easier when something goes wrong in production.
Code Execution Tool: Lifecycle
The sandbox is an isolated container managed by Anthropic. Network egress is restricted, the filesystem is ephemeral, and the runtime ships with NumPy, pandas, matplotlib, SciPy, and other common scientific libraries pre-installed. It is important to remember that this is not a Colab-style environment where you can `pip install` anything you want; package availability is intentionally curated.
Sandbox Isolation Model
Per Anthropic’s documentation, a Code Execution container is scoped to a request or session. Files and variables persist across messages within the same conversation, but they do not survive between separate conversations. To persist data across runs, developers must explicitly save and reference files using the Files API. A common production pitfall is assuming long-lived state — keep in mind that the container is volatile by design.
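To make the persistence boundary concrete, here is a minimal sketch of the explicit save-and-reference workflow. It assumes the Files API beta (`files-api-2025-04-14`, used again in Pattern A below) and the `client.beta.files.upload` method from the Python SDK; check the current SDK reference before relying on the exact signature.

```python
import anthropic

client = anthropic.Anthropic()

# Upload once; the returned file_id is the durable handle that survives
# conversations, unlike anything written inside the sandbox itself.
# The upload signature (filename, file object, MIME type) follows the
# Files API beta docs -- verify against the current SDK reference.
with open("sales.csv", "rb") as f:
    uploaded = client.beta.files.upload(
        file=("sales.csv", f, "text/csv"),
    )

print(uploaded.id)  # e.g. "file_abc123" -- reference it in later conversations
```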
Code Execution Tool Usage and Examples
Quick Start
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    tools=[{
        "type": "code_execution_20250825",
        "name": "code_execution"
    }],
    messages=[{
        "role": "user",
        "content": "Count the prime numbers between 1 and 100."
    }]
)
print(response.content)
```
A few lines is all it takes to get a Python-capable Claude. In production, you should pair Code Execution with other tools (web search, Bash Tool) to compose richer agents. Note that Claude itself decides whether to use the tool; not every prompt triggers code execution.
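When you do need to see what happened, the response interleaves text blocks with the executed code and its output. Below is a minimal inspection sketch, assuming the block type names (`server_tool_use`, `code_execution_tool_result`) documented for the code execution tool; confirm them against the current API reference.

```python
# Walk the content blocks of the Quick Start response above.
for block in response.content:
    if block.type == "server_tool_use":
        # The Python Claude chose to run (block names assumed per the docs).
        print("--- executed code ---")
        print(block.input.get("code", ""))
    elif block.type == "code_execution_tool_result":
        # stdout/stderr and any artifacts produced by the sandbox.
        print("--- sandbox output ---")
        print(block.content)
    elif block.type == "text":
        # Claude's natural-language summary of the result.
        print(block.text)
```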
Common Implementation Patterns
Pattern A: Data analytics agent
```python
import anthropic

client = anthropic.Anthropic()

file_id = "file_abc123"  # uploaded via Files API earlier

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
    extra_headers={"anthropic-beta": "files-api-2025-04-14"},
    messages=[{
        "role": "user",
        "content": [
            {"type": "container_upload", "file_id": file_id},
            {"type": "text", "text": "Group monthly sales and chart the top 5 products."}
        ]
    }]
)
```
Best for: numeric analysis, charting, and data cleaning where verifying the math matters more than raw speed.
Avoid when: the workload requires hammering many external APIs — sandbox network egress is restricted.
Pattern B: Math and verification agent
```python
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=2048,
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
    messages=[{
        "role": "user",
        "content": "Solve the quadratic 2x^2 - 7x + 3 = 0 and verify the solution."
    }]
)
```
Best for: math problems, statistics, simulation, regex testing — domains where hallucination is unacceptable.
Avoid when: trivial arithmetic the model can do faster on its own.
Anti-pattern: Sending secrets into the sandbox
```python
# Anti-pattern — never do this
messages=[{"role": "user", "content": "Set sk_live_xxx as an env var and run"}]
```
Inputs to Code Execution may be logged for safety reasons. Never paste API keys, OAuth tokens, or PII into the prompt. Instead, keep secrets on the client and pass only the results that the model needs to reason about.
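Here is a hedged sketch of the safe alternative: the credential stays on the client, the fetch happens on your infrastructure, and only the fetched payload enters the prompt. The endpoint URL and environment variable name are placeholders.

```python
import os

import anthropic
import requests

client = anthropic.Anthropic()

# The secret never enters the prompt: fetch on YOUR side, send data only.
api_key = os.environ["PAYMENTS_API_KEY"]  # placeholder name, stays client-side
raw = requests.get(
    "https://api.example.com/v1/transactions",  # placeholder endpoint
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=30,
).json()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
    messages=[{
        "role": "user",
        # Only the already-fetched payload crosses the trust boundary.
        "content": f"Analyze these transactions for anomalies:\n{raw}",
    }],
)
```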
Advantages and Disadvantages of the Code Execution Tool
Advantages
- Closes the “think → verify” loop automatically, reducing hallucinations on numeric tasks.
- The sandbox is hosted and patched by Anthropic — no Docker/Seccomp engineering required.
- Common scientific Python (pandas, NumPy, matplotlib, SciPy) is pre-installed.
- Pairs cleanly with the Files API for spreadsheet, CSV, and PDF processing.
Disadvantages
- Container time adds cost on top of token usage. Check the latest pricing page.
- Outbound network access is restricted — you cannot reach private corporate APIs.
- You cannot freely `pip install` arbitrary packages; only the curated list is available.
- State is ephemeral. Persistence requires explicit Files API workflows.
Code Execution Tool vs Bash Tool
The Code Execution Tool and the Bash Tool are both “let Claude run code” capabilities, but they target very different design points. The comparison below maps the differences across six practical axes, with Computer Use included as a third point of reference.
| Aspect | Code Execution Tool | Bash Tool | Computer Use |
|---|---|---|---|
| Primary role | Run Python in a managed sandbox | Run shell commands | Drive a real desktop GUI |
| Sandbox ownership | Anthropic-hosted | Developer provides | Developer provides VM |
| Network | Restricted | Whatever you allow | Full access |
| Pre-installed libs | pandas, NumPy, matplotlib, SciPy | Whatever you bake into the image | Whatever the OS has installed |
| Best fit | Numeric analysis, charting, verification | File ops, multi-language scripts | Replacing manual GUI work |
| Setup effort | Low — flip a flag | Medium — ship a container | High — virtual desktop infra |
In short: pick the Code Execution Tool when you need numeric correctness and quick analytics, and reach for the Bash Tool or Computer Use when the task is broader — local files, repos, or full desktop applications.
Common Misconceptions
Misconception 1: “The Code Execution Tool is the same as Claude Code.”
Why this confusion arises: both names contain “code,” and Anthropic blog posts often describe both as “letting Claude write and run code.” The marketing surfaces overlap, so readers who skim the docs reasonably conclude that “code is code.”
What’s actually true: Claude Code is a CLI and Agent SDK that operates on a developer’s local machine and repositories. The Code Execution Tool, by contrast, runs Python inside an Anthropic-managed sandbox and is invoked through the API. Different runtime, different scope, different threat model.
Misconception 2: “You can pip-install anything you want inside the sandbox.”
Why this confusion arises: developers draw a mental analogy with Google Colab, where free-form installation is the norm. Early docs also mentioned “pre-installed libraries” without clarifying that the list was exhaustive, and that wording caused many onboarding mistakes.
What’s actually true: the supported package set is curated by Anthropic. Libraries beyond that list cannot be installed at runtime. When unusual dependencies are required, the recommended pattern is to ship the data via the Files API or to switch to a developer-managed Bash Tool sandbox.
Misconception 3: “Code execution is free because it’s part of the model.”
Why this confusion arises: ChatGPT’s Code Interpreter is bundled into Plus subscriptions, so readers conflate consumer subscription features with API tooling and expect identical pricing. The “tool fee” line item is also easy to miss in detailed pricing tables.
What’s actually true: Code Execution incurs container runtime charges in addition to token usage. Always verify the latest numbers on Anthropic’s pricing page and budget for sandbox time when estimating production cost.
Real-World Use Cases
Lightweight BI assistant
Upload a CSV, ask “show me month-over-month deltas,” and get a chart back without standing up a BI stack. Effective for ad-hoc reviews when the team has not yet adopted Looker or Tableau.
ETL regex prototyping
Ask Claude for a regex that parses a log line, then have it generate 100 sample lines and validate the regex against them. Cuts iteration time on tricky string parsing significantly.
Financial and statistical verification
NPV, IRR, t-tests, chi-square checks — domains where a small numerical error has real cost. Forcing the model to run actual code dramatically lowers the chance of confidently wrong answers.
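For a flavor of what “forcing the model to run actual code” looks like, here is the kind of check Claude writes in the sandbox: a direct NPV computation that either matches the claimed figure or exposes the error. The cash flows below are illustrative.

```python
# Straightforward NPV check -- the sandbox runs arithmetic like this instead
# of the model estimating it in prose. Figures are illustrative only.
cash_flows = [-10_000, 3_000, 4_200, 5_100, 2_400]  # year 0 outlay, then inflows
rate = 0.08  # discount rate

npv = sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))
print(f"NPV at {rate:.0%}: {npv:,.2f}")
```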
Frequently Asked Questions (FAQ)
Q1. Is the Code Execution Tool the same as Claude.ai Artifacts?
No. Artifacts is a UI feature that displays generated code or documents and supports limited interactive runs. The Code Execution Tool is an API-level tool that actually executes Python and feeds results back into Claude’s reasoning.
Q2. Which languages are supported?
As of May 2026, the official runtime is Python only. You can chain the Bash Tool to invoke other interpreters, but the curated runtime is documented for Python. Always check the latest documentation for any expansions.
Q3. What are the file upload limits?
File limits follow the Files API specification. Refer to the official documentation for the most recent size and format constraints. For very large files, chunking remains the practical pattern in production.
Q4. How fast is the tool?
There is a sandbox cold-start overhead of several hundred milliseconds to a few seconds. For latency-sensitive prompts, design the prompt so the model only invokes Code Execution when truly necessary.
Production Deployment Considerations
Once a Code Execution prototype works, the next question is how to operate it safely and predictably. Below are the practical considerations that production teams consistently report as mattering most. You should treat this as a checklist, not a rulebook — not every project needs every item.
Cost modeling and budgeting
It is important to remember that Code Execution charges in two dimensions. You pay for the model tokens that flow through the API and you pay for sandbox container time. Many teams underestimate the second axis because container startup and execution time do not show up in the token usage they monitor. Keep in mind that a single complex analysis can chain several executions; build a per-request budget that includes both costs, not just tokens.
A useful rule of thumb is to monitor “cost per resolved task” rather than “cost per API call.” Note that downstream value, not raw call volume, is what justifies the spend. Teams that lean only on token-cost dashboards often miss the silent overhead of repeated short-lived containers.
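A sketch of that metric under stated assumptions: every rate below is a placeholder, not Anthropic’s pricing. The shape of the calculation (tokens plus container time, divided by tasks actually resolved) is the point.

```python
# Placeholder rates for illustration only -- check Anthropic's pricing page.
TOKEN_COST_PER_M_IN = 15.00     # $ per million input tokens (placeholder)
TOKEN_COST_PER_M_OUT = 75.00    # $ per million output tokens (placeholder)
CONTAINER_COST_PER_HOUR = 0.05  # $ per container-hour (placeholder)

def cost_per_request(in_tokens: int, out_tokens: int, container_seconds: float) -> float:
    """Both cost axes: tokens through the API plus sandbox container time."""
    token_cost = (in_tokens / 1e6) * TOKEN_COST_PER_M_IN \
               + (out_tokens / 1e6) * TOKEN_COST_PER_M_OUT
    container_cost = (container_seconds / 3600) * CONTAINER_COST_PER_HOUR
    return token_cost + container_cost

def cost_per_resolved_task(request_costs: list[float], tasks_resolved: int) -> float:
    """The metric recommended above: total spend over tasks actually solved."""
    return sum(request_costs) / max(tasks_resolved, 1)

# Example: one task that needed three execution rounds before succeeding.
rounds = [cost_per_request(12_000, 2_500, 45) for _ in range(3)]
print(f"${cost_per_resolved_task(rounds, 1):.4f} per resolved task")
```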
Sandbox state hygiene
Because the sandbox is ephemeral, you should treat anything written inside it as transient. Do not assume your CSV is still there an hour later. Files that need to outlive a session must round-trip through the Files API. Note that this also means there is no traditional cron-style continuity — if a workflow needs daily artifacts, schedule the orchestration outside Anthropic and re-upload the inputs on each run.
One pattern that production teams favor is “input bundle in, output bundle out”: every Code Execution call gets a clearly bounded set of input files and produces a clearly bounded set of output artifacts. That keeps audit trails simple and avoids the trap of relying on hidden in-container state.
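Below is a minimal sketch of that bundle discipline, assuming the Files API beta methods shown earlier. The exact schema for retrieving sandbox-produced artifacts varies by API version, so the final step is left as a documented pointer rather than fragile code.

```python
import anthropic

client = anthropic.Anthropic()

def run_bounded_analysis(local_paths: list[str], instruction: str):
    """One call, explicit inputs in, explicit artifacts out -- no hidden state."""
    # 1. Bounded input set: upload every file and collect its durable id.
    file_ids = []
    for path in local_paths:
        with open(path, "rb") as f:
            uploaded = client.beta.files.upload(file=(path, f, "text/csv"))
        file_ids.append(uploaded.id)

    # 2. A single execution call that references exactly those inputs.
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=8192,
        tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
        extra_headers={"anthropic-beta": "files-api-2025-04-14"},
        messages=[{
            "role": "user",
            "content": [
                *[{"type": "container_upload", "file_id": fid} for fid in file_ids],
                {"type": "text", "text": instruction},
            ],
        }],
    )

    # 3. Bounded output set: artifacts the sandbox created surface as file ids
    #    inside code_execution_tool_result blocks (see the docs for the exact
    #    schema); download and archive them so nothing relies on container state.
    return response
```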
Failure modes and retries
Code Execution can fail at three distinct layers and you should design your retry policy accordingly. The model itself can produce buggy code. The sandbox can time out or hit a memory limit. The Files API can return a transient network error. Each layer deserves a different response: model errors are best handled by feeding the traceback back to Claude for repair, sandbox limits should bump the retry count and reduce the workload size, and Files API errors deserve idempotent retries with backoff.
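A layer-aware retry skeleton, as a sketch: the SDK exception `anthropic.APIConnectionError` is real, but `extract_sandbox_error` is a hypothetical helper you would implement against the actual result schema, and the repair-prompt wording is illustrative.

```python
import time

import anthropic

def extract_sandbox_error(response) -> str | None:
    """Hypothetical helper: pull a Python traceback out of the sandbox output
    blocks if execution failed. Implement against the actual result schema."""
    ...

def run_with_layered_retries(make_request, max_repairs=2, max_transport_retries=3):
    feedback = None
    transport_attempts = 0
    repairs = 0
    while True:
        try:
            response = make_request(feedback)
        except anthropic.APIConnectionError:
            # Layer 3: transient transport/Files API failure -> idempotent
            # retry with exponential backoff.
            transport_attempts += 1
            if transport_attempts > max_transport_retries:
                raise
            time.sleep(2 ** transport_attempts)
            continue
        traceback_text = extract_sandbox_error(response)
        if traceback_text and repairs < max_repairs:
            # Layer 1: buggy generated code -> feed the traceback back to
            # Claude for repair. (Layer 2, sandbox time/memory limits, should
            # instead shrink the workload before retrying.)
            repairs += 1
            feedback = f"Your code failed with:\n{traceback_text}\nFix it and rerun."
            continue
        return response
```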
Observability and audit
For regulated workloads, you should capture three things: the exact prompt that triggered code execution, the code Claude produced, and the verbatim sandbox output. Anthropic does not retain this for you — if your security or compliance posture requires reproducibility, plan to log it on your side. Keep in mind that logs may contain user-supplied data, so the same access controls you apply to ordinary application logs apply here.
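A compact shape for that record, as a sketch: one JSON line per execution, capturing all three items. Block type names are assumed as in the inspection example earlier.

```python
import json
import time

def audit_record(prompt: str, response) -> str:
    """One JSON line per execution: prompt, generated code, verbatim output."""
    record = {
        "ts": time.time(),
        "prompt": prompt,          # 1. what triggered code execution
        "executed_code": [],       # 2. the code Claude produced
        "sandbox_output": [],      # 3. the verbatim sandbox output
    }
    for block in response.content:
        if block.type == "server_tool_use":            # block names assumed
            record["executed_code"].append(block.input.get("code", ""))
        elif block.type == "code_execution_tool_result":
            record["sandbox_output"].append(str(block.content))
    # Ship to the same access-controlled sink as ordinary application logs.
    return json.dumps(record)
```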
Network egress strategies
Sandbox network egress is intentionally restricted. The right design pattern is to do “data fetching” outside the sandbox and pass already-fetched payloads in via Files API. If the workload demands richer connectivity, switch to the Bash Tool with a developer-managed container, or graduate to Computer Use. It is important to remember that this is the canonical “use the right tool for the layer” tradeoff that recurs throughout Anthropic’s tooling.
Selecting the right model tier
Code Execution works with all current Claude models, but the model you pair it with strongly affects results. Note that Opus models are best for novel analytics and multi-step reasoning, Sonnet is a strong default for production throughput, and Haiku makes sense when the task is well-bounded — a known transformation across many similar inputs. You should benchmark all three on your representative workload before committing.
Agent design patterns that pair well
The Code Execution Tool is rarely used alone in production. The most common companion is a web-search tool for grounding, plus Files API for data movement. Some teams add a “verifier” subagent that reruns the same calculation in a different way and flags discrepancies. Note that this verifier pattern is especially valuable in financial and scientific workflows, where a single wrong number can be expensive.
Example: nightly KPI digest pipeline
A typical production deployment is a “nightly KPI digest.” Inputs (raw event data) are uploaded daily via the Files API. The Code Execution Tool produces a single rolled-up dashboard with charts, written to disk inside the sandbox. The output bundle is downloaded and emailed to stakeholders. Total cost is dominated by sandbox time rather than tokens, and the entire pipeline runs unattended. You should profile the run regularly because input size tends to grow over time, which can quietly push the workflow into a slower bucket.
Comparison with Adjacent Tools and Future Outlook
The Code Execution Tool sits in a fast-evolving family of capabilities. To put it in context, you should think of it as one slice of a wider stack that includes the Bash Tool, Computer Use, Files API, web search, and the developer-facing Claude Code CLI. Note that each capability solves a slightly different scoping problem, and the boundaries are deliberate rather than accidental.
Where Code Execution stops and other tools start
Code Execution stops at the boundary of “self-contained Python computation.” When the workload needs to fan out to many external HTTP endpoints, integrate with internal SaaS, or drive an arbitrary GUI, you should reach for adjacent tools. Keep in mind that the cleanest production agents are the ones that compose tools rather than overloading any single one.
For example, a research agent might combine web search to find sources, the Code Execution Tool to run statistical analysis on the harvested data, and Files API to deliver the rolled-up report. Each tool stays inside its lane, and the orchestration logic stays in Claude rather than in custom Python on the developer’s side.
Trends shaping the next 12 months
Three trends are visible in the agent-tooling space. First, Anthropic and other vendors are extending sandbox tools with richer pre-installed libraries — expect Code Execution to gain more domain-specific packages over time. Second, programmatic tool calling (where Claude writes a single block of Python that orchestrates many tool calls) is becoming a first-class pattern, replacing the older request-per-call style. Third, audit and compliance features are catching up with the speed of feature development. Note that for regulated industries, this third trend matters more than raw capability.
How to evaluate Code Execution against alternatives
If you are choosing between hosted code execution and a self-hosted sandbox, you should weigh four axes: setup time, ongoing maintenance, network access requirements, and audit needs. Code Execution wins on the first two; self-hosted Bash Tool wins on the third. Keep in mind that audit needs are tier-dependent — if you only need request-level logs, hosted suffices. If you need full filesystem snapshots, self-hosted may be required.
The “verify with code” pattern as default
One pattern that is solidifying as a best practice is “the model writes; the model also verifies.” For numeric and code-generation tasks, having Claude run a verification pass via the Code Execution Tool catches a meaningful fraction of subtle errors that pure model self-review misses. You should keep this pattern in your back pocket whenever the cost of a wrong answer is significant. Note that the cost of the verification pass is small relative to the cost of a downstream user encountering a bad answer.
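In practice the pattern is just a second call: hand Claude the draft answer and ask it to check the numbers with code. A sketch with illustrative prompt wording, reusing the client from earlier examples:

```python
# Draft produced by a prior call (or a prior model pass); wording illustrative.
draft_answer = "The portfolio's annualized volatility is 14.2%."

verification = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=2048,
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
    messages=[{
        "role": "user",
        "content": (
            "Verify the following claim by computing it with actual code, "
            "then state clearly whether it holds:\n" + draft_answer
        ),
    }],
)
```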
What to expect from the next major Claude model
Each new Claude generation has historically expanded what Code Execution feels capable of. The model is the bottleneck, not the sandbox. You should plan to re-evaluate your Code Execution workflows after each major model release because what was awkward last quarter may become trivial in the next. Keep in mind that benchmarks of “what Code Execution can do” should be re-run on a quarterly cadence at minimum.
Closing thoughts on adoption strategy
Teams that get the most out of the Code Execution Tool tend to follow a simple adoption arc: pilot with a single high-value workflow, build the verification habit, then expand horizontally to adjacent workflows. Note that the temptation to “rewrite everything as agents” is rarely the right move. The Code Execution Tool excels in the specific cases where deterministic Python plus probabilistic Claude reasoning are jointly more valuable than either alone.
Conclusion
- The Code Execution Tool is Anthropic’s official Claude API tool that runs Python in a managed sandbox.
- It is identified by the type `code_execution_20250825` and has been GA since August 2025.
- The sandbox is isolated, network-restricted, and ephemeral by design.
- Common scientific libraries are pre-installed, and Files API integration enables CSV, Excel, and PDF processing.
- It complements the Bash Tool and Computer Use rather than replacing them — agents often combine all three.
- Strongest fits: numeric verification, ad-hoc analytics, and regex or simulation prototyping.
References
- Anthropic, “Code execution tool — Claude API Docs” https://platform.claude.com/docs/en/agents-and-tools/tool-use/code-execution-tool
- Anthropic, “Introducing advanced tool use on the Claude Developer Platform” https://www.anthropic.com/engineering/advanced-tool-use
- Anthropic, “Programmatic tool calling — Claude API Docs” https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling
- Anthropic, “Claude API Documentation” https://platform.claude.com/docs/en/home