What Is Extended Thinking?
Extended Thinking is a reasoning enhancement feature developed by Anthropic for its Claude AI models. When enabled, it gives Claude access to an internal “thinking block” — a dedicated scratchpad where the model can reason step by step before delivering its final answer. This fundamentally changes how the model approaches complex problems: instead of generating an immediate response, it first works through the problem methodically, breaking it into components, testing hypotheses, verifying intermediate results, and synthesizing its findings into a coherent answer.
The feature was first introduced with Claude 3.7 Sonnet in early 2025, marking a significant advancement in how large language models handle tasks that require deep reasoning. Extended Thinking employs what researchers call “serial test-time compute” — a technique that allows the model to perform multiple sequential reasoning steps during inference. Unlike parallel processing approaches, each step builds upon the conclusions of the previous one, maintaining logical consistency throughout the reasoning chain. Research has demonstrated that accuracy on mathematical problems improves logarithmically with the number of thinking tokens allocated: more thinking generally leads to better results, but the marginal improvement shrinks as thinking time increases.
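To build intuition for this logarithmic relationship, the toy model below shows how each doubling of the thinking budget adds a constant absolute gain, so per-token returns shrink as the budget grows. This is purely illustrative: the base and gain coefficients are invented for demonstration, not measured from any benchmark.

```python
import math

# Toy model of logarithmic accuracy scaling. The base and gain values
# are invented for illustration; real curves vary by task and model.
def toy_accuracy(thinking_tokens: int, base: float = 0.60, gain: float = 0.02) -> float:
    """Accuracy grows with log2 of the thinking budget, capped at 1.0."""
    if thinking_tokens < 1:
        return base
    return min(1.0, base + gain * math.log2(thinking_tokens))

for budget in (1_024, 4_096, 16_384, 65_536):
    print(f"{budget:>6} tokens -> {toy_accuracy(budget):.2f}")
```

Each 4x increase in budget adds the same absolute gain in this model, which is why the jump from 1,024 to 4,096 tokens is worth as much as the far more expensive jump from 16,384 to 65,536.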
One of the most compelling aspects of Extended Thinking is its transparency. The thinking block makes the model’s raw reasoning process visible to users and developers. In traditional AI interactions, the reasoning behind an answer is a black box — you get the output but have no insight into how the model arrived at that conclusion. Extended Thinking changes this paradigm entirely. By exposing the complete chain of thought, it enables verification, debugging, and trust-building in ways that were previously impossible. This transparency is particularly valuable in professional and enterprise contexts where accountability and explainability are critical requirements.
With the release of Claude Opus 4.6 and Sonnet 4.6, Extended Thinking has evolved into what Anthropic calls “Adaptive Thinking.” This enhanced version dynamically determines when and how much thinking is needed based on the complexity of the input. Simple questions receive brief thinking, while complex problems trigger extensive reasoning. Interleaved thinking is also enabled automatically, allowing the model to pause mid-response to think when it encounters a challenging sub-problem. Additionally, thinking content is now summarized by default on these newer models, providing concise overviews of lengthy reasoning processes.
How to Pronounce Extended Thinking
Extended Thinking (ik-STEN-did THING-king)
The term is a straightforward English compound. “Extended” means prolonged, expanded, or stretched beyond normal limits, while “Thinking” refers to the cognitive process of reasoning and deliberation. In the context of AI, “Extended Thinking” specifically refers to the allocated reasoning time and space that goes beyond standard token generation. The term is used consistently across Anthropic’s documentation and is not typically abbreviated. In Japanese technical contexts, it is transliterated as “エクステンデッド シンキング” (ekusutendeddo shinkingu).
How Extended Thinking Works
Understanding the internal mechanics of Extended Thinking is essential for using it effectively. The process can be broken down into three major phases: input reception, thinking block execution, and final answer generation. The following diagram illustrates this flow.
The Serial Test-Time Compute Architecture
At the core of Extended Thinking lies the concept of serial test-time compute. Unlike approaches that run multiple inference paths in parallel and select the best result, Extended Thinking operates sequentially. Each reasoning step is processed one after another, with each step having access to the results of all previous steps. This serial approach is important because it maintains logical coherence — later conclusions are built directly upon earlier verified reasoning, much like how a human mathematician works through a proof step by step.
The serial nature of this computation also means that the model can self-correct during the thinking process. If a reasoning step leads to a contradiction or an implausible result, the model can recognize this error, backtrack, and try an alternative approach. This self-correction capability is one of the key advantages of Extended Thinking over simpler prompting techniques. Note that this architectural choice involves a deliberate tradeoff: serial processing takes more time than parallel approaches, but produces more logically consistent results for complex reasoning tasks.
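The contrast between the two styles can be sketched abstractly. In the hypothetical helpers below (not part of any SDK), a parallel approach samples independent attempts and keeps the best one, while a serial approach feeds each step's result into the next:

```python
# Hypothetical sketch of the two test-time-compute styles.
def parallel_best_of_n(attempt, score, n):
    """Run n independent attempts and keep the highest-scoring one."""
    return max((attempt() for _ in range(n)), key=score)

def serial_refine(attempt, revise, n):
    """Run one attempt, then let each later step build on the previous result."""
    result = attempt()
    for _ in range(n - 1):
        result = revise(result)  # later steps see earlier conclusions
    return result

# Demo with numbers standing in for solution quality: serial steps accumulate.
print(serial_refine(lambda: 1, lambda r: r + 1, 4))
```

The key structural difference: in the parallel sketch, no attempt ever sees another attempt's work, while in the serial sketch every step receives the accumulated state of all previous steps.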
Inside the Thinking Block
The thinking block operates as a protected internal workspace where the model can reason freely without the constraints of producing polished output. Within this block, several cognitive processes take place. The model begins by decomposing the problem into manageable sub-tasks. It then generates and evaluates multiple hypotheses for each sub-task, testing their validity against known constraints. Throughout this process, the model performs self-verification — checking intermediate results for consistency and correctness. Finally, it integrates the findings from all sub-tasks into a unified conclusion that forms the basis of the final answer.
It is important to understand that the content within the thinking block is raw, unfiltered reasoning. It may contain false starts, abandoned hypotheses, mathematical scratching, and self-corrections. This raw quality is actually a feature, not a bug — it provides genuine insight into the model’s reasoning process and helps users understand not just what the model concluded, but how and why it reached that conclusion. This visibility is invaluable for debugging, verification, and building trust in AI-generated outputs.
Adaptive Thinking in Claude Opus 4.6 and Sonnet 4.6
The latest evolution of Extended Thinking is Adaptive Thinking, introduced in Claude Opus 4.6 and Sonnet 4.6. This enhancement represents a significant refinement in how the model allocates its reasoning resources. Rather than using a fixed amount of thinking for every query, Adaptive Thinking dynamically assesses the complexity of each input and adjusts the depth of reasoning accordingly. A simple factual question might trigger only a brief moment of thinking, while a complex multi-step mathematical proof would receive extensive reasoning time.
Adaptive Thinking also introduces interleaved thinking, which is automatically enabled. With interleaved thinking, the model does not confine all its reasoning to a single upfront thinking block. Instead, it can insert additional thinking blocks throughout its response as needed. For instance, if the model is writing a long analysis and encounters a particularly challenging sub-point, it can pause, engage in focused thinking about that specific issue, and then resume its response with renewed clarity. This makes the reasoning process more natural and efficient.
Another notable change in Opus 4.6 and Sonnet 4.6 is that thinking content is now summarized by default. Previously, users would see the complete, raw thinking output, which could be extremely lengthy for complex problems. The default summarization provides a concise overview of the reasoning process while preserving the essential logical steps. Users who need the full thinking output can adjust their settings accordingly, which is worth checking before building features that depend on the complete reasoning trace.
How to Use Extended Thinking: Practical Examples
Extended Thinking is accessible through Anthropic’s API using specific parameters. The key parameter is budget_tokens, which sets the maximum number of tokens the model can use for thinking. Below are practical code examples demonstrating how to implement Extended Thinking in your applications.
Basic Python Implementation
```python
import anthropic

client = anthropic.Anthropic()

# Create a request with Extended Thinking enabled
response = client.messages.create(
    model="claude-sonnet-4-6-20250415",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # Allow up to 10,000 tokens for thinking
    },
    messages=[
        {
            "role": "user",
            "content": "Find all prime factors of 2047 and explain your method.",
        }
    ],
)

# Process the response: thinking and text blocks arrive separately
for block in response.content:
    if block.type == "thinking":
        print("[Thinking Process]")
        print(block.thinking)
    elif block.type == "text":
        print("[Final Answer]")
        print(block.text)
```
In this example, we create a message request with the thinking parameter set to enabled. The budget_tokens value of 10,000 allows the model to use up to 10,000 tokens for its internal reasoning. The response contains multiple content blocks — thinking blocks that reveal the reasoning process and text blocks that contain the final polished answer. You should structure your application to handle both block types appropriately.
TypeScript/Node.js Implementation
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function analyzeWithThinking(prompt: string, budgetTokens: number = 8000) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-6-20250415",
    max_tokens: 16000,
    thinking: {
      type: "enabled",
      budget_tokens: budgetTokens,
    },
    messages: [{ role: "user", content: prompt }],
  });

  const result = { thinking: "", answer: "" };
  for (const block of response.content) {
    if (block.type === "thinking") {
      result.thinking += block.thinking;
    } else if (block.type === "text") {
      result.answer += block.text;
    }
  }
  return result;
}

// Usage example: complex code analysis
const analysis = await analyzeWithThinking(
  "Review this algorithm for potential edge cases and optimize it for performance.",
  12000
);
console.log("Reasoning:", analysis.thinking);
console.log("Analysis:", analysis.answer);
```
Budget Tokens Configuration Guide
Choosing the right budget_tokens value is critical for balancing quality, cost, and latency. The following table provides recommended ranges based on task complexity. Keep in mind that these are guidelines — the optimal value depends on your specific use case and requirements.
| Task Type | Recommended budget_tokens | Description | Typical Latency Impact |
|---|---|---|---|
| Simple Q&A / Summarization | 1,024 – 4,000 | Brief thinking sufficient for straightforward tasks | Low (+1-3 seconds) |
| Moderate Code Generation | 4,000 – 10,000 | Multi-step reasoning for code with moderate complexity | Medium (+3-8 seconds) |
| Complex Math / Analysis | 10,000 – 32,000 | Deep reasoning for mathematical proofs and data analysis | High (+8-20 seconds) |
| Advanced Research / Argumentation | 32,000+ | Maximum thinking for the most challenging problems | Very High (+20-60 seconds) |
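The table above can be encoded as a small lookup helper for application code. This is a hypothetical convenience function, not part of the Anthropic SDK; the category names and the 64,000 upper bound for the open-ended last row are this sketch's assumptions.

```python
# Hypothetical budget picker based on the ranges in the table above.
RECOMMENDED_BUDGETS = {
    "simple_qa": (1_024, 4_000),
    "moderate_code": (4_000, 10_000),
    "complex_math": (10_000, 32_000),
    "advanced_research": (32_000, 64_000),  # upper bound assumed for this sketch
}

def pick_budget(task_type: str, conservative: bool = True) -> int:
    """Return the low end of the recommended range by default, the high end otherwise."""
    low, high = RECOMMENDED_BUDGETS[task_type]
    return low if conservative else high

print(pick_budget("moderate_code"))                       # 4000
print(pick_budget("complex_math", conservative=False))    # 32000
```

Starting from the conservative end of each range and raising the budget only when evaluation results demand it keeps both cost and latency predictable.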
Streaming Extended Thinking Responses
For production applications, streaming is highly recommended when using Extended Thinking. Since the model may spend significant time in the thinking phase, streaming allows you to provide real-time feedback to users. Here is how to implement streaming with Extended Thinking in Python.
```python
import anthropic

client = anthropic.Anthropic()

# Stream a response with Extended Thinking
with client.messages.stream(
    model="claude-sonnet-4-6-20250415",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000,
    },
    messages=[
        {"role": "user", "content": "Derive the closed-form solution for the Fibonacci sequence."}
    ],
) as stream:
    current_type = None
    for event in stream:
        if hasattr(event, "type"):
            if event.type == "content_block_start":
                block_type = event.content_block.type
                if block_type == "thinking":
                    print("\n--- Thinking ---")
                    current_type = "thinking"
                elif block_type == "text":
                    print("\n--- Answer ---")
                    current_type = "text"
            elif event.type == "content_block_delta":
                if hasattr(event.delta, "thinking"):
                    print(event.delta.thinking, end="", flush=True)
                elif hasattr(event.delta, "text"):
                    print(event.delta.text, end="", flush=True)
```
Advantages and Disadvantages of Extended Thinking
Advantages
- Dramatically Improved Accuracy: Extended Thinking significantly boosts performance on tasks requiring deep reasoning. Mathematical problem-solving, complex coding challenges, and multi-step logical analysis all see substantial improvements. Research shows accuracy on math benchmarks improves logarithmically with thinking tokens, meaning even modest token budgets can yield meaningful gains.
- Full Transparency and Auditability: The visible thinking block provides unprecedented insight into the model’s reasoning process. Developers can examine exactly how the model approached a problem, which hypotheses it considered, and why it arrived at its conclusion. This transparency is invaluable for debugging, quality assurance, and regulatory compliance in enterprise applications.
- Self-Correction During Reasoning: Unlike standard responses where errors propagate unchecked, Extended Thinking allows the model to recognize and correct mistakes during the reasoning process. If a calculation yields an unexpected result, the model can backtrack and try an alternative approach — much like a human expert would.
- Enhanced Performance on Complex Tasks: Tasks that require maintaining context across multiple reasoning steps — such as proving mathematical theorems, analyzing legal documents, or designing system architectures — benefit enormously from the dedicated thinking space that Extended Thinking provides.
- Controllable Resource Allocation: The budget_tokens parameter gives developers precise control over how much computational effort goes into thinking. This allows optimization of the cost-quality-latency tradeoff for each specific use case.
Disadvantages
- Increased Response Latency: The thinking phase adds time before the final answer begins generating. For time-sensitive applications such as real-time chat or interactive tools, this additional latency can negatively impact user experience. You should carefully consider whether the accuracy gains justify the latency cost for your specific application.
- Higher Token Costs: Tokens consumed during the thinking phase count toward API billing. A request with 10,000 thinking tokens costs significantly more than the same request without Extended Thinking. Organizations need to budget accordingly and optimize budget_tokens settings to avoid unnecessary expenses.
- Overthinking Simple Tasks: When Extended Thinking is enabled with a large token budget for simple questions, the model may engage in unnecessary elaboration. This wastes both time and money. Adaptive Thinking in newer models mitigates this, but developers using older models or fixed budgets should be aware of this risk.
- Limited Control Over Thinking Direction: While you can control how much the model thinks via budget_tokens, you cannot directly steer the direction of its thinking. The model autonomously decides what reasoning strategies to pursue, which may not always align with the user’s preferred approach.
- Thinking Content May Contain Errors: The raw thinking process may include incorrect intermediate steps, abandoned hypotheses, and reasoning tangents. Users who examine the thinking block should understand that not all content within it represents the model’s final position — it is work-in-progress reasoning that includes trial and error.
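The cost concern above is easy to quantify before committing to a budget. The helper below is a back-of-the-envelope sketch; the $15-per-million figure in the example is a placeholder, so substitute your model's actual output-token rate.

```python
# Rough cost estimate for thinking tokens (which count toward API billing).
def thinking_cost_usd(thinking_tokens: int, usd_per_million_tokens: float) -> float:
    return thinking_tokens / 1_000_000 * usd_per_million_tokens

# Placeholder rate: $15 per million tokens (substitute your model's real rate).
cost = thinking_cost_usd(10_000, 15.0)
print(f"~${cost:.2f} of thinking per request")        # ~$0.15
print(f"~${cost * 100_000:,.0f} per 100k requests")   # ~$15,000
```

Small per-request amounts compound quickly at scale, which is why the overthinking and budget-calibration concerns above matter for high-volume applications.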
Differences Between Extended Thinking and Chain of Thought
Extended Thinking is frequently compared to Chain of Thought (CoT) prompting, and understanding the distinction is important for choosing the right approach for your needs. While both techniques encourage step-by-step reasoning, they operate at fundamentally different levels of the AI system.
| Comparison Aspect | Extended Thinking | Chain of Thought (CoT) Prompting |
|---|---|---|
| Implementation Level | Built into the model architecture | Prompt engineering technique |
| Activation Method | API parameter (thinking.type: "enabled") | Prompt instructions (e.g., “Think step by step”) |
| Reasoning Location | Dedicated thinking block (separate from output) | Within the main output text |
| Token Budget Control | Precise control via budget_tokens | No direct control mechanism |
| Reasoning Quality | Model-optimized reasoning strategies | Dependent on prompt quality and phrasing |
| Cost Structure | Thinking tokens billed separately | Billed as output tokens |
| Model Compatibility | Claude 3.7 Sonnet and later | Works with most LLMs |
| Output Cleanliness | Final answer is clean (thinking is separate) | Reasoning steps mixed into the output |
| Self-Correction | Model can backtrack and correct within thinking | Limited self-correction ability |
| Transparency | Raw thinking process fully visible | Reasoning shown as formatted output |
The fundamental difference is that Extended Thinking is an architectural feature of the model itself, while Chain of Thought is a prompting strategy that works with the model’s standard capabilities. Extended Thinking provides a dedicated reasoning space that is separate from the final output, which means the final answer remains clean and well-structured. With CoT prompting, the reasoning steps are embedded in the output itself, which can make the response longer and harder to parse programmatically. For applications where both reasoning quality and output cleanliness matter, Extended Thinking is the superior approach — but it is important to note that CoT prompting remains valuable for models that do not support Extended Thinking.
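The structural difference is easiest to see in how responses are parsed. The snippet below mocks the two response shapes with SimpleNamespace objects rather than making live API calls; the block layout mirrors the content-block pattern from the earlier Python example, and the example strings are invented.

```python
from types import SimpleNamespace

# CoT prompting: reasoning and answer share a single text block.
cot_content = [SimpleNamespace(
    type="text",
    text="Step 1: 2047 = 23 * 89. Step 2: therefore 2047 is not prime.")]

# Extended Thinking: reasoning arrives in its own thinking block.
et_content = [
    SimpleNamespace(type="thinking", thinking="2047 = 23 * 89, so composite."),
    SimpleNamespace(type="text", text="2047 is not prime: it factors as 23 * 89."),
]

# With Extended Thinking, the clean answer falls out of a type filter;
# with CoT, you would have to parse the reasoning out of the prose.
answer = "".join(b.text for b in et_content if b.type == "text")
print(answer)
```

This is why programmatic consumers tend to prefer Extended Thinking: downstream code can rely on block types instead of fragile text parsing.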
Common Misconceptions
Misconception 1: Extended Thinking Makes the Model Omniscient
A common mistake is assuming that Extended Thinking can solve any problem correctly given enough thinking tokens. In reality, Extended Thinking enhances the model’s ability to reason about information it already has — it does not add new knowledge. If the model’s training data does not include specific information, no amount of thinking will produce accurate answers about that topic. Extended Thinking improves reasoning, not recall.
Misconception 2: More Budget Tokens Always Means Better Results
While increasing budget_tokens generally improves accuracy, the relationship is logarithmic, not linear. The most dramatic improvements occur in the first few thousand tokens of thinking. Beyond a certain point, additional tokens yield diminishing returns while continuing to increase costs and latency. For most practical tasks, a well-calibrated token budget of 4,000-10,000 tokens provides an excellent balance of quality and efficiency. Blindly maximizing the budget is wasteful and may even introduce unnecessary overthinking.
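One practical way to calibrate is an evaluation sweep: measure accuracy on a representative task set at increasing budgets and stop once the marginal gain drops below a threshold. Everything below is a hypothetical sketch; the calibrate function, its threshold, and the stand-in evaluator are illustrations, not an official procedure.

```python
# Hypothetical calibration loop over candidate thinking budgets.
def calibrate(budgets, evaluate, min_gain=0.05):
    """evaluate(budget) -> accuracy on an eval set; returns the chosen budget."""
    best_budget = budgets[0]
    best_acc = evaluate(best_budget)
    for b in budgets[1:]:
        acc = evaluate(b)
        if acc - best_acc < min_gain:
            break  # diminishing returns: stop raising the budget
        best_budget, best_acc = b, acc
    return best_budget

# Stand-in evaluator with diminishing returns, for demonstration only.
fake_eval = lambda b: 1 - 0.4 * (1_024 / b) ** 0.5

print(calibrate([1_024, 2_048, 4_096, 8_192, 16_384], fake_eval))
```

In a real deployment, evaluate would run your actual eval suite against the API at each budget; the loop structure stays the same.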
Misconception 3: The Thinking Block Content Is Always Correct
The thinking block shows the model’s raw reasoning process, which includes trial and error. It may contain incorrect hypotheses that are later abandoned, calculations that are redone, and reasoning paths that lead to dead ends before the model finds the correct approach. This is by design — the thinking block is a scratchpad, not a polished document. Users should evaluate the final answer as the model’s definitive output and treat thinking block content as supplementary insight into the reasoning process.
Misconception 4: Adaptive Thinking Is a Completely Different Feature
Some users believe that Adaptive Thinking (found in Claude Opus 4.6 and Sonnet 4.6) is an entirely separate capability from Extended Thinking. In fact, Adaptive Thinking is the natural evolution of Extended Thinking. It uses the same underlying architecture but adds intelligent resource allocation — automatically determining how much thinking is needed based on task complexity. Think of it as Extended Thinking with built-in optimization, not as a replacement or alternative feature.
Misconception 5: Extended Thinking Is Just Chain of Thought Under a Different Name
As discussed in the comparison section above, Extended Thinking and Chain of Thought prompting are fundamentally different approaches. CoT is a prompt engineering technique that works with any LLM, while Extended Thinking is an architectural feature built into specific Claude models. The thinking occurs in a dedicated space separate from the output, uses model-optimized reasoning strategies, and offers precise control through the budget_tokens parameter. Conflating the two leads to misunderstandings about capabilities and appropriate use cases.
Real-World Use Cases
Software Development and Code Review
Extended Thinking has proven especially valuable in software engineering contexts. When asked to review complex code, refactor large functions, or debug elusive issues, the model uses its thinking block to systematically trace through the code logic, identify potential edge cases, and evaluate different solution approaches before presenting its recommendations. For example, a development team might use Extended Thinking to analyze a microservices architecture for potential race conditions — the model would think through the timing of various service interactions, identify vulnerable sequences, and propose concrete fixes.
Data Analysis and Business Intelligence
When analyzing datasets for business insights, Extended Thinking helps the model cross-reference multiple data points, identify non-obvious patterns, and validate its conclusions before presenting them. This is particularly valuable when analyzing financial data, market trends, or customer behavior patterns where superficial analysis might miss important correlations. The thinking block allows the model to consider multiple analytical frameworks, test different hypotheses against the data, and present only the most well-supported conclusions.
Legal Document Analysis
Legal professionals have found Extended Thinking valuable for contract review and risk assessment. The model uses its thinking space to systematically evaluate each clause, cross-reference terms across sections, identify potential conflicts or ambiguities, and assess risk levels. The transparent thinking process is especially important in legal contexts, where understanding the reasoning behind an assessment is as important as the assessment itself.
Academic Research and Scientific Reasoning
Researchers use Extended Thinking for tasks like deriving mathematical proofs, analyzing experimental data, and evaluating competing theoretical frameworks. The step-by-step reasoning in the thinking block mirrors the methodical approach of scientific inquiry, making it easier for researchers to verify the model’s reasoning and identify any logical gaps. This application is particularly powerful for interdisciplinary problems where insights from multiple fields need to be synthesized.
Education and Tutoring
In educational contexts, the thinking block itself becomes a teaching tool. Students can observe how an expert-level reasoner approaches a problem — how it breaks down complex questions, considers multiple approaches, checks its work, and arrives at a solution. This process visibility transforms Extended Thinking from merely an answer generator into a demonstration of problem-solving methodology. Educators have found this particularly effective for mathematics, physics, and logic courses.
Frequently Asked Questions (FAQ)
Q. Is Extended Thinking available for free?
A. On Claude.ai, Anthropic’s official chat interface, Extended Thinking is available depending on your subscription plan. Through the API, tokens used in the thinking block are billed at standard token rates. The budget_tokens parameter allows you to control costs by setting an upper limit on thinking tokens. For production applications, it is important to monitor your token usage and adjust budgets based on actual needs.
Q. Which Claude models support Extended Thinking?
A. Extended Thinking was first introduced with Claude 3.7 Sonnet in early 2025. It is also available in Claude Opus 4.6 and Sonnet 4.6, where it has evolved into Adaptive Thinking with automatic depth adjustment and interleaved thinking capabilities. Always check Anthropic’s official documentation for the most current model compatibility information, as new models are released regularly.
Q. Can I access the thinking block content programmatically?
A. Yes. When Extended Thinking is enabled via the API, the response includes content blocks of type “thinking” alongside the standard “text” blocks. You can parse these programmatically to log, analyze, or display the reasoning process. Note that on Opus 4.6 and Sonnet 4.6, thinking is summarized by default. If you need the complete, unsummarized thinking output, consult the API documentation for configuration options.
Q. Does Extended Thinking reduce hallucinations?
A. Extended Thinking reduces hallucinations that stem from logical errors and incomplete reasoning, because the step-by-step process with self-verification catches many such mistakes. However, hallucinations caused by factual inaccuracies in the model’s training data are not addressed by Extended Thinking alone. For factual accuracy, it is important to combine Extended Thinking with retrieval-augmented generation (RAG) or external verification systems. Keep in mind that Extended Thinking improves reasoning quality but does not guarantee factual correctness.
Q. What happens if I do not set budget_tokens?
A. On Adaptive Thinking-enabled models (Opus 4.6 and Sonnet 4.6), the model will automatically determine an appropriate amount of thinking based on the query’s complexity if you do not set budget_tokens explicitly. However, for cost management and predictable billing in production environments, it is strongly recommended to set a budget_tokens limit. On older models like Claude 3.7 Sonnet, the budget_tokens parameter is required when Extended Thinking is enabled.
Summary
Extended Thinking represents a fundamental advancement in how AI models handle complex reasoning tasks. By providing Claude with a dedicated thinking block for step-by-step reasoning, it dramatically improves accuracy on mathematical problems, coding challenges, and multi-step analytical tasks. The visible thinking process builds trust and enables verification, addressing one of the key concerns in enterprise AI adoption.
The key takeaways from this article are as follows. Extended Thinking uses serial test-time compute to perform sequential reasoning steps in a dedicated thinking block. Accuracy improves logarithmically with thinking tokens, making even modest budgets effective. The budget_tokens API parameter provides precise control over the cost-quality-latency tradeoff. Adaptive Thinking in Claude Opus 4.6 and Sonnet 4.6 automatically optimizes thinking depth, making the feature more efficient and accessible. Extended Thinking is fundamentally different from Chain of Thought prompting — it is an architectural feature, not a prompt engineering technique.
For practitioners looking to integrate Extended Thinking into their workflows, the recommendations are clear. Start with moderate token budgets (4,000-10,000) and adjust based on observed results. Use streaming for production applications to mitigate latency concerns. Leverage the thinking block for debugging and quality assurance. And stay informed about Anthropic’s latest model releases, as the Extended Thinking capability continues to evolve with each new generation of Claude models. The future of AI reasoning is not just about faster models — it is about models that think more carefully and transparently before they respond.