What Is DeepSeek R1?
DeepSeek R1 is an open-weight, reasoning-focused large language model released by the Chinese AI company DeepSeek (深度求索) in January 2025. It achieves scores comparable to OpenAI’s o1 on math, coding, and logic benchmarks — but its weights are published under the MIT License, making commercial use, modification, and redistribution fully permitted.
Think of DeepSeek R1 as the model that shattered the assumption that high-end reasoning AI is only available through closed, expensive APIs. DeepSeek’s API is priced at roughly 1/20th to 1/30th of OpenAI o1, and because the model itself can be downloaded and run locally or on any cloud, researchers, startups, and individual developers finally have a serious reasoning model they can actually use freely. Its launch in early 2025 also triggered what became known as the “DeepSeek shock” — a single-day sell-off that sent NVIDIA shares down roughly 17% on AI capex concerns.
How to Pronounce DeepSeek R1
deep-seek r-one (/diːp siːk ɑːr wʌn/)
How DeepSeek R1 Works
R1 is built on top of DeepSeek’s foundation model DeepSeek-V3, a 671-billion-parameter Mixture-of-Experts (MoE) architecture that activates 37 billion parameters per token. What sets the R1 family apart is how its reasoning behavior was learned: primarily through reinforcement learning rather than large-scale supervised fine-tuning. The accompanying research paper documents an earlier sibling model, R1-Zero, which discovered chain-of-thought reasoning purely through RL rewards, with no supervised fine-tuning stage at all — an unusually stripped-down training recipe for a modern LLM.
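The “37B active out of 671B” figure comes from MoE routing: each token is sent to only a few experts, so most parameters sit idle on any given forward pass. A toy sketch of top-k gating (the expert counts and scores below are illustrative placeholders, not DeepSeek-V3’s actual configuration):

```python
# Toy illustration of Mixture-of-Experts routing: each token is dispatched
# to only k of n experts, so "active" parameters << total parameters.
def route(scores: list[float], k: int) -> list[int]:
    """Return the indices of the k highest-scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical gate scores for one token over 8 experts
scores = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
print(route(scores, 2))  # → [1, 5]
```

In the real model a learned gating network produces these scores, and only the selected experts’ weights participate in the computation for that token.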
The R1 model family
| Model | Characteristics | Intended use |
|---|---|---|
| DeepSeek-R1-Zero | Pure RL training, no SFT | Research — showing reasoning can emerge from RL alone |
| DeepSeek-R1 | R1-Zero plus alignment polish | Production reasoning work |
| R1-Distill series | Qwen / Llama distilled with R1’s reasoning (1.5B–70B) | On-device, low-resource |
Reasoning trace
R1 emits its internal deliberation inside <think>...</think> tags before producing the final answer. Consumer UIs typically hide the thinking, but the API exposes it on a dedicated field. Keep in mind that the thinking is what makes R1 accurate — harder problems demand longer traces, so cutting off thinking to save tokens usually tanks quality.
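If you work with raw completions from a self-hosted checkpoint rather than the official API, you have to separate the trace from the answer yourself. A minimal sketch, assuming the <think>...</think> convention described above (the split_reasoning helper is illustrative, not part of any SDK):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate the <think>...</think> trace from the final answer."""
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        return "", raw.strip()           # no trace found; everything is the answer
    thought = match.group(1).strip()     # the model's deliberation
    answer = raw[match.end():].strip()   # whatever follows the closing tag
    return thought, answer

raw_output = "<think>Area = pi * r^2, so r = sqrt(100/pi) ≈ 5.642</think>The radius is about 5.642."
thought, answer = split_reasoning(raw_output)
print(answer)  # → The radius is about 5.642.
```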
R1 inference pipeline
Prompt received → reasoning inside <think> tags → answer verification → final response
DeepSeek R1 Usage and Examples
Calling the official API (OpenAI-compatible)
DeepSeek’s API follows an OpenAI-compatible schema, so pointing an existing OpenAI SDK call at a new base_url is typically enough to start using R1. Note that the reasoning trace is returned in a separate reasoning_content field.
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # identifier for R1
    messages=[
        {"role": "user",
         "content": "If a circle has area 100, what's the radius? Four sig figs."}
    ]
)

# Reasoning trace
print("[Thought]", response.choices[0].message.reasoning_content)
# Final answer
print("[Answer]", response.choices[0].message.content)
```
Running a distilled variant locally with Ollama
Distilled versions (for example, DeepSeek-R1-Distill-Qwen-7B) are small enough to run on a single consumer GPU with 16 GB of VRAM. Ollama makes this a two-command experience.
```shell
# Pull the distilled model
ollama pull deepseek-r1:7b

# Start an interactive session
ollama run deepseek-r1:7b
>>> In how many ways can 25 students be split into five teams of five?
```
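Beyond the interactive shell, Ollama exposes a local REST API (by default on port 11434), so the same distilled model can be called programmatically. A minimal sketch assuming a local Ollama daemon; the build_request helper is our own, not part of Ollama:

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(prompt: str, model: str = "deepseek-r1:7b") -> dict:
    """Assemble the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

payload = build_request("In how many ways can 25 students be split into five teams of five?")

try:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.loads(resp.read())
        # Distilled R1 models embed their reasoning in <think> tags inside "response"
        print(body["response"])
except (urllib.error.URLError, TimeoutError):
    print("Ollama is not running locally; start it with `ollama serve` first.")
```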
Self-hosting at scale
For production serving, teams typically reach for vLLM or TGI with the MoE R1 weights. Running full R1 requires hundreds of gigabytes of VRAM, so most deployments use multi-GPU H100 nodes or managed hosts like Together AI or Fireworks AI that have already quantized and load-balanced the model.
Advantages and Disadvantages of DeepSeek R1
Advantages
✅ Open weights (MIT)
Commercial use, fine-tuning, and redistribution are all fair game.
✅ Very low price
Official API runs roughly 20–30× cheaper than o1.
✅ Top-tier reasoning
AIME, MATH, and LiveCodeBench scores sit near o1.
✅ Scalable family
Distill variants from 1.5B to 70B fit everything from laptops to servers.
Disadvantages
⚠️ Content filtering
The official API enforces Chinese regulatory constraints on sensitive topics.
⚠️ Data residency
API traffic goes to Chinese servers — a concern for regulated workloads.
⚠️ Long chains of thought
Thinking burns tokens and adds latency compared to non-reasoning models.
⚠️ Text-only
R1 doesn’t handle images or audio — pair it with a multimodal model such as DeepSeek-VL2 for those.
DeepSeek R1 vs OpenAI o1
R1 and o1 both target the “reasoning model” niche, but they differ across several important dimensions. In practice, teams choose between them based on three questions: How sensitive is the data? How price-sensitive is the workload? Do you need open weights?
| Aspect | DeepSeek R1 | OpenAI o1 |
|---|---|---|
| License | MIT (open weights) | Closed (API only) |
| API pricing | $0.55 / M input, $2.19 / M output | $15 / M input, $60 / M output |
| Reasoning exposure | Raw thoughts exposed via API | Only summarized trace |
| Data residency | China (bypassable by self-hosting) | US (enterprise residency options) |
| Multimodal | Text only | Vision support |
Common Misconceptions
Misconception 1: “R1 is fully open-source”
Note that R1 is open-weight, not fully open-source. Weights, tokenizer, and a technical report are published, but the full training data and training code are not. This is similar to Llama: you can deploy and fine-tune, but you can’t reproduce training from scratch.
Misconception 2: “Self-hosting eliminates all content filtering”
It’s important to note that even the base weights carry biases learned during alignment. A prompt about certain topics may still yield refusals or hedged answers. Mitigating this requires additional fine-tuning, not just a different host.
Misconception 3: “R1 is just Llama with extra training”
That confuses R1 with the R1-Distill-Llama variants. The flagship R1 is a Mixture-of-Experts model based on DeepSeek-V3 — a genuinely distinct architecture, not a Llama fine-tune.
Real-World Use Cases
Math and algorithmic problem solving
R1 tops many math reasoning benchmarks, including AIME 2024 where it posted 79.8%, slightly ahead of o1. Teams building AI tutors, auto-grading pipelines, and olympiad-style coaching tools adopted R1 quickly after launch.
Code generation and refactoring
On LiveCodeBench, R1 is competitive with o1 for debugging, test generation, and algorithm design — anywhere “thinking it through” matters. In practice you’ll see R1 wired into developer productivity tools as a low-cost alternative to premium closed APIs.
Agentic backbones
Because weights are open, R1 is increasingly chosen as the reasoning engine behind autonomous agents — RAG systems, tool-calling workflows, research assistants — particularly in cost-sensitive environments and regulated enterprises that need self-hosted inference.
Frequently Asked Questions (FAQ)
Q1: Does R1 support Japanese and other non-English languages?
Yes, R1 handles many languages including Japanese, though it isn’t tuned specifically for them. Expect strongest performance in English and Chinese; other languages may show more errors on culture-specific references.
Q2: What’s the difference between full R1 and the Distill variants?
Full R1 is a Mixture-of-Experts model weighing hundreds of gigabytes. Distill variants (based on Qwen or Llama, 1.5B–70B) are lighter models trained to imitate R1’s reasoning. Distill variants trade some accuracy for being runnable on commodity hardware.
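A back-of-envelope calculation shows why the Distill variants matter: at fp16, the weights alone take roughly two bytes per parameter, before counting activations and KV cache. The helper below is a rough sketch for intuition, not a sizing guarantee:

```python
def fp16_weight_gb(params_billion: float) -> float:
    """Approximate GB needed just to hold the weights at fp16 (2 bytes/param)."""
    return params_billion * 2.0  # 1e9 params per "B" cancels against 1e9 bytes per GB

for size in (1.5, 7, 14, 32, 70, 671):
    print(f"{size:>6}B ≈ {fp16_weight_gb(size):.0f} GB of weights")
```

This is why a 7B distill (≈14 GB of weights) fits a 16 GB consumer GPU, while the 671B flagship needs hundreds of gigabytes spread across a multi-GPU node; quantization shrinks these numbers further at some accuracy cost.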
Q3: What commercial considerations matter most?
The biggest issue is data residency — the official API ships traffic to China. For regulated workloads, self-hosting or using a Western provider (Together AI, Fireworks, Microsoft Azure AI Foundry) is typical.
Q4: What was the “DeepSeek shock”?
It refers to the market reaction in late January 2025 when R1’s release broke the narrative that only large US labs with vast compute budgets could build frontier reasoning models. NVIDIA stock fell roughly 17% in a single session on fears of reduced AI capex.
Q5: Any tips for using R1 in a RAG pipeline?
Remember that R1’s chain of thought is verbose, so token spend and latency rise quickly. Cap the maximum thinking budget, or swap in a Distill variant when you need snappier responses.
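One concrete guard worth adding: DeepSeek’s API docs advise against feeding reasoning_content back into subsequent requests, and the same logic applies to <think> blocks in raw output. A minimal sketch of keeping multi-turn history lean (helper names are our own):

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

def strip_reasoning(text: str) -> str:
    """Remove <think> traces so they never re-enter the context window."""
    return THINK_BLOCK.sub("", text).strip()

def append_assistant_turn(history: list, assistant_output: str) -> list:
    """Store only the final answer in history to keep token spend bounded."""
    history.append({"role": "assistant", "content": strip_reasoning(assistant_output)})
    return history

history = [{"role": "user", "content": "What is 2 + 2?"}]
append_assistant_turn(history, "<think>Simple arithmetic: 2 + 2 = 4.</think>4")
print(history[-1]["content"])  # → 4
```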
Conclusion
- DeepSeek R1 is an open-weight, reasoning-focused LLM released in January 2025.
- Published under the MIT License — commercial use, modification, and redistribution are all allowed.
- Trained primarily with reinforcement learning, using the DeepSeek-V3 MoE backbone.
- Matches OpenAI o1 on math and coding benchmarks while costing a fraction via the official API.
- Distilled variants from 1.5B to 70B make on-device and low-resource deployments viable.
- Watch out for Chinese data residency, content filtering, and long reasoning traces.
- One of the defining open models of 2025 — the one that triggered the “DeepSeek shock.”
References
- DeepSeek. “DeepSeek-R1 Release Announcement.” https://api-docs.deepseek.com/news/news250120
- Hugging Face. “deepseek-ai/DeepSeek-R1 model card.” https://huggingface.co/deepseek-ai/DeepSeek-R1
- DeepSeek-AI. “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.” arXiv:2501.12948. https://arxiv.org/abs/2501.12948