← all repositories
deepseek-ai/DeepSeek-R1

Open-source o1 rival trained purely with reinforcement learning

DeepSeek-R1 proves you can teach LLMs to reason without spoon-feeding them curated examples first.

92k stars Language Models
DeepSeek-R1
Velocity · 7d
+183
★ / day
Trend
steady
star history

What it does DeepSeek-R1 is a 671-billion-parameter mixture-of-experts model that matches OpenAI’s o1 on math, code, and reasoning benchmarks. It comes in two flavors: R1-Zero, trained with reinforcement learning alone, and the full R1, which adds a small “cold-start” data phase to fix R1-Zero’s tendency to ramble, repeat itself, and mix languages mid-thought. The project also releases six smaller distilled variants (1.5B to 70B) built on Llama and Qwen.

The interesting bit R1-Zero is the first open result showing that reasoning capabilities—self-verification, reflection, long chain-of-thought—can emerge purely from RL incentives without supervised fine-tuning as scaffolding. The distillation story is equally notable: a 32B parameter dense model beats o1-mini on several benchmarks, suggesting the big model’s reasoning patterns transfer better than trying to train small models from scratch with RL.

Key highlights

  • 671B total params, 37B active per forward pass, 128K context window
  • Benchmark table shows R1 topping or matching o1-1217 on MATH-500 (97.3%), AIME 2024 (79.8%), and LiveCodeBench (65.9%)
  • Distilled Qwen-32B outperforms o1-mini on AIME 2024 (72.6% vs 63.6%)
  • MIT licensed; weights and paper available on HuggingFace
  • OpenAI-compatible API and web chat at deepseek.com

Caveats

  • Running the full R1 locally requires the DeepSeek-V3 infrastructure; the README warns that standard HuggingFace Transformers support is incomplete
  • Distilled models use modified configs and tokenizers—you can’t just drop them into existing Llama/Qwen pipelines blindly
  • R1-Zero’s raw output is described as poorly readable and prone to endless repetition; the “interesting” version is R1 with its cold-start fix

Verdict Worth your time if you’re researching RL-driven reasoning or need a locally runnable strong reasoner via the distilled variants. Skip if you’re looking for a plug-and-play drop-in replacement for your existing Llama tooling without reading the setup notes.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.