huggingface/open-r1

Hugging Face's open recipe for cloning DeepSeek-R1

A community effort to reverse-engineer and openly reproduce the training pipeline behind DeepSeek's famous reasoning model.

26k stars · Python · Language Models · ML Frameworks
Velocity (7d): +52 ★/day · Trend: steady

What it does

Open R1 is a work-in-progress toolkit that aims to rebuild the entire DeepSeek-R1 pipeline in the open. It provides training scripts for supervised fine-tuning and GRPO reinforcement learning, plus data generation tools that use Distilabel to distill reasoning traces from DeepSeek-R1 itself. A Makefile ties the steps together so you can run the pipeline without memorizing long shell commands.

The interesting bit

The project is deliberately simple (three core scripts and a Makefile) because the real work is in the data and the recipes. The team has already completed “Step 1” by releasing Mixture-of-Thoughts, a 350k-sample verified reasoning dataset, and training a 7B model that roughly matches DeepSeek’s distilled version on math and coding benchmarks.

Key highlights

  • Releases curated datasets: Mixture-of-Thoughts (350k traces), CodeForces-CoTs (10k problems, 100k solutions), and OpenR1-Math-220k
  • OpenR1-Distill-7B scores 52.7 on AIME 2024 versus 51.3 for DeepSeek’s distilled model, but trails on MATH-500 (89.0 versus 93.5)
  • Supports SFT and GRPO training via Accelerate + DeepSpeed ZeRO-2/3, with vLLM backend for scalable generation
  • Single-node and multi-node Slurm recipes provided, including colocated vLLM mode for smaller models
  • Data generation recipes to distill from either small models or the full DeepSeek-R1
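
For readers new to GRPO, the core idea is that each prompt gets a group of sampled completions, and every completion is scored relative to its own group instead of against a learned value model. The sketch below shows only that normalization step. It is not taken from the open-r1 code; the function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration, and real implementations may differ on those details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize the rewards of one prompt's completions within their group.

    Hypothetical helper: advantage_i = (r_i - group mean) / (group std + eps).
    `eps` (an assumed constant) avoids division by zero when every
    completion in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("need at least one reward per group")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, rewarded 1.0 when the answer verifies.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct completions end up with positive advantages and incorrect ones with negative advantages, and a group where every completion scores the same contributes no learning signal.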

Caveats

  • Requires CUDA 12.4 and PyTorch 2.6.0; version mismatches cause segmentation faults
  • Training configs target 8× H100 (80GB) nodes; you’ll need to retune batch sizes for other hardware
  • Chat template and EOS token handling are finicky and vary by base model (Qwen, Llama, etc.)
  • Steps 2 and 3 (pure RL pipeline, full multi-stage training) are still incomplete
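
Given the first caveat, it may be worth checking versions before launching a run. The following is a minimal sketch, not part of open-r1: `parse_version` and `check_environment` are hypothetical helpers, and the required versions are simply the ones listed above. The functions take plain strings so they can be used without importing PyTorch.

```python
import re
from typing import Optional

# Versions taken from the caveat above; adjust if the project's requirements change.
REQUIRED_TORCH = (2, 6, 0)
REQUIRED_CUDA = (12, 4)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a string such as '2.6.0+cu124' into (2, 6, 0).

    Local build suffixes after '+' are ignored, and parsing stops at the
    first dot-separated piece that does not start with a digit.
    """
    core = version.split("+", 1)[0]
    numbers = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def check_environment(torch_version: str, cuda_version: Optional[str]) -> list[str]:
    """Return human-readable problems; an empty list means versions match."""
    problems = []
    if parse_version(torch_version)[:3] != REQUIRED_TORCH:
        problems.append(f"PyTorch {torch_version} found, expected 2.6.0")
    if cuda_version is None or parse_version(cuda_version)[:2] != REQUIRED_CUDA:
        problems.append(f"CUDA {cuda_version} found, expected 12.4")
    return problems


print(check_environment("2.5.1+cu121", "12.1"))
```

In a real environment the two arguments would typically be `torch.__version__` and `torch.version.cuda`, the latter being `None` on CPU-only builds.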

Verdict

Worth a look if you’re researching reasoning models or need a reproducible baseline for distillation. Skip it if you want a polished, end-to-end product: this is explicitly a construction site, not a finished building.
