Hand-cranking DeepSeek R1 from a Qwen3 base
A step-by-step PyTorch walkthrough for turning a pretrained LLM into a reasoning model, no black boxes allowed.

What it does
This is the official code repo for Sebastian Raschka’s Build a Reasoning Model (From Scratch). It starts with a pretrained Qwen3 base model and layers on reasoning capabilities (chain-of-thought prompting, inference-time scaling, self-refinement, GRPO-based reinforcement learning, and distillation) using plain PyTorch in Jupyter notebooks. The goal is educational: you see the gears turn instead of calling an API and hoping.
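To make the GRPO part less abstract: the core trick is that each sampled answer is scored relative to the other answers drawn for the same prompt, so no separate value model is needed. Here is a minimal sketch of that normalization in plain Python. It is not the book's code; the function name, the `eps` term, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Hypothetical helper: each advantage is (reward - group mean) divided by
    (group std + eps). The eps guard avoids division by zero when every
    sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group: four sampled answers to one prompt, reward 1.0 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, and a group where every answer gets the same reward contributes no learning signal at all.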
The interesting bit
The repo mirrors the techniques used in production models like DeepSeek R1 and GPT-5 Thinking, but strips them down to consumer-hardware scale. Chapters 2–4 run fine on CPU; chapters 5–6 want a GPU. There’s even a mental-model diagram that maps how the pieces fit together, which is rarer than it should be in ML education.
Key highlights
- Eight main chapters plus six appendices, each with exercise solutions
- Covers inference-time scaling (CoT, self-consistency, Best-of-N), GRPO reinforcement learning, and distillation
- Bonus scripts for MATH-500 evaluation, batched GRPO, and Hugging Face checkpoint loading
- Automated tests across Linux, macOS, and Windows
- Includes a chat interface appendix if you want to talk to your creation
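Two of the inference-time scaling techniques in the list above, self-consistency and Best-of-N, boil down to "sample several answers, then pick one." The sketch below shows only the picking step in plain Python. It is an illustration, not the repo's implementation: `majority_vote`, `best_of_n`, and the toy scoring function are hypothetical names, and in practice the score would come from a verifier or the model's own log-probabilities.

```python
from collections import Counter


def majority_vote(answers):
    """Self-consistency: return the most frequent extracted final answer.

    None entries stand for samples where no answer could be parsed.
    """
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def best_of_n(candidates, score_fn):
    """Best-of-N: return the candidate with the highest score."""
    return max(candidates, key=score_fn)


# Toy data: final answers extracted from five sampled reasoning chains.
sampled_answers = ["42", "41", "42", None, "42"]
voted = majority_vote(sampled_answers)

# Toy scorer (assumption): prefer candidates that state a boxed answer.
candidates = ["maybe 41", "\\boxed{42}", "not sure"]
picked = best_of_n(candidates, lambda text: 1.0 if "\\boxed" in text else 0.0)

print(voted, picked)
```

Majority voting needs no scorer but only works when answers can be compared for equality, while Best-of-N works on free-form text but is only as good as the scoring function.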
Caveats
- This is a companion to a print book; Raschka explicitly won’t accept contributions that alter the main chapter code, so don’t expect community-driven evolution
- “From scratch” here means “from a pretrained base”; if you want to build the transformer itself, that’s a different Raschka book
Verdict
Grab this if you’re an ML engineer who understands transformers but finds reasoning papers opaque and wants to touch the code. Skip it if you need a production-ready framework or already run your own RLHF cluster.