rasbt/reasoning-from-scratch

Hand-cranking DeepSeek R1 from a Qwen3 base

A step-by-step PyTorch walkthrough for turning a pretrained LLM into a reasoning model, no black boxes allowed.

4.5k stars · Jupyter Notebook · Language Models · ML Frameworks · Learning
Velocity (7d): +9.7 ★ / day · Trend: steady

What it does

This is the official code repo for Sebastian Raschka’s Build a Reasoning Model (From Scratch). It starts with a pretrained Qwen3 base model and layers on reasoning capabilities (chain-of-thought prompting, inference-time scaling, self-refinement, GRPO-based reinforcement learning, and distillation) using plain PyTorch in Jupyter notebooks. The goal is educational: you see the gears turn instead of calling an API and hoping.
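To give a feel for the GRPO step mentioned above, here is a minimal sketch of the group-relative advantage calculation at its core: several answers are sampled for the same prompt, each gets a reward, and each reward is normalized against its own group. This is not the repo's code. The function name, the `eps` parameter, and the toy rewards are made up for illustration, and it uses plain Python rather than PyTorch to stay dependency-free.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of answers sampled for the same prompt.

    Hypothetical helper for illustration; not taken from the repo.
    """
    mu = mean(rewards)
    # Population standard deviation; some implementations use the sample std instead.
    sigma = pstdev(rewards)
    # eps avoids division by zero when every answer in the group got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy example: four sampled answers, two judged correct (reward 1.0), two wrong (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that beat their group's average get a positive advantage and are reinforced; answers below it are discouraged. If every answer in a group earns the same reward, all advantages are zero and that group contributes no learning signal.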

The interesting bit

The repo mirrors the techniques used in production models like DeepSeek R1 and GPT-5 Thinking, but strips them down to consumer-hardware scale. Chapters 2–4 run fine on CPU; chapters 5–6 want a GPU. There’s even a mental-model diagram that maps how the pieces fit together, which is rarer than it should be in ML education.

Key highlights

  • Eight main chapters plus six appendices, each with exercise solutions
  • Covers inference-time scaling (CoT, self-consistency, Best-of-N), GRPO reinforcement learning, and distillation
  • Bonus scripts for MATH-500 evaluation, batched GRPO, and Hugging Face checkpoint loading
  • Automated tests across Linux, macOS, and Windows
  • Includes a chat interface appendix if you want to talk to your creation
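Two of the inference-time scaling ideas in the list above are simple enough to sketch in a few lines. Self-consistency samples several answers and keeps the most common one; Best-of-N samples several answers and keeps the one a scorer rates highest. The helper names and the hard-coded answers and scores below are assumptions for illustration only; in practice the answers would come from the model and the scores from some verifier or reward function.

```python
from collections import Counter

def self_consistency_vote(answers):
    """Return the most frequent final answer among sampled generations."""
    counts = Counter(a.strip() for a in answers if a is not None)
    if not counts:
        return None
    # Ties go to whichever answer appeared first.
    return counts.most_common(1)[0][0]

def best_of_n(candidates, scores):
    """Return the candidate with the highest score (scores supplied by the caller)."""
    if len(candidates) != len(scores) or not candidates:
        raise ValueError("need one score per candidate and at least one candidate")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best_index]

# Hypothetical sampled final answers to the same math question.
sampled = ["42", "41", "42 ", "7", "42"]
voted = self_consistency_vote(sampled)

# Hypothetical candidates with made-up scores.
picked = best_of_n(["draft A", "draft B", "draft C"], [0.2, 0.9, 0.5])
print(voted, picked)
```

Both approaches trade extra compute at inference time for accuracy, without touching the model's weights.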

Caveats

  • This is a companion to a print book; Raschka explicitly won’t accept contributions that alter the main chapter code, so don’t expect community-driven evolution
  • “From scratch” here means “from a pretrained base”; if you want to build the transformer itself, that’s a different Raschka book

Verdict

Grab this if you’re an ML engineer who understands transformers but finds reasoning papers opaque and wants to touch the code. Skip it if you need a production-ready framework or already run your own RLHF cluster.
