raiyanyahya/how-to-train-your-gpt

A 7,500-line interactive textbook that builds an LLM from zero

Every line of a modern GPT, annotated like you're five, engineered like you're not.

2.2k stars · Jupyter Notebook · Language Models · Learning
Velocity (7d): +63 ★ / day · Trend: steady

What it does

This repo is a 12-chapter walkthrough that has you write a complete decoder-only Transformer from scratch: tokenizer, embeddings, RoPE, multi-head attention, SwiGLU blocks, training loop, inference engine, the lot. It targets the LLaMA 3-style architecture (RMSNorm, pre-norm, weight tying, KV cache) and ships with runnable Jupyter notebooks plus 22+ standalone topic explainers.
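For a flavour of one building block named above, here is a minimal RMSNorm sketch in plain Python. It is not taken from the repo: the function name rms_norm, the eps default, and the list-based vectors are assumptions made for illustration, and a real implementation would use PyTorch tensors.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale x so its root-mean-square is ~1, then apply a per-dimension gain.

    Hypothetical helper for illustration; unlike LayerNorm, no mean is subtracted.
    """
    mean_sq = sum(v * v for v in x) / len(x)      # mean of squared activations
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)      # eps guards against division by zero
    return [w * v * inv_rms for v, w in zip(x, weight)]

x = [1.0, -2.0, 3.0, -4.0]
y = rms_norm(x, weight=[1.0] * len(x))            # unit gain: pure normalization
y_rms = math.sqrt(sum(v * v for v in y) / len(y))
print(y_rms)                                      # close to 1.0
```

In a pre-norm block, a normalization like this would be applied to the input of each attention and feed-forward sublayer rather than to its output.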

The interesting bit

The author built this to understand attention for themselves, and the pedagogical honesty shows. Explanations use “child language” and party analogies, but the code implements real techniques such as learning-rate warmup with cosine decay, mixed precision, and gradient clipping. Two narrative walkthroughs trace a single sentence through the entire model, step by step.
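As a rough sketch of the learning-rate schedule mentioned above, assuming it means linear warmup followed by cosine decay (the usual reading of the term): the function name lr_at and every default value here are hypothetical, not the repo's settings.

```python
import math

def lr_at(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=100, total_steps=1000):
    """Learning rate for a given step: linear warmup, then cosine decay.

    All defaults are illustrative assumptions, not values from the repo.
    """
    if step < warmup_steps:
        # Ramp linearly from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    # progress runs from 0.0 (end of warmup) to 1.0 (end of training).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # decays from 1 to 0
    return min_lr + (max_lr - min_lr) * cosine

for step in (0, 99, 550, 1000):
    print(step, lr_at(step))
```

The warmup keeps early updates small while the optimizer statistics are still noisy; the cosine tail lowers the rate smoothly instead of in abrupt steps.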

Key highlights

  • ~860 lines of core model code, ~2,600 lines of explanation and diagrams
  • Every single line commented with what it does and why it’s there
  • 22 topic explainers covering RoPE, SwiGLU, Flash Attention, grouped query attention, etc.
  • Runs on CPU in minutes with a 17M-parameter “tiny” config; 151M GPT-2-scale config available
  • Zero prerequisites beyond Python basics; teaches linear algebra and PyTorch as it goes

Caveats

  • The “explanations and examples WIP/” directory name suggests some explainers are still being polished
  • Default training uses a “very small dataset” by design; this is explicitly for learning, not producing a useful model
  • GPU setup requires manual config editing in main.py

Verdict

Ideal for developers who’ve used ChatGPT but never understood why 1/√d_k scaling matters. Skip it if you already know how to hand-derive backprop through a SwiGLU layer and just want pretrained weights.
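For the curious, the 1/√d_k point can be checked numerically. The sketch below is not from the repo and assumes query and key components are independent standard normals; the helper names softmax and dot_variance are hypothetical. Under that assumption the dot product q·k has variance close to d_k, so dividing by √d_k brings it back to roughly 1 and keeps softmax from saturating.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_variance(d_k, trials=2000, seed=0):
    """Empirical variance of q.k when q and k have i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    dots = []
    for _ in range(trials):
        q = [rng.gauss(0, 1) for _ in range(d_k)]
        k = [rng.gauss(0, 1) for _ in range(d_k)]
        dots.append(sum(a * b for a, b in zip(q, k)))
    mean = sum(dots) / trials
    return sum((d - mean) ** 2 for d in dots) / trials

d_k = 64
raw_var = dot_variance(d_k)      # grows roughly linearly with d_k
scaled_var = raw_var / d_k       # Var(x / sqrt(d_k)) = Var(x) / d_k

# Scores one standard deviation apart, before and after scaling by 1/sqrt(64).
raw_scores = [8.0, 0.0, -8.0]
scaled_scores = [s / math.sqrt(d_k) for s in raw_scores]
print(raw_var, scaled_var)
print(max(softmax(raw_scores)), max(softmax(scaled_scores)))
```

With unscaled scores the softmax puts nearly all its weight on one position, which leaves very small gradients for the others; the scaled version stays much softer.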
