Build a GPT that fits in a lunch break
A stripped-down workshop that trades GPT-2 scale for the clarity of writing every transformer component yourself.

What it does This repo is a six-part guided workshop where you write a complete GPT training pipeline from an empty file. You build a character-level tokenizer, transformer blocks, training loop, and text generator, ending with a ~10M parameter model that trains on a laptop in under an hour. The output: Shakespeare-ish prose and, hopefully, actual understanding of why each piece exists.
The interesting bit The author explicitly positions this as nanoGPT’s smaller, more pedagogical sibling. Where Karpathy’s version chases GPT-2 fidelity (124M params, hours of training), this scales down to 10M params and character-level tokenization so the entire arc, from tokenizer to loss curves, fits in a single workshop session. The docs even explain why BPE is a poor fit for small datasets, which is the kind of practical constraint most tutorials skip.
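For readers who have not seen one, a character-level tokenizer can be sketched in a few lines. This is an illustration, not code from the repo: the `CharTokenizer` class and its method names are hypothetical, and the sample text is made up. The idea is that the vocabulary is just the set of distinct characters in the training text, which is how a Shakespeare corpus ends up with a vocabulary of only 65.

```python
# Minimal character-level tokenizer sketch.
# Assumption: class and method names are hypothetical, not the workshop's.


class CharTokenizer:
    def __init__(self, text: str) -> None:
        # The vocabulary is every distinct character, in sorted order.
        self.chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    @property
    def vocab_size(self) -> int:
        return len(self.chars)

    def encode(self, s: str) -> list[int]:
        # Raises KeyError for characters never seen in the training text.
        return [self.stoi[ch] for ch in s]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)


sample = "to be, or not to be"
tok = CharTokenizer(sample)
ids = tok.encode(sample)
roundtrip = tok.decode(ids)
```

Encoding followed by decoding returns the original string, and nothing has to be learned or downloaded, which is a large part of why this approach suits a single-session workshop.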
Key highlights
- Six sequential docs covering tokenization through a final “competition” to train the best AI poet
- Three model sizes (0.5M to 10M params) with stated training times on Apple Silicon
- Auto-detects MPS, CUDA, or CPU; includes Colab instructions
- Character-level tokenization (vocab=65) chosen deliberately for small datasets, with BPE migration covered later
- Claims to require only Python literacy, with no ML background
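Device auto-detection of the kind listed above is commonly written along these lines in PyTorch. This is a sketch under assumptions rather than the repo's code: it assumes the workshop uses PyTorch, the function names `choose_device` and `detect_device` are hypothetical, and the CUDA-before-MPS priority is a guess at a sensible default.

```python
# Device selection sketch.
# Assumptions: PyTorch is the framework; function names and the
# CUDA > MPS > CPU priority order are illustrative, not from the repo.


def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Pure selection logic, kept separate so it can be tested without a GPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends and pick one."""
    import torch  # imported lazily so the module loads without PyTorch

    # Older PyTorch builds may not ship the MPS backend at all.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = mps is not None and mps.is_available()
    return choose_device(torch.cuda.is_available(), mps_ok)
```

A training script would typically call `detect_device()` once and pass the result to `model.to(...)` and to each batch of tensors.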
Caveats
- The 2026 date on Karpathy’s “microgpt” reference link looks like a typo in the README
- No code is shown in the README itself; you must follow the docs in order
- Benchmarks are limited to the author’s M3 Pro; your mileage will vary
Verdict Ideal if you’ve read about transformers but have never written attention by hand or watched a loss curve descend in real time. Skip if you want production-scale training or drop-in model weights.