angelos-p/llm-from-scratch

Build a GPT that fits in a lunch break

A stripped-down workshop that trades GPT-2 scale for the clarity of writing every transformer component yourself.

Velocity (7d): +47 ★ / day · Trend: steady

What it does This repo is a six-part guided workshop where you write a complete GPT training pipeline from an empty file. You build a character-level tokenizer, transformer blocks, a training loop, and a text generator, ending with a ~10M-parameter model that trains on a laptop in under an hour. The output: Shakespeare-ish prose and, hopefully, actual understanding of why each piece exists.
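To make the "~10M parameters" figure concrete, here is a rough back-of-the-envelope count for a GPT-style decoder. The hyperparameters are an assumption, not taken from this repo: they mirror the character-level Shakespeare configuration commonly used with nanoGPT (6 layers, 384-dimensional embeddings, 256-token context, no biases), and `gpt_param_count` is a hypothetical helper written for this sketch.

```python
def gpt_param_count(vocab_size, block_size, n_layer, n_embd,
                    bias=True, tie_weights=True):
    """Estimate parameters in a GPT-style decoder (hypothetical helper)."""
    d = n_embd
    tok_emb = vocab_size * d              # token embedding table
    pos_emb = block_size * d              # learned position embeddings
    ln = d * (2 if bias else 1)           # LayerNorm: weight (+ bias)
    # Attention: fused q/k/v projection (d x 3d) plus output projection (d x d)
    attn = 3 * d * d + d * d + (4 * d if bias else 0)
    # MLP: expand to 4d, then project back to d
    mlp = 4 * d * d + 4 * d * d + (5 * d if bias else 0)
    block = 2 * ln + attn + mlp           # two LayerNorms per block
    # With weight tying, the output head reuses the token embedding table
    head = 0 if tie_weights else vocab_size * d
    return tok_emb + pos_emb + n_layer * block + ln + head


# Assumed config, not the repo's stated one
total = gpt_param_count(vocab_size=65, block_size=256,
                        n_layer=6, n_embd=384, bias=False)
print(f"{total / 1e6:.2f}M parameters")
```

Under these assumptions nearly all of the parameters sit in the attention and MLP weight matrices; with a 65-symbol vocabulary the embedding table is a rounding error, which is part of why character-level models stay small.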

The interesting bit The author explicitly positions this as nanoGPT’s smaller, more pedagogical sibling. Where Karpathy’s version chases GPT-2 fidelity (124M params, hours of training), this one scales down to 10M params and character-level tokenization so the entire arc, from tokenizer to loss curves, fits in a single workshop session. The docs even explain why BPE is a poor fit for small datasets, which is the kind of practical constraint most tutorials skip.
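For readers who have not seen one, a character-level tokenizer is small enough to sketch in a few lines. This is an illustration of the general technique, not the repo's code; the class name `CharTokenizer` and its methods are hypothetical.

```python
class CharTokenizer:
    """Maps each distinct character in a corpus to an integer id (illustrative)."""

    def __init__(self, text: str):
        # The vocabulary is simply the sorted set of characters seen in the corpus
        self.chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}
        self.itos = {i: ch for i, ch in enumerate(self.chars)}

    @property
    def vocab_size(self) -> int:
        return len(self.chars)

    def encode(self, s: str) -> list[int]:
        # Raises KeyError on characters that were not in the training corpus
        return [self.stoi[ch] for ch in s]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)


corpus = "to be, or not to be"
tok = CharTokenizer(corpus)
ids = tok.encode("to be")
print(tok.vocab_size, ids, tok.decode(ids))
```

There are no merge rules to learn, so the vocabulary stays tiny and every symbol appears often enough in a small dataset for the model to learn a useful embedding for it. The trade-off is longer sequences, since each character is its own token.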

Key highlights

  • Six sequential docs covering tokenization through a final “competition” to train the best AI poet
  • Three model sizes (0.5M to 10M params) with stated training times on Apple Silicon
  • Auto-detects MPS, CUDA, or CPU; includes Colab instructions
  • Character-level tokenization (vocab=65) chosen deliberately for small datasets, with BPE migration covered later
  • Assumes only Python literacy; the README says no ML background is needed
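The device auto-detection mentioned in the highlights usually comes down to a few lines of PyTorch. The sketch below is an assumption about how such a check is typically written, not the repo's implementation: `pick_device` is a hypothetical name, the priority order (CUDA, then MPS, then CPU) is a guess, and the fallback when PyTorch is missing exists only to keep the snippet runnable anywhere.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what is available (sketch)."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: with no PyTorch installed, report CPU
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps exists in PyTorch 1.12+; guard for older versions
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device = pick_device()
print(f"training on: {device}")
```

The returned string can be passed straight to `model.to(device)` and to tensor constructors, so the rest of a training script does not need to care which backend it landed on.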

Caveats

  • The 2026 date on Karpathy’s “microgpt” reference link looks like a typo in the README
  • No code is shown in the README itself; you must follow the docs in order
  • Benchmarks are limited to the author’s M3 Pro; your mileage will vary

Verdict Ideal if you’ve read about transformers but never hand-written attention or watched a loss curve descend in real time. Skip if you want production-scale training or a drop-in model weights file.
