GPT-2 for the price of a nice dinner
Karpathy's minimal LLM training harness turns a $43K 2019 training run into a sub-$100 afternoon project.

What it does
nanochat is a stripped-down framework for training language models from scratch on a single GPU node, covering the full lifecycle: tokenization, pretraining, finetuning, evaluation, inference, and a ChatGPT-style web UI. The pitch is simple enough to fit in a shell script: run bash runs/speedrun.sh on an 8×H100 node, wait roughly two hours, and you’ve got a conversational model with GPT-2-level capability.
The interesting bit
The entire hyperparameter zoo—width, heads, learning rate, training horizon, weight decay—collapses into one knob: --depth, the transformer layer count. Set depth and the framework auto-tunes everything else to stay compute-optimal. It’s a bet that scaling laws have matured enough to make hand-tuning obsolete, and the “GPT-2 speedrun” leaderboard (now down to 1.65 hours) treats training time as a competitive sport.
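To make the single-knob idea concrete, here is a minimal sketch of how a depth value could fan out into the other settings. It is not nanochat's actual rule set: the constants ASPECT_RATIO, HEAD_DIM, and TOKENS_PER_PARAM, the DerivedConfig type, and the derive_config function are all hypothetical names and values chosen for illustration, and the parameter count uses the common rough estimate of 12 × width² weights per transformer block.

```python
from dataclasses import dataclass

# Every constant here is an illustrative assumption, not a value read from nanochat.
ASPECT_RATIO = 64       # hypothetical: channels of width added per layer
HEAD_DIM = 128          # hypothetical: channels per attention head
TOKENS_PER_PARAM = 20   # hypothetical: Chinchilla-style tokens-per-parameter ratio


@dataclass(frozen=True)
class DerivedConfig:
    depth: int
    width: int
    num_heads: int
    approx_params: int
    train_tokens: int


def derive_config(depth: int) -> DerivedConfig:
    """Derive the remaining settings from depth alone (hypothetical rules)."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    # Width grows linearly with depth, rounded up to a whole number of heads.
    raw_width = depth * ASPECT_RATIO
    num_heads = -(-raw_width // HEAD_DIM)  # ceiling division
    width = num_heads * HEAD_DIM
    # Rough estimate: 4*width^2 for attention plus 8*width^2 for a 4x MLP,
    # per block, ignoring embeddings.
    approx_params = 12 * depth * width * width
    # Training horizon scales with model size.
    train_tokens = TOKENS_PER_PARAM * approx_params
    return DerivedConfig(depth, width, num_heads, approx_params, train_tokens)


if __name__ == "__main__":
    for d in (4, 12, 20):
        print(derive_config(d))
```

The point of the design is that a sweep over one integer yields a family of models that differ in size but not in tuning effort, which is what makes a leaderboard on training time meaningful.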
Key highlights
- Matches GPT-2 (1.6B params) on the DCLM CORE score for ~$48 on-demand or ~$15 on spot instances; the original 2019 training run cost ~$43,000
- Explicit mixed-precision via a global COMPUTE_DTYPE instead of PyTorch’s autocast; weights stay fp32, forward passes cast on the fly
- Single-file speedrun pipeline (runs/speedrun.sh) plus research scripts for scaling-law sweeps and miniseries generation
- Runs on single GPU (8× slower), A100s, CPU/MPS, though sub-80GB cards need batch-size tuning to avoid OOM
- Chat web UI included; model personality customizable through synthetic data injection in the SFT stage
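The cost figures in the highlights can be sanity-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this write-up (about $48 on-demand, about $15 spot, about $43,000 in 2019, roughly two hours on 8 GPUs); the per-GPU-hour rates it prints are implied by those figures, not quoted cloud prices, and the function names are made up for this example.

```python
# Figures quoted in the write-up; all are approximate.
GPUS_PER_NODE = 8
RUN_HOURS = 2.0                 # "roughly two hours"
ON_DEMAND_COST_USD = 48.0
SPOT_COST_USD = 15.0
ORIGINAL_2019_COST_USD = 43_000.0


def implied_gpu_hour_rate(total_cost: float, hours: float, gpus: int) -> float:
    """Price per GPU-hour implied by a total run cost."""
    return total_cost / (hours * gpus)


def cost_reduction(original: float, current: float) -> float:
    """How many times cheaper the current run is than the original."""
    return original / current


if __name__ == "__main__":
    for label, cost in (("on-demand", ON_DEMAND_COST_USD), ("spot", SPOT_COST_USD)):
        rate = implied_gpu_hour_rate(cost, RUN_HOURS, GPUS_PER_NODE)
        factor = cost_reduction(ORIGINAL_2019_COST_USD, cost)
        print(f"{label}: ~${rate:.2f} per GPU-hour, ~{factor:.0f}x cheaper than 2019")
```

On these numbers the on-demand run works out to about $3 per GPU-hour and a reduction of roughly 900×, so the headline comparison survives even if actual prices drift somewhat.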
Caveats
- CPU/MPS runs are “you will not get strong results” territory per the README; this is firmly GPU-first
- RL training doesn’t yet support fp16 GradScaler, unlike pretraining and SFT
- Non-CUDA paths (xpu, etc.) are “fairly vanilla PyTorch” but largely untested by the author
Verdict
Ideal for researchers who want hackable, end-to-end LLM training without framework bloat, or anyone who finds pedagogical value in watching loss curves on their own hardware. If you just need an API key to call GPT-4, this is not your shortcut.