karpathy/nanoGPT

Andrej Karpathy's nanoGPT is now a museum piece

A deliberately minimal GPT-2 implementation that taught a generation how transformers work, now officially succeeded by nanochat.

59.3k stars · Python · Language Models · ML Frameworks
Velocity (7d): +47 ★/day · Trend: steady

What it does

nanoGPT is a stripped-down PyTorch implementation for training and fine-tuning GPT-2-scale language models. The core is two files of roughly 300 lines each: train.py for the training loop and model.py for the transformer itself. It can reproduce GPT-2 (124M parameters) on OpenWebText in about four days on an 8×A100 node, or train a toy character-level Shakespeare model on a laptop in about three minutes.
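
To make the "124M parameters" figure concrete, here is a rough back-of-the-envelope count for a GPT-2-style decoder. It is a sketch, not code from the repo: the hyperparameters (12 layers, width 768, vocabulary 50257, context 1024) are the published GPT-2 small values and are assumed here, as is the convention that the output head shares its weights with the token embedding. The function name `gpt2_param_count` is made up for illustration.

```python
# Rough parameter count for a GPT-2-style decoder.
# Defaults are the published GPT-2 "small" hyperparameters (an assumption;
# they are not stated in the article above).

def gpt2_param_count(n_layer=12, n_embd=768, vocab_size=50257, block_size=1024):
    # Token and position embedding tables.
    wte = vocab_size * n_embd
    wpe = block_size * n_embd

    # One transformer block, counting weights and biases.
    layer_norm = 2 * n_embd                         # scale and shift
    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd     # fused Q, K, V projection
    attn_proj = n_embd * n_embd + n_embd            # attention output projection
    mlp_up = n_embd * 4 * n_embd + 4 * n_embd       # MLP expansion to 4x width
    mlp_down = 4 * n_embd * n_embd + n_embd         # MLP contraction back to n_embd
    block = 2 * layer_norm + attn_qkv + attn_proj + mlp_up + mlp_down

    final_layer_norm = 2 * n_embd
    # The output head is assumed to reuse wte (weight tying), so it adds nothing.
    return wte + wpe + n_layer * block + final_layer_norm


total = gpt2_param_count()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the count comes to 124,439,808, with roughly 85M in the twelve transformer blocks and most of the rest in the token embedding table.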

The interesting bit

The README opens with a deprecation notice: Karpathy now points visitors to nanochat, leaving this repo up “for posterity.” That honesty is refreshing in a field where old repos usually just rot silently. When it was current, the project’s real trick was refusing to be clever—no abstractions, no framework, just raw PyTorch you could actually read and mutate.

Key highlights

  • Loads OpenAI’s GPT-2 weights directly for fine-tuning or initialization
  • Supports distributed training across multiple GPU nodes via torchrun
  • Includes pre-built configs for CPU, single GPU, and multi-node A100 setups
  • sample.py handles inference from trained checkpoints or OpenAI’s released models
  • Apple Silicon supported via --device=mps for 2–3× speedup over CPU
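
The config-plus-flag workflow in the list above (a preset config, then overrides such as --device=mps) can be sketched in a few lines. This is an illustrative stand-in, not nanoGPT's actual configuration code: `DEFAULTS`, `apply_overrides`, and every key except `device` are hypothetical names chosen for the example.

```python
import ast

# Hypothetical defaults; only the "device" key and its "mps" value
# are taken from the article, the rest are illustrative.
DEFAULTS = {"device": "cuda", "batch_size": 12, "compile": True, "learning_rate": 6e-4}


def apply_overrides(config, argv):
    """Return a copy of config with --key=value arguments applied."""
    merged = dict(config)
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got {arg!r}")
        key, raw = arg[2:].split("=", 1)
        if key not in merged:
            raise KeyError(f"unknown config key: {key}")
        try:
            # Parses numbers, True/False, and quoted strings.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Bare words such as mps are kept as plain strings.
            value = raw
        # Reject overrides whose type differs from the default's type.
        if type(value) is not type(merged[key]):
            raise TypeError(f"{key} expects {type(merged[key]).__name__}, got {raw!r}")
        merged[key] = value
    return merged


settings = apply_overrides(DEFAULTS, ["--device=mps", "--compile=False"])
print(settings)
```

Keeping overrides type-checked against the defaults is one reasonable design choice for a script-style trainer: a typo such as --batch_size=big fails immediately instead of surfacing deep inside the training loop.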

Caveats

  • Explicitly deprecated as of November 2025; new work should use nanochat instead
  • Multi-node training without Infiniband “will most likely crawl,” per the README
  • Character-level Shakespeare demo is fun but produces “lol ¯\_(ツ)_/¯” quality output
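
For readers unfamiliar with what "character-level" means in the Shakespeare demo: every distinct character becomes one token id, so the vocabulary is tiny and the model has to learn spelling from scratch. The sketch below shows the idea only. It is not the repo's data-preparation script, and the function names and sample string are invented for illustration.

```python
def build_char_vocab(text):
    """Map each distinct character to an integer id and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(s, stoi):
    # One token id per character; unseen characters raise KeyError.
    return [stoi[ch] for ch in s]


def decode(ids, itos):
    return "".join(itos[i] for i in ids)


corpus = "To be, or not to be"   # stand-in for the full Shakespeare text
stoi, itos = build_char_vocab(corpus)
ids = encode("to be", stoi)
print(len(stoi), ids, decode(ids, itos))
```

With a vocabulary this small, a short training run can produce text that looks like Shakespeare at a glance, which is about as far as the demo is meant to go.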

Verdict

Worth studying if you want to understand how a modern transformer trainer is structured without drowning in framework indirection. Skip it if you’re building something new—follow the author’s own advice and use nanochat.
