GPT in 300 lines: the educational autopsy
Andrej Karpathy stripped OpenAI's GPT down to readable PyTorch to prove the transformer isn't magic—just clever batching.

What it does
minGPT is a minimal PyTorch re-implementation of GPT training and inference. The library is three files: a Transformer definition, a Byte Pair Encoder, and generic PyTorch training boilerplate. The model itself is roughly 300 lines of code in mingpt/model.py. It can load pretrained GPT-2 weights, train from scratch on toy problems like addition or character-level text, or generate text from prompts.
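To make "generate text from prompts" concrete, here is a minimal sketch of greedy autoregressive decoding at the character level. It is not minGPT code: a bigram count table stands in for the trained transformer, and the names `fit_bigram` and `generate` are hypothetical, chosen only for this illustration. The loop has the same shape as any GPT sampler: predict the next token from what has been produced so far, append it, repeat.

```python
from collections import Counter, defaultdict


def fit_bigram(text):
    """Count which character follows which (a toy stand-in for a trained model)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_new_tokens):
    """Greedy decoding: repeatedly append the most frequent follower.

    Assumes a non-empty prompt. A real GPT conditions on the whole
    context window; this toy model only looks at the last character.
    """
    out = prompt
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last character was never followed by anything
        # Highest count wins; ties are broken alphabetically for determinism.
        nxt = min(followers, key=lambda ch: (-followers[ch], ch))
        out += nxt
    return out


corpus = "abababac"
model = fit_bigram(corpus)
sample = generate(model, "a", 5)
print(sample)
```

Swapping the argmax for temperature or top-k sampling changes only the line that picks `nxt`; the surrounding loop stays the same.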
The interesting bit
The README’s blunt honesty is the real feature. Karpathy explicitly calls this “semi-archived” and points to nanoGPT as the successor, explaining that minGPT’s popularity in courses and books froze it in place. The references section is even better: he annotates paper claims with his own detective work, noting where OpenAI’s released code doesn’t match their papers (“weird because in their released code i can only find a simple use of the old 0.02…”).
Key highlights
- ~300-line core model (mingpt/model.py) versus sprawling production frameworks
- Loads actual GPT-2 pretrained weights (124M parameter version shown in examples)
- Demos include training a transformer to add numbers from scratch, character-level language modeling, and text generation
- Extensive annotated paper references for GPT-1, GPT-2, GPT-3, and Image GPT with implementation notes and discrepancies
- MIT licensed
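The addition demo in the list above works by turning arithmetic into sequence prediction. A sketch of one way to encode a problem as digit tokens follows. The specifics are assumptions for illustration, not details taken from the README: operands are zero-padded to a fixed width, the sum gets one extra digit for the carry, and the sum is written in reverse so the least significant digit comes first. The function name `encode_addition` is hypothetical.

```python
def encode_addition(a, b, ndigit=2):
    """Render a + b = c as a flat list of digit tokens.

    Assumes 0 <= a, b < 10**ndigit so every field has a fixed width.
    """
    c = a + b
    a_str = str(a).zfill(ndigit)
    b_str = str(b).zfill(ndigit)
    # The sum may carry into one extra digit. Reversing it lets the model
    # emit the least significant digit first, matching how carries propagate.
    c_str = str(c).zfill(ndigit + 1)[::-1]
    return [int(ch) for ch in a_str + b_str + c_str]


tokens = encode_addition(85, 50)
# The first 2 * ndigit tokens are the prompt; the rest is what the model must predict.
prompt, target = tokens[:4], tokens[4:]
print(prompt, target)
```

With fixed-width fields, no separator tokens are needed, and the training loss can be restricted to the `target` positions.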
Caveats
- Semi-archived as of January 2023; author directs active use to nanoGPT
- Unit test coverage described as “not super amazing just yet”
- No requirements.txt (author’s own todo list admits this)
- Missing modern training features: no mixed precision, no distributed training, “print statement amateur hour” logging
Verdict
Grab this if you’re teaching, learning, or need to understand transformers without fighting through 90% unused code paths in full frameworks. Skip it if you need production training at scale; nanoGPT or Hugging Face already won that race.