Andrej Karpathy drops PyTorch, trains GPT-2 in 1,000 lines of C
A from-scratch LLM trainer that ditches 245MB of PyTorch dependencies for raw C/CUDA, and somehow runs slightly faster.

What it does
llm.c trains GPT-2 (and eventually GPT-3) models using nothing but C and CUDA—no PyTorch, no Python runtime. The core CPU reference implementation fits in a single ~1,000-line file, train_gpt2.c, while the production path lives in train_gpt2.cu. A parallel PyTorch implementation in train_gpt2.py exists strictly for verification and comparison.
The interesting bit
The project treats “educational” and “fast” as non-conflicting goals. The dev/cuda directory collects hand-written, documented kernels ranging from naive to optimized, while the mainline freely swaps in vendor libraries (cuBLAS, cuDNN, NCCL) when raw speed matters. It’s a living benchmark: your custom kernel is measured against the expert upper bound, not against vague intuition.
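To make the "naive to optimized, measured against a reference" idea concrete, here is a minimal sketch in plain C rather than CUDA. The function names, the row-major layout, and the choice of matrix multiplication as the example are assumptions for illustration and may not match what the repository does. The second variant unrolls the inner loop with four accumulators, one of the simplest optimizations to try, and a small helper reports how far it drifts from the naive version.

```c
#include <assert.h>
#include <stddef.h>

/* Naive forward matmul (illustrative, not the repository's code):
   out[b][o] = bias[o] + sum_i inp[b][i] * weight[o][i]
   All arrays are row-major; inp is (B, C), weight is (OC, C), out is (B, OC). */
void matmul_forward_naive(float *out, const float *inp, const float *weight,
                          const float *bias, int B, int C, int OC) {
    assert(B > 0 && C > 0 && OC > 0);
    for (int b = 0; b < B; b++) {
        for (int o = 0; o < OC; o++) {
            float val = (bias != NULL) ? bias[o] : 0.0f;
            for (int i = 0; i < C; i++) {
                val += inp[b * C + i] * weight[o * C + i];
            }
            out[b * OC + o] = val;
        }
    }
}

/* Same computation with the inner loop unrolled by 4 and separate accumulators.
   The summation order differs from the naive version, so results can differ
   by floating-point rounding on general inputs. */
void matmul_forward_unrolled(float *out, const float *inp, const float *weight,
                             const float *bias, int B, int C, int OC) {
    assert(B > 0 && C > 0 && OC > 0);
    for (int b = 0; b < B; b++) {
        const float *x = inp + b * C;
        for (int o = 0; o < OC; o++) {
            const float *w = weight + o * C;
            float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
            int i = 0;
            for (; i + 3 < C; i += 4) {
                acc0 += x[i] * w[i];
                acc1 += x[i + 1] * w[i + 1];
                acc2 += x[i + 2] * w[i + 2];
                acc3 += x[i + 3] * w[i + 3];
            }
            float val = ((bias != NULL) ? bias[o] : 0.0f)
                        + (acc0 + acc1) + (acc2 + acc3);
            for (; i < C; i++) {  /* leftover elements when C is not a multiple of 4 */
                val += x[i] * w[i];
            }
            out[b * OC + o] = val;
        }
    }
}

/* Largest absolute element-wise difference between two buffers of length n. */
float max_abs_diff(const float *a, const float *b, int n) {
    assert(n > 0);
    float worst = 0.0f;
    for (int i = 0; i < n; i++) {
        float d = a[i] - b[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

The same pattern scales up: keep the slowest, most readable version as ground truth, then check each faster variant for both correctness and speed before trusting it.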
Key highlights
- Currently ~7% faster than PyTorch Nightly on the mainline CUDA path
- Single-file CPU reference (train_gpt2.c) for actually understanding the algorithm
- Multi-GPU and multi-node training via MPI/NCCL, with three different initialization strategies for stubborn cluster environments
- Unit tests that verify C and CUDA outputs match PyTorch exactly (overall okay: 1)
- Flash Attention via cuDNN available, though it balloons compile time from seconds to ~a minute
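The verification idea in the highlights can be sketched as an element-wise comparison against reference values. This is a hypothetical helper, not the project's test code: the name tensors_match, the tolerance parameter, and the per-tensor log line are all assumptions. Passing a tolerance of 0.0f demands exact equality; a small nonzero tolerance is a common allowance when comparing floating-point results across implementations.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if every element of `actual` is within
   `tol` of `expected`, otherwise 0. Prints one summary line per tensor. */
int tensors_match(const float *actual, const float *expected, int n,
                  float tol, const char *label) {
    assert(n > 0 && tol >= 0.0f);
    int ok = 1;
    float max_diff = 0.0f;
    for (int i = 0; i < n; i++) {
        float diff = actual[i] - expected[i];
        if (diff < 0.0f) diff = -diff;
        if (diff > max_diff) max_diff = diff;
        /* Written as !(diff <= tol) so that a NaN difference also fails. */
        if (!(diff <= tol)) ok = 0;
    }
    printf("%s: %s (max diff %e)\n", label, ok ? "OK" : "NOT OK", max_diff);
    return ok;
}
```

A test driver would typically AND the per-tensor results into a single flag and print it at the end, which is the kind of summary the quoted "overall okay: 1" line suggests.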
Caveats
- The CPU path is explicitly a “you won’t get far” demo; training on an Apple Silicon M3 Max takes ~1.3 seconds per step for the small 124M-parameter model
- cuDNN integration is new enough that it’s disabled by default
- The README notes the tension between simplicity and speed: a 2% performance gain that costs 500 lines of complexity may be rejected
Verdict
Worth your time if you’re learning CUDA, skeptical of framework bloat, or want to see how close to the metal LLM training can get. Skip it if you need production features like checkpoint resumption, mixed-precision convenience, or anything resembling HuggingFace integration.