karpathy/nn-zero-to-hero

Karpathy builds a GPT from scratch, and you can watch every tensor

A YouTube-first course that hand-rolls backprop, BatchNorm, and transformers in Jupyter notebooks, no black boxes allowed.

23k stars · Jupyter Notebook · Learning · ML Frameworks · Language Models
Velocity (7d): +17 ★ / day · Trend: steady

What it does

This repo is the companion code for Andrej Karpathy’s video course on neural networks. Each lecture’s Jupyter notebook lives in the lectures/ directory, starting with a from-scratch autograd engine (micrograd) and ending with a working GPT-2-style transformer and its Byte Pair Encoding tokenizer. The videos assume you know Python and remember roughly what a derivative is; everything else is built together on screen.
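To give a flavour of what "from-scratch autograd engine" means, here is a minimal sketch in the spirit of micrograd. It is not the course's code: the `Value` class, its attribute names, and the single-neuron example are assumptions made for illustration, and only addition, multiplication, and tanh are supported.

```python
import math


class Value:
    """A scalar that remembers how it was computed (hypothetical minimal version)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            # d tanh(a)/da = 1 - tanh(a)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule
        # from the output back to the inputs.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


# One neuron: y = tanh(w * x + b)
x, w, b = Value(2.0), Value(-0.5), Value(0.25)
y = (w * x + b).tanh()
y.backward()
print(y.data, w.grad, x.grad, b.grad)
```

After `y.backward()`, each input's `.grad` holds the derivative of `y` with respect to that input, which is the same bookkeeping a framework does behind `loss.backward()`.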

The interesting bit

The course deliberately avoids abstraction until you’ve earned it. Lecture 5 makes you backpropagate by hand through a two-layer MLP with BatchNorm, without calling loss.backward(), and the GPT lecture has GitHub Copilot (itself a GPT) helping write the GPT. It’s a neat pedagogical trick: the tool you’re learning to build becomes your pair programmer.
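As a rough idea of what backpropagating through BatchNorm by hand involves, the sketch below derives the gradient for a single feature over a small batch and checks it against finite differences. It is an illustration under stated assumptions rather than the lecture's notebook: it uses plain Python lists instead of tensors, the biased (divide by n) variance, and hypothetical names such as `batchnorm_forward` and `batchnorm_backward`.

```python
import math

EPS = 1e-5  # small constant for numerical stability (assumed value)


def batchnorm_forward(x, gamma, beta):
    """Normalize one feature across the batch, then scale and shift."""
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n  # biased variance (an assumption)
    inv_std = 1.0 / math.sqrt(var + EPS)
    xhat = [(xi - mu) * inv_std for xi in x]
    y = [gamma * xh + beta for xh in xhat]
    return y, (xhat, inv_std)


def batchnorm_backward(dy, gamma, cache):
    """Hand-derived gradients of the loss w.r.t. x, gamma, and beta."""
    xhat, inv_std = cache
    n = len(dy)
    dgamma = sum(d * xh for d, xh in zip(dy, xhat))
    dbeta = sum(dy)
    dxhat = [d * gamma for d in dy]
    s1 = sum(dxhat)
    s2 = sum(d * xh for d, xh in zip(dxhat, xhat))
    # Every x_i influences the output through xhat_i, the mean, and the variance.
    dx = [inv_std / n * (n * d - s1 - xh * s2) for d, xh in zip(dxhat, xhat)]
    return dx, dgamma, dbeta


def loss(x, gamma, beta, weights):
    """Toy loss: a fixed weighted sum of the outputs, so dL/dy equals weights."""
    y, _ = batchnorm_forward(x, gamma, beta)
    return sum(w * yi for w, yi in zip(weights, y))


x = [0.5, -1.2, 3.0, 0.1]
weights = [1.0, -2.0, 0.5, 3.0]
gamma, beta = 1.5, -0.3

_, cache = batchnorm_forward(x, gamma, beta)
dx, dgamma, dbeta = batchnorm_backward(weights, gamma, cache)

# Compare against central finite differences, one input at a time.
h = 1e-6
dx_numeric = []
for i in range(len(x)):
    x_plus = list(x)
    x_plus[i] += h
    x_minus = list(x)
    x_minus[i] -= h
    diff = loss(x_plus, gamma, beta, weights) - loss(x_minus, gamma, beta, weights)
    dx_numeric.append(diff / (2 * h))

max_err = max(abs(a - b) for a, b in zip(dx, dx_numeric))
print("max |analytic - numeric| =", max_err)
```

The numerical check is the useful habit here: a hand-derived gradient is only trusted once it agrees with a slow but obviously correct estimate.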

Key highlights

  • Eight lectures published, running from micrograd and bigram models through WaveNet-style CNNs to a full transformer
  • Every notebook is the actual code written during the video, warts and all
  • Explicit exercises in video descriptions, with one (Lecture 5) provided as a Google Colab
  • Lecture 8 detours into tokenization, arguing it’s a “necessary evil” that explains many LLM weirdnesses
  • MIT licensed; ongoing and incomplete (residual connections and Adam optimizer are “notable todos”)
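For the tokenization highlight, the core of Byte Pair Encoding can be sketched in a few lines: repeatedly find the most frequent adjacent pair of token ids and replace it with a new id. This is a toy version under assumptions, not the course's or minbpe's implementation; the function names, the sample string, and the choice of three merges are all made up for illustration.

```python
from collections import Counter


def most_common_pair(ids):
    """Return the adjacent pair of token ids that occurs most often."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0]


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


text = "aaabdaaabac"                 # toy input (an assumption)
ids = list(text.encode("utf-8"))     # start from raw bytes, ids 0..255
merges = {}                          # learned (pair) -> new token id

for step in range(3):                # three merges, chosen arbitrarily
    pair = most_common_pair(ids)
    new_id = 256 + step              # new ids start after the byte range
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id

print(len(text.encode("utf-8")), "bytes ->", len(ids), "tokens")
```

Because the vocabulary is learned from byte statistics rather than from words, the same string can split very differently depending on spacing or casing, which is one way tokenization ends up behind odd model behavior.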

Caveats

  • The README itself admits this “may grow into something more respectable”—it’s a capture of video content, not a polished textbook
  • Some lectures depend on external repos (micrograd, makemore, minbpe) rather than self-contained notebooks
  • No automated tests or CI; correctness is verified by “does it match the video”

Verdict

Worth your time if you want to stop treating PyTorch as magic and see how gradients actually flow. Skip it if you need a reference manual or a quick API cookbook; you’ll be watching hours of video to extract the insights.
