A deep learning stack that fits in your head
tinygrad is what happens when you keep PyTorch's ergonomics but make the entire compiler small enough to read in an afternoon.
What it does
tinygrad is a full end-to-end deep learning framework: tensors with autograd, an IR and compiler that fuses and lowers kernels, JIT execution, plus nn, optim, and datasets for real training. It runs on everything from CPU and CUDA to Metal, AMD, Qualcomm, and WebGPU. The pitch is simple: PyTorch-like API, but every layer of the stack is visible and hackable.
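"Tensors with autograd" comes down to recording each operation along with its local derivative rule, then running those rules backwards from the output. A minimal scalar sketch in plain Python follows. The Value class and its methods are hypothetical illustrations of reverse-mode autodiff. They are not tinygrad's Tensor API or internals, which work on whole tensors and lower through the compiler.

```python
class Value:
    """Hypothetical scalar autograd node (illustration only, not tinygrad's Tensor)."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents    # nodes this value was computed from
        self._grad_fns = grad_fns  # one per parent: upstream grad -> contribution to parent.grad

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Value(self.data + other.data, (self, other), (lambda g: g, lambda g: g))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a*b)/da = b, d(a*b)/db = a
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def relu(self):
        # The gradient passes through only where the input was positive.
        return Value(max(0.0, self.data), (self,), (lambda g: g if self.data > 0 else 0.0,))

    def backward(self):
        # Topologically order the graph, then push gradients from the output to the inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, grad_fn in zip(node._parents, node._grad_fns):
                parent.grad += grad_fn(node.grad)


x, w, b = Value(3.0), Value(-2.0), Value(10.0)
y = (x * w + b).relu()  # relu(3 * -2 + 10) = 4
y.backward()
print(y.data, x.grad, w.grad, b.grad)  # 4.0 -2.0 3.0 1.0
```

A real framework does the same bookkeeping per tensor rather than per scalar, which is what lets it hand the backward pass to the same compiler as the forward pass.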
The interesting bit
The project borrows from three heavyweights (PyTorch’s feel, JAX’s functional IR-based autodiff, and TVM’s scheduling and codegen), then strips away the parts you can’t read on a plane. The “laziness” demo is telling: a matmul written in eager style gets fused into a single kernel, and setting the DEBUG=3 or DEBUG=4 environment variable lets you watch the compiler think. That’s the transparency the README keeps selling.
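Why laziness enables that fusion can be modeled in a few lines of plain Python. Assume, in the style of the demo, that the matmul is spelled as a broadcasted elementwise multiply followed by a sum over the shared axis. The sketch below is a hypothetical toy, not tinygrad's scheduler: a tensor is just a shape plus an index function, movement ops remap indices, and realize() stands in for a kernel launch. Eager mode realizes the n*p*m product before reducing (two kernels); lazy mode lets the multiply inline into the reduction loop (one kernel, no intermediate buffer).

```python
import itertools


class Lazy:
    """Hypothetical toy tensor: a shape plus an index -> value function.
    Nothing is computed until realize(); this is not tinygrad's API or scheduler."""

    kernels = 0  # number of realize() calls, standing in for kernel launches

    def __init__(self, shape, fn):
        self.shape, self.fn = tuple(shape), fn

    @classmethod
    def from_list(cls, rows):
        return cls((len(rows), len(rows[0])), lambda idx: rows[idx[0]][idx[1]])

    def view(self, shape, remap):
        # Movement op (reshape/expand/transpose): remaps indices, copies nothing.
        return Lazy(shape, lambda idx: self.fn(remap(idx)))

    def __mul__(self, other):
        # Elementwise op on equal shapes: composes the two index functions.
        return Lazy(self.shape, lambda idx: self.fn(idx) * other.fn(idx))

    def sum_last(self):
        # Reduce over the last axis; whatever produced self is evaluated inside this loop.
        *outer, n = self.shape
        return Lazy(outer, lambda idx: sum(self.fn(idx + (k,)) for k in range(n)))

    def realize(self):
        # One "kernel": evaluate the expression once per output element into a buffer.
        Lazy.kernels += 1
        buf = {idx: self.fn(idx) for idx in itertools.product(*(range(s) for s in self.shape))}
        return Lazy(self.shape, buf.__getitem__)


def matmul(a, b, eager):
    n, m = a.shape
    p = b.shape[1]
    a3 = a.view((n, p, m), lambda i: (i[0], i[2]))  # a[i][k], broadcast over j
    b3 = b.view((n, p, m), lambda i: (i[2], i[1]))  # b[k][j] (b transposed), broadcast over i
    prod = a3 * b3
    if eager:
        prod = prod.realize()  # eager style: materialize the n*p*m product first
    return prod.sum_last().realize()


a = Lazy.from_list([[1.0, 2.0], [3.0, 4.0]])
b = Lazy.from_list([[5.0, 6.0], [7.0, 8.0]])

Lazy.kernels = 0
matmul(a, b, eager=True)
eager_kernels = Lazy.kernels

Lazy.kernels = 0
c = matmul(a, b, eager=False)
lazy_kernels = Lazy.kernels

print(eager_kernels, lazy_kernels)  # 2 1
print([[c.fn((i, j)) for j in range(2)] for i in range(2)])  # [[19.0, 22.0], [43.0, 50.0]]
```

The toy skips everything that makes a real compiler hard (memory layout, tiling, codegen), but it shows the core trade: deferring work until the result is needed gives the scheduler a whole expression to fuse instead of one op at a time.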
Key highlights
- ~25 low-level ops are all a new accelerator needs to implement
- TinyJit captures and replays kernels at function scope
- BEAM search over kernels for scheduling, plus process-replay tests to catch compiler regressions
- Contributing guide is unusually blunt: no code golf, no whitespace PRs, benchmark your “speedups,” and AI agents must include the word ORANGE in commits
- Cash bounties for improvements, with a stated preference for 3-line features over 300-line ones
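The “~25 low-level ops” point means a new accelerator only has to supply a small set of primitives, while everything else is composed above them. A toy version of that split is sketched below. The backend table, its op names, and the helper functions are hypothetical and illustrative; they are not tinygrad's actual op list or backend interface.

```python
import math

# Hypothetical toy "backend": in this sketch, the only code a new device would supply.
# Primitives act elementwise on flat lists of floats; reductions return a one-element list.
# The op names are illustrative, not tinygrad's actual op set.
BACKEND = {
    "ADD": lambda x, y: [a + b for a, b in zip(x, y)],
    "MUL": lambda x, y: [a * b for a, b in zip(x, y)],
    "MAX": lambda x, y: [max(a, b) for a, b in zip(x, y)],
    "NEG": lambda x: [-a for a in x],
    "EXP2": lambda x: [2.0 ** a for a in x],
    "RECIP": lambda x: [1.0 / a for a in x],
    "REDUCE_SUM": lambda x: [sum(x)],
    "REDUCE_MAX": lambda x: [max(x)],
}


def run(op, *args):
    # Dispatch a primitive to whichever backend table is installed.
    return BACKEND[op](*args)


def full(value, n):
    # Constant buffer of length n.
    return [float(value)] * n


# Frontend ops built only from the primitives above.
def relu(x):
    return run("MAX", x, full(0.0, len(x)))


def sub(x, y):
    return run("ADD", x, run("NEG", y))


def exp(x):
    # e**x rewritten as 2**(x * log2(e)), so the backend only needs EXP2.
    return run("EXP2", run("MUL", x, full(math.log2(math.e), len(x))))


def softmax(x):
    shifted = sub(x, full(run("REDUCE_MAX", x)[0], len(x)))  # subtract the max for stability
    e = exp(shifted)
    total = run("REDUCE_SUM", e)[0]
    return run("MUL", e, full(run("RECIP", [total])[0], len(x)))


print(relu([-1.0, 0.5, 2.0]))  # [0.0, 0.5, 2.0]
print(softmax([1.0, 2.0, 3.0]))
```

The frontend never touches hardware directly: porting this toy to a new device means refilling BACKEND, and relu, exp, and softmax come along for free.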
Caveats
- Not 1.0 yet; the README admits bugs are still being found
- No full vmap/pmap equivalent yet, so JAX migrants may miss some functional transforms
- Code outside tinygrad/core is “not well tested”; treat extra/ as experimental
Verdict
Grab this if you want to understand how a modern DL stack actually works, or if you’re targeting an obscure accelerator and need to write a ~25-op backend. Skip it if you need production stability, full JAX-style transforms, or a large ecosystem of prebuilt models.