timm: the kitchen sink of PyTorch vision models
A single library that collects nearly every image backbone worth using, along with scripts to train and export them, so you don't have to reimplement any of it yourself.

What it does
timm (PyTorch Image Models) is a comprehensive collection of image encoder implementations for PyTorch, bundled with training, evaluation, inference, and export scripts plus pretrained weights. It covers ResNet, EfficientNet, Vision Transformers, ConvNeXt, MobileNet variants, and dozens more: essentially the standard catalog of computer vision backbones.
The interesting bit
The project doesn’t just port papers; it keeps the implementations current. Recent releases add DINOv3 support, a custom Muon optimizer implementation with AdamW fallbacks, NaFlexViT for images of variable resolution and aspect ratio, and even security-hardened checkpoint loading. The changelog reads like a running diary of CV research adoption.
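For readers unfamiliar with Muon, the core idea can be sketched in plain PyTorch. This is not timm's implementation; it is a minimal, simplified sketch of the publicly described recipe: accumulate momentum for each 2-D weight matrix, then replace the update with an approximately orthogonalized version computed by a quintic Newton-Schulz iteration, while other parameters go to an AdamW-style fallback. The helper names `newton_schulz_orthogonalize`, `split_params`, and `muon_step` are hypothetical, and the coefficients, learning rate, and momentum are illustrative values taken from the reference recipe.

```python
import torch


def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately map a 2-D matrix G = U S V^T to U V^T (all singular values pushed toward 1)."""
    # Quintic coefficients from the public reference Muon recipe (illustrative).
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g.float()
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # work in the wide orientation so the Gram matrix x @ x.T is the smaller one
    x = x / (x.norm() + eps)  # Frobenius normalization puts every singular value in [0, 1]
    for _ in range(steps):
        gram = x @ x.T
        # Applies p(s) = a*s + b*s^3 + c*s^5 to each singular value s.
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


def split_params(model: torch.nn.Module):
    """Hypothetical helper: 2-D weights go to Muon, everything else to the AdamW fallback.

    Real recipes are more nuanced, e.g. flattening conv kernels and keeping
    embeddings and classifier heads on AdamW.
    """
    muon, fallback = [], []
    for p in model.parameters():
        if not p.requires_grad:
            continue
        (muon if p.ndim == 2 else fallback).append(p)
    return muon, fallback


@torch.no_grad()
def muon_step(params, bufs: dict, lr: float = 0.02, momentum: float = 0.95) -> None:
    """Hypothetical simplified Muon update (plain momentum, no Nesterov, no weight decay)."""
    for p in params:
        if p.grad is None:
            continue
        if p not in bufs:
            bufs[p] = torch.zeros_like(p)
        buf = bufs[p]
        buf.mul_(momentum).add_(p.grad)
        update = newton_schulz_orthogonalize(buf)
        # Shape-dependent scale used by the reference recipe for tall matrices.
        scale = max(1.0, p.shape[0] / p.shape[1]) ** 0.5
        p.add_(update.to(p.dtype), alpha=-lr * scale)
```

The fallback list from `split_params` would be handed to an ordinary `torch.optim.AdamW`, which is the sense in which Muon-style optimizers need an AdamW companion: the orthogonalization step only makes sense for matrix-shaped parameters.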
Key highlights
- 36K+ GitHub stars, suggesting broad community reliance
- Pretrained weights hosted on the Hugging Face Hub with explicit license fields
- Includes optimizers beyond stock PyTorch: Muon, AdaMuon, NAdaMuon, AdamP, SGDP, and others
- NaFlexViT supports variable patch sizes, aspect ratios, and factorized position embeddings
- Inference-timing benchmark CSVs for RTX 4090, RTX 5090, and RTX Pro 6000 GPUs with PyTorch 2.9.1
- Compatible with everything from PyTorch 1.13 + Python 3.10 up to PyTorch 2.9.1 + Python 3.13
Caveats
- The README is mostly a changelog; finding specific model documentation requires digging into the “Getting Started” links
- Recent maintenance releases note the original author’s departure from Hugging Face, though development continues
- Occasional compatibility breaks are documented (e.g., QKV/MLP bias fix in January 2026, MobileNet-v5 stem bias change)
Verdict
Essential if you need to compare, fine-tune, or deploy vision backbones without maintaining your own model zoo. Overkill if you’re committed to a single architecture and already have a training pipeline locked down.