CutMix: Stop deleting pixels, start swapping them
A data augmentation trick that pastes patches between training images instead of blanking them out, mixing labels by area to keep training efficient.

What it does
CutMix is a regularization technique for training image classifiers. Instead of masking out random image regions with black pixels or noise (Cutout), it cuts a patch from one training image and pastes it onto another. The ground-truth labels get mixed proportionally to the patch area, so no training pixels go to waste.
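The patch-and-label mixing described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the repository's code: the function names `rand_bbox` and `cutmix`, the `alpha` default, the channels-last layout, and the choice to share one box across the whole batch are all assumptions made for brevity.

```python
import numpy as np

def rand_bbox(height, width, lam, rng):
    """Sample a box covering roughly (1 - lam) of the image area."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    # The box centre is uniform over the image; the box is clipped at the borders.
    cy, cx = rng.integers(height), rng.integers(width)
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, height)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, width)
    return int(y1), int(y2), int(x1), int(x2)

def cutmix(images, labels, num_classes, alpha=1.0, rng=None):
    """Mix a batch. images: (N, H, W, C) floats; labels: (N,) ints.

    Assumption: a single mixing ratio and box are shared by the whole batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w = images.shape[:3]
    lam = rng.beta(alpha, alpha)      # target share of the original image
    perm = rng.permutation(n)         # partner image for each sample
    y1, y2, x1, x2 = rand_bbox(h, w, lam, rng)

    mixed = images.copy()
    # Paste the partner's patch into the same location of each image.
    mixed[:, y1:y2, x1:x2] = images[perm, y1:y2, x1:x2]

    # Clipping may shrink the box, so recompute lam from the pasted area.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    one_hot = np.eye(num_classes)[labels]
    mixed_labels = lam * one_hot + (1.0 - lam) * one_hot[perm]
    return mixed, mixed_labels, lam
```

Training against `mixed_labels` with cross-entropy is equivalent to weighting the two per-label losses by `lam` and `1 - lam`. With `alpha=1.0` the Beta distribution is uniform, so every patch size from nothing to the whole image is equally likely.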
The interesting bit
The authors frame this as fixing an inefficiency: previous regional dropout methods literally throw away information. CutMix keeps pixel utilization at 100% while still forcing the network to attend to less obvious object parts. The transfer learning table is the quiet flex: a CutMix-pretrained ResNet-50 improves downstream detection and captioning scores, whereas Mixup- and Cutout-pretrained models hurt or barely help.
Key highlights
- ICCV 2019 oral; official PyTorch implementation from NAVER Clova AI
- CIFAR-100 PyramidNet-200: 16.45% → 14.23% top-1 error (13.81% with ShakeDrop)
- ImageNet ResNet-50: 23.68% → 21.40% top-1 error
- Pretrained models provided via Dropbox for PyramidNet-200, ResNet-{50,101,152}, ResNeXt-101
- Includes training and test scripts with GPU count recommendations (2 for CIFAR, 4 for ImageNet)
- Third-party TensorFlow implementation linked
Caveats
- Pretrained models hosted on Dropbox, not GitHub releases or Hugging Face
- Code is based on the PyTorch ImageNet example and PyramidNet-PyTorch; it is a research reproduction rather than a standalone library
- No pip install; you clone and run train.py with explicit flags
Verdict
Worth studying if you’re still using Cutout or building custom augmentation pipelines. Skip if you need a drop-in torchvision.transforms replacement — this is research code with hardcoded network configs.