DeepLab v2 escapes Caffe, lands in PyTorch
A clean PyTorch port of the classic segmentation model that lets you reuse official Caffe weights without touching Caffe itself.

What it does
This is an unofficial PyTorch re-implementation of DeepLab v2 with a ResNet-101 backbone, targeting the COCO-Stuff and PASCAL VOC datasets. It handles training, evaluation, CRF post-processing, and even webcam demos. The official Caffe pre-trained weights can be converted and loaded directly—no Caffe build required.
The interesting bit
The author didn’t just rewrite the model; they solved the practical headache of porting Caffe weights. A convert.py script transforms the official .caffemodel files released by the original DeepLab authors into a PyTorch-compatible format. DeepLab v3/v3+ variants are also included in the codebase, though the README notes they are “not tested.”
Key highlights
- Matches or slightly exceeds official COCO-Stuff 10k metrics (Mean IoU 34.8 vs. 34.4 without CRF)
- PASCAL VOC 2012 Mean IoU reaches 76.65 without CRF and 77.93 with CRF, both slightly above the official 76.35 and 77.69
- torch.hub one-liner loading supported: torch.hub.load("kazuto1011/deeplab-pytorch", "deeplabv2_resnet101", pretrained='cocostuff164k', n_classes=182)
- Gradient accumulation workaround for GPU memory: effective batch size of 10 via two iterations of 5 samples (tested at ~11.2 GB on a single Titan X)
- Live webcam demo and single-image inference scripts included
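The gradient accumulation trick is easy to check by hand. The sketch below is not the repository's training loop; it is a plain-Python toy (a one-parameter linear model, mean squared error, made-up data) showing that averaging the gradients of two micro-batches of 5 gives the same gradient as one batch of 10. The function names and the data are hypothetical. In PyTorch the same pattern usually means dividing each micro-batch loss by the number of accumulation steps, calling backward() on each, and stepping the optimizer once.

```python
# Plain-Python illustration of gradient accumulation (no PyTorch needed).
# Toy model: y_hat = w * x, loss: mean squared error over the batch.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average the gradients of equal-sized micro-batches."""
    assert len(xs) % micro_batch_size == 0, "micro-batches must be equal-sized"
    steps = len(xs) // micro_batch_size
    grad = 0.0
    for i in range(steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        # Scale each micro-batch gradient by 1/steps so the sum is a mean.
        grad += mse_grad(w, xs[lo:hi], ys[lo:hi]) / steps
    return grad

# Hypothetical toy data: 10 samples drawn from y = 3x.
xs = [float(i) for i in range(1, 11)]
ys = [3.0 * x for x in xs]
w = 0.5

full = mse_grad(w, xs, ys)              # one batch of 10
accum = accumulated_grad(w, xs, ys, 5)  # two micro-batches of 5

# A single parameter update using the accumulated gradient.
learning_rate = 0.001
w_next = w - learning_rate * accum
```

The equivalence holds here because the loss is a mean over samples and the micro-batches are the same size. Layers that depend on batch statistics, such as batch normalization in training mode, would break it, which fits with this port keeping those layers frozen.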
Caveats
- DeepLab v3/v3+ models are present but explicitly “not tested”
- Batch normalization layers are frozen during training; training them (required for v3/v3+) needs an extra dependency (torch-encoding) for synchronized batch norm
- The default environment pins python=3.6 and cudatoolkit=10.2, so you’ll likely need to edit the conda config for modern setups
- COCO-Stuff 164k training runs for 100,000 iterations, so patience is required
Verdict
Worth a look if you need a battle-tested DeepLab v2 baseline in PyTorch, especially for COCO-Stuff or PASCAL VOC reproduction work. Skip if you want a maintained, modern segmentation framework—this is a research reproduction with 2019-era dependencies.