A 2017 PyTorch port of DeepLab-ResNet that still trains on Python 2.7
Academic code for multi-scale semantic segmentation, ported from Caffe with the scars to prove it.

What it does
Implements the DeepLab-ResNet v2 architecture for semantic segmentation, computing losses across three image scales (1×, 0.75×, 0.5×) plus a merged output. It was built for a 2017 ACM MM paper on sketch parsing, but the model itself is standard stuff: ResNet backbone, atrous convolutions, PASCAL VOC training.
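To make the loss structure concrete, here is a minimal sketch in plain Python. It assumes the per-scale score maps have already been resized to a common resolution and that the merged output is an element-wise maximum, which is the fusion rule described in the DeepLab v2 paper; whether this port fuses in exactly that way is an assumption. The names merge_max and total_loss are hypothetical and do not come from the repository.

```python
# Hypothetical sketch: names and the max-fusion rule are assumptions,
# not taken from the repository.
SCALES = (1.0, 0.75, 0.5)  # the three image scales named in the text


def merge_max(score_maps):
    """Element-wise maximum over equally sized score lists.

    Each entry of score_maps is one scale's scores, assumed already
    resized to a common resolution and flattened to a list.
    """
    return [max(values) for values in zip(*score_maps)]


def total_loss(per_scale_losses, merged_loss):
    """Sum the three per-scale losses and the merged-output loss."""
    if len(per_scale_losses) != len(SCALES):
        raise ValueError("expected one loss per scale")
    return sum(per_scale_losses) + merged_loss
```

With three scale losses and one merged loss, the training objective is simply their sum, so each scale receives its own supervision signal.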
The interesting bit
The author went to unusual lengths to replicate Caffe behavior in PyTorch: shared weights across scales, poly learning rate decay, iter_size for effective batch scaling, even freezing batchnorm running stats to match use_global_stats = True. The README candidly admits where things diverge: no CRF post-processing, boundary labels (255) get merged into background instead of ignored, and the PyTorch-trained model scores 72.40% mIOU versus 75.54% from the original Caffe weights.
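Two of those Caffe conventions are easy to show in isolation. The sketch below is plain Python with hypothetical function names; the power of 0.9 is the value commonly used with DeepLab and is an assumption here, not a figure from the README. Gradients are stood in for by plain floats so the accumulation logic is visible without a framework.

```python
# Hypothetical sketch of two Caffe conventions; not the repository's code.

def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Poly decay: base_lr * (1 - iteration / max_iter) ** power.

    power=0.9 is an assumed default, not confirmed by the source.
    """
    if not 0 <= iteration <= max_iter:
        raise ValueError("iteration must lie in [0, max_iter]")
    return base_lr * (1.0 - float(iteration) / max_iter) ** power


def accumulate_gradients(per_batch_grads, iter_size):
    """Mimic iter_size: average gradients over iter_size mini-batches.

    Returns one averaged gradient per optimizer step. Trailing batches
    that do not fill a complete group are dropped.
    """
    steps = []
    last_start = len(per_batch_grads) - iter_size
    for start in range(0, last_start + 1, iter_size):
        group = per_batch_grads[start:start + iter_size]
        steps.append(sum(group) / float(iter_size))
    return steps
```

The effect of iter_size is that a batch of size B processed iter_size times behaves, for the optimizer, roughly like one batch of size B × iter_size, at the cost of more forward and backward passes per update.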
Key highlights
- Converts MS COCO-pretrained Caffe weights via included surgery script
- Supports custom datasets with contiguous labels (0 to N-1)
- Random scale augmentation per iteration (0.5–1.3) versus Caffe’s fixed 4 scales
- Last layer gets 10× learning rate; ~11.9 GB GPU memory on Titan X
- Training time: roughly 3.5 hours
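The scale augmentation and the 10× last-layer rate from the list above can be sketched as follows. This is a hedged illustration in plain Python: uniform sampling over the range is an assumption (the source gives only the 0.5–1.3 bounds), and sample_scale, scaled_size, and group_learning_rates are hypothetical names.

```python
import random

# Hypothetical helpers; names and the uniform distribution are assumptions.

def sample_scale(rng, low=0.5, high=1.3):
    """Draw one random scale per iteration from [low, high]."""
    return rng.uniform(low, high)


def scaled_size(height, width, scale):
    """Image size after rescaling, truncated to whole pixels."""
    return int(height * scale), int(width * scale)


def group_learning_rates(base_lr, last_layer_mult=10.0):
    """Learning rate per parameter group: last layer gets 10x the base."""
    return {"backbone": base_lr, "last_layer": base_lr * last_layer_mult}
```

In a PyTorch optimizer, the two rates would typically be passed as separate parameter groups, each with its own lr entry.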
Caveats
- Python 2.7 only; no indication of Python 3 support
- No CRF post-processing, and the ignore_label parameter remains unimplemented
- The higher 78.48% figure in earlier versions used a non-standard mIOU calculation (per-image average, not the authors’ method)
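The gap between the two mIOU conventions is easier to see with a toy example. The sketch below, in plain Python with hypothetical function names, compares a per-image average against a score computed from one confusion matrix pooled over the whole dataset, which is the usual PASCAL VOC-style evaluation. That the pooled variant matches the authors' method exactly is an assumption.

```python
# Hypothetical sketch; labels are flat lists of class indices per image.

def confusion(pred, gt, num_classes):
    """Confusion matrix with rows = ground truth, columns = prediction."""
    mat = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        mat[g][p] += 1
    return mat


def mean_iou(mat):
    """Mean IoU over classes that appear in prediction or ground truth."""
    n = len(mat)
    ious = []
    for c in range(n):
        tp = mat[c][c]
        fn = sum(mat[c]) - tp
        fp = sum(mat[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / float(denom))
    return sum(ious) / len(ious) if ious else 0.0


def per_image_miou(images, num_classes):
    """Non-standard variant: score each image alone, then average."""
    scores = [mean_iou(confusion(p, g, num_classes)) for p, g in images]
    return sum(scores) / len(scores)


def dataset_miou(images, num_classes):
    """Pooled variant: accumulate one confusion matrix over all images."""
    total = [[0] * num_classes for _ in range(num_classes)]
    for pred, gt in images:
        mat = confusion(pred, gt, num_classes)
        for r in range(num_classes):
            for c in range(num_classes):
                total[r][c] += mat[r][c]
    return mean_iou(total)


# Each entry is (prediction, ground truth) for one tiny "image".
toy = [([0, 0, 0, 0], [0, 0, 0, 0]), ([0, 0, 1, 1], [0, 1, 1, 1])]
```

On this toy data the per-image average comes out near 0.79 while the pooled score is 0.75: an easy all-background image counts as a perfect 1.0 on its own, which inflates the per-image figure.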
Verdict
Worth a look if you’re specifically replicating 2016–2017 DeepLab papers or need a reference PyTorch training setup with pedantic Caffe parity. Skip it if you want modern DeepLab v3+, Python 3, or production-ready code. This is research scaffolding with honest documentation.