MonoDepth in PyTorch: a lighter port of the classic depth estimator
An unofficial PyTorch reimplementation that swaps in ResNet backbones and batch norm to slim down the original TensorFlow monocular depth model.

What it does
Trains a neural network to estimate depth from a single image, using stereo image pairs for supervision instead of ground-truth depth labels. During training the model predicts disparities, uses them to reconstruct one view of a stereo pair from the other, and is penalized for the reconstruction error. It targets the KITTI driving dataset and outputs disparity maps that can be converted to depth.
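To make the disparity-to-depth step concrete, here is a minimal sketch of the standard stereo relation depth = focal length × baseline / disparity. It assumes the network outputs disparity as a fraction of image width, as the original MonoDepth does; the function name `disparity_to_depth` is hypothetical, and the default focal length and baseline are approximate KITTI values used for illustration only. Real values should be read from the calibration files of the sequence in use.

```python
def disparity_to_depth(disp_fraction, image_width_px, focal_px=721.0, baseline_m=0.54):
    """Convert a width-normalized disparity to metric depth.

    focal_px and baseline_m are assumed, approximate KITTI values.
    image_width_px must be the width at which focal_px was calibrated.
    """
    disp_px = disp_fraction * image_width_px  # disparity in pixels
    if disp_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disp_px  # depth in meters


# Halving the disparity doubles the estimated depth.
near = disparity_to_depth(0.05, 1242)
far = disparity_to_depth(0.025, 1242)
```

Because depth is inversely proportional to disparity, small disparity errors on distant objects translate into large depth errors, which is one reason results are usually evaluated only up to a capped distance.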
The interesting bit
The original MonoDepth used a bulkier encoder; this port switches to ResNet18 or ResNet50 with added batch normalization and a flexible feature extractor that can pull any torchvision ResNet variant with pretrained weights. The authors state the goal as a “more lightweight model for depth estimation with better accuracy”, though the README doesn’t quantify that accuracy improvement.
Key highlights
- Supports ResNet18, ResNet50, or any torchvision ResNet as encoder backbone
- Batch normalization added for training stability
- Pretrained torchvision weights available via flag
- Includes Jupyter notebook with training and testing workflows
- Provides pretrained ResNet18 model (200 epochs, full KITTI minus 7 validation sequences)
- Stereo-pair training, single-image inference
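The stereo-pair training signal can be illustrated on a single image row. The sketch below is a simplified stand-in, not the repository's loss: it rebuilds a left-image row by sampling the right-image row at x - d(x) with linear interpolation, then scores it with a plain L1 error. The function names are hypothetical, and the original MonoDepth objective also includes SSIM, smoothness, and left-right consistency terms that are omitted here.

```python
def warp_scanline(right_row, disparity_px):
    """Reconstruct a left-image row by sampling the right row at x - d(x)."""
    width = len(right_row)
    out = []
    for x in range(width):
        # Source position in the right image, clamped to the row bounds.
        src = min(max(x - disparity_px[x], 0.0), width - 1.0)
        x0 = int(src)
        x1 = min(x0 + 1, width - 1)
        frac = src - x0
        out.append((1.0 - frac) * right_row[x0] + frac * right_row[x1])
    return out


def l1_photometric_error(target_row, reconstructed_row):
    """Mean absolute difference between the real and reconstructed rows."""
    total = sum(abs(a - b) for a, b in zip(target_row, reconstructed_row))
    return total / len(target_row)


right_row = [0, 0, 10, 20, 30, 0, 0, 0]
left_row = [0, 0, 0, 0, 10, 20, 30, 0]  # same content shifted by 2 pixels

good = l1_photometric_error(left_row, warp_scanline(right_row, [2.0] * 8))  # 0.0
bad = l1_photometric_error(left_row, warp_scanline(right_row, [0.0] * 8))   # 12.5
```

The correct disparity drives the error to zero, so minimizing reconstruction error pushes the network toward geometrically consistent disparities. At inference time only the single input image is needed, since the second view is used solely to compute this loss.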
Caveats
- Stuck on PyTorch 0.4.1 and CUDA 9.1, both ancient by deep learning standards
- No quantitative benchmarks or comparison tables against original MonoDepth
- KITTI-only; no mention of generalization to other datasets
- ~175 GB raw dataset download required to get started
Verdict
Worth a look if you need a hackable PyTorch implementation of a well-cited depth estimation baseline and don’t mind modernizing the dependencies yourself. Skip it if you want state-of-the-art results or turnkey training on newer PyTorch versions.