Depth from a single photo, no stereo rig required
A 2016 paper that still gets cited: predict depth maps from plain RGB using fully convolutional ResNets, with pretrained weights for indoor and outdoor scenes.

What it does
Feed it one ordinary photograph and it estimates how far away each pixel is. The repo ships trained models for two benchmarks, NYU Depth v2 for indoor scenes and Make3D for outdoor, plus a predict.py script that runs inference on your own images with a single command.
The interesting bit
The authors treat upsampling as a residual learning problem, not an afterthought. They interleave feature maps instead of using slower deconvolution variants, which in 2016 was enough to beat Eigen & Fergus on NYU by a healthy margin (relative error 0.127 vs 0.158). The TensorFlow port was built by converting Caffe weights with ethereon’s tool—so this is also a snapshot of how research code migrated ecosystems mid-decade.
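To make the interleaving idea concrete, here is a minimal NumPy sketch of the weaving step alone: four same-sized feature maps are combined into one map with twice the height and width. This is an illustration, not the repo's code. The function name interleave and the choice of which map lands in which position of each 2x2 cell are assumptions made for the example, and the convolutions that would produce the four maps are omitted.

```python
import numpy as np

def interleave(a, b, c, d):
    """Weave four (H, W) feature maps into one (2H, 2W) map.

    Hypothetical helper for illustration; the position assigned to each
    input map is an assumption, not taken from the original code.
    """
    if not (a.shape == b.shape == c.shape == d.shape):
        raise ValueError("all four maps must share the same shape")
    h, w = a.shape
    out = np.empty((2 * h, 2 * w), dtype=a.dtype)
    out[0::2, 0::2] = a  # even rows, even columns
    out[0::2, 1::2] = b  # even rows, odd columns
    out[1::2, 0::2] = c  # odd rows, even columns
    out[1::2, 1::2] = d  # odd rows, odd columns
    return out

# Four constant 2x2 maps filled with 0, 1, 2, 3 make the pattern easy to see.
maps = [np.full((2, 2), k) for k in range(4)]
up = interleave(*maps)
print(up.shape)
print(up)
```

Every output value is copied from one of the inputs, so no work is spent on zero-filled positions, which is where the speed advantage over unpool-then-convolve comes from.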
Key highlights
- Pretrained ResNet-UpProj models in both MatConvNet and TensorFlow formats
- One-command inference via predict.py with a .ckpt checkpoint
- Evaluation scripts auto-download test data and models (budget ~5 GB disk)
- Quantitative results reported against contemporary CVPR/ICCV methods
- BSD-licensed, with a ready-made BibTeX entry
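For anyone sanity-checking their own predictions against the reported numbers, the relative error metric quoted for NYU is commonly computed as the mean of |predicted - true| / true over pixels. The sketch below assumes that definition and assumes that pixels with missing ground truth are stored as 0 and should be ignored. The function name mean_abs_rel_error is hypothetical, and the repo's own evaluation scripts may differ in detail.

```python
import numpy as np

def mean_abs_rel_error(pred, gt):
    """Mean absolute relative error between predicted and true depth.

    Assumption: ground-truth values <= 0 mark missing depth and are skipped.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = gt > 0
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

# Toy depths in metres; the last pixel has no ground truth and is ignored.
pred = [1.1, 2.0, 4.5, 3.0]
gt = [1.0, 2.0, 5.0, 0.0]
err = mean_abs_rel_error(pred, gt)
print(round(err, 4))
```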
Caveats
- The Make3D TensorFlow model is marked “(soon)” and, per the README, still is
- MatConvNet path must be hand-edited in .m files before evaluation runs
- Code targets 2016-era TensorFlow and MatConvNet 1.0-beta20; modern environments may need coaxing
Verdict
Worth a look if you need a baseline depth predictor or are studying how early FCN papers handled upsampling. Skip it if you want training code, real-time performance, or native PyTorch—the repo is inference-only and the frameworks show their age.