← all repositories
lmb-freiburg/demon

Teaching neural nets to see like 1981, but with more CUDA

A ConvNet that learns to estimate depth and camera motion from just two images, replacing classical geometry with learned intuition.

586 stars Python Computer Vision
demon
Velocity · 7d
+0.2
★ / day
Trend
steady
star history

What it does

DeMoN takes two photographs of the same scene and predicts two things: how far away everything is (depth), and how the camera moved between shots. It’s the classic “structure from motion” two-view problem, but solved with a convolutional network rather than the hand-tuned geometric pipelines of old.

The interesting bit

The authors cheekily borrow their subtitle from a 1981 paper by H. C. Longuet-Higgins — the foundational work on two-view reconstruction — then note that their network learns those “complex geometric relations” implicitly from data. The old guard did it with epipolar geometry and matrix decompositions; DeMoN does it with backprop and enough training data.

Key highlights

  • Estimates both depth maps and relative camera pose (rotation + translation) jointly
  • Published at CVPR 2017 by the Freiburg computer vision lab
  • Ships with Docker support and a pretrained weights download script
  • Includes evaluation code and (work-in-progress) TensorFlow training code ported from the original Caffe implementation
  • Depends on a custom ops library (lmbspecialops) included as a submodule

Caveats

  • Dependency stack is frozen in 2017 amber: TensorFlow 1.4, CUDA 8, Python 3.5, VTK 7.1
  • VTK’s python3 interface requires building from source or hunting through Anaconda; the binary release won’t cut it
  • The TensorFlow training code is explicitly noted as “work in progress” and not identical to the original Caffe model
  • Dataset download scripts had a bug where rgbd-prefixed files leaked test samples; fixed versions use rgbd_bugfix prefix

Verdict

Worth a look if you’re researching learned multi-view geometry or need a baseline for depth-from-stereo-with-motion. Skip it if you need production-ready code on modern TensorFlow — this is a research artifact with the dependency scars to prove it.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.