autonomousvision/unimatch
A unified Transformer-based model for optical flow, rectified stereo matching, and depth estimation from posed images, with first-place results reported on several public benchmarks.

UniMatch treats optical flow estimation, rectified stereo matching, and depth estimation from unrectified posed images as one dense correspondence matching problem, so the same architecture serves all three tasks. A Transformer with cross-attention between the two input views extracts the features, which are then compared by correlation; global matching over all candidate positions turns those correlations into dense correspondences across the image pair. The authors report 1st place on the Sintel (clean), Middlebury (rms metric), and Argoverse benchmarks, outperforming task-specific approaches while using a single model architecture.
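
To make the global matching step concrete, here is a minimal NumPy sketch of correlation-plus-softmax matching for the optical flow case. It is an illustration under stated assumptions, not the repository's code: the function name `global_matching_flow`, the `(H, W, C)` feature layout, and the `sqrt(C)` scaling are choices made for this example, and the features are assumed to come from some upstream network.

```python
import numpy as np

def global_matching_flow(feat0, feat1):
    """Estimate flow from two (H, W, C) feature maps by global matching.

    Hypothetical helper for illustration; returns an (H, W, 2) array of
    (dx, dy) displacements from image 0 to image 1.
    """
    h, w, c = feat0.shape
    f0 = feat0.reshape(h * w, c).astype(np.float64)
    f1 = feat1.reshape(h * w, c).astype(np.float64)

    # Correlate every pixel of image 0 with every pixel of image 1.
    corr = f0 @ f1.T / np.sqrt(c)                 # (HW, HW)

    # Softmax over all candidate positions (numerically stabilised).
    corr = corr - corr.max(axis=1, keepdims=True)
    prob = np.exp(corr)
    prob /= prob.sum(axis=1, keepdims=True)

    # Pixel grid as (x, y) coordinates, one row per pixel.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(h * w, 2).astype(np.float64)

    # Expected matching position under the softmax distribution.
    matched = prob @ grid                         # (HW, 2)

    # Flow is the displacement from each pixel to its expected match.
    return (matched - grid).reshape(h, w, 2)
```

Because the softmax runs over every position in the second image, the cost is quadratic in the number of pixels, which is why a sketch like this would normally operate on downsampled feature maps. For rectified stereo, the same idea can be restricted to candidates along the same image row.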