DepthAnything/Video-Depth-Anything
Video depth estimation model that produces consistent depth maps across arbitrarily long videos using a transformer architecture.

Video Depth Anything extends Depth Anything V2 to video sequences, producing temporally consistent depth estimates across long videos without compromising quality or generalization. It uses a transformer-based architecture and supports both relative and metric depth estimation, as well as streaming inference for real-time applications. Compared to diffusion-based alternatives, the model has fewer parameters and runs faster while achieving higher accuracy.
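
The difference between relative and metric depth, and why frame-to-frame consistency matters, can be illustrated with a small sketch. It assumes that relative depth is only meaningful up to an unknown scale and shift, so two predictions of the same scene can disagree numerically even when the geometry is identical. The function names `fit_scale_shift` and `apply_scale_shift` and the toy values are hypothetical and for illustration only; this is not the model's code or its internal alignment method.

```python
def fit_scale_shift(pred, ref):
    """Closed-form least-squares fit of (s, t) so that s * pred[i] + t ~ ref[i]."""
    if len(pred) != len(ref) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_pred = sum(pred) / len(pred)
    mean_ref = sum(ref) / len(ref)
    var = sum((p - mean_pred) ** 2 for p in pred)
    if var == 0:
        raise ValueError("pred is constant, so the scale is undetermined")
    cov = sum((p - mean_pred) * (r - mean_ref) for p, r in zip(pred, ref))
    s = cov / var
    t = mean_ref - s * mean_pred
    return s, t


def apply_scale_shift(depth, s, t):
    """Map a relative depth sequence into the reference frame's scale."""
    return [s * d + t for d in depth]


# Hypothetical toy data: four pixels of the same static scene in two frames.
frame_a = [1.0, 2.0, 3.0, 4.0]
# The second prediction has the same structure but a different scale and shift,
# which would show up as flicker if the frames were viewed back to back.
frame_b = [0.5 * d + 0.2 for d in frame_a]

s, t = fit_scale_shift(frame_b, frame_a)
aligned_b = apply_scale_shift(frame_b, s, t)
```

After alignment, `aligned_b` matches `frame_a` up to floating-point error. A metric depth mode avoids this ambiguity by predicting values in physical units, and a temporally consistent video model aims to keep the scale stable across frames without such post hoc fitting.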