DepthAnything/Depth-Anything-V2

Depth estimation that actually runs on your laptop

A monocular depth model that trades Stable Diffusion's bulk for DINOv2's backbone, shipping four sizes from 25M to 1.3B parameters.

8.2k stars · Python · Computer Vision

What it does

Depth Anything V2 turns a single RGB image into a depth map. No stereo rig, no LiDAR, no fuss. The repo ships four model sizes (Small through Giant), plus scripts for batch images, video, and a local Gradio demo. You can also load it via Hugging Face Transformers if you don’t want the full checkout.
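The Transformers route can be sketched roughly as follows. This assumes the `transformers` and `Pillow` packages are installed and that the Small checkpoint is published on the Hub under the id `depth-anything/Depth-Anything-V2-Small-hf`; the helper name `estimate_depth` is hypothetical, and the id should be checked against the repo's README before relying on it.

```python
from typing import Any

# Assumed Hub id for the Small checkpoint; verify it against the repo's README.
DEFAULT_MODEL_ID = "depth-anything/Depth-Anything-V2-Small-hf"


def estimate_depth(image_path: str, model_id: str = DEFAULT_MODEL_ID) -> Any:
    """Return a depth map (as a PIL image) for a single RGB image file.

    Hypothetical helper. The heavy imports are deferred to call time so that
    defining the function does not require transformers to be installed.
    """
    from PIL import Image
    from transformers import pipeline

    # Build the generic depth-estimation pipeline around the chosen checkpoint.
    pipe = pipeline(task="depth-estimation", model=model_id)
    image = Image.open(image_path).convert("RGB")

    # The pipeline returns a dict; the "depth" entry holds a PIL image.
    return pipe(image)["depth"]
```

Calling `estimate_depth("photo.jpg").save("depth.png")` would download the weights on first use and write a grayscale depth image; `photo.jpg` is a placeholder path.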

The interesting bit

In V1 the authors unintentionally decoded from the last four DINOv2 layers rather than from intermediate ones, and V2 switches to intermediate features. They admit the change “did not improve details or accuracy,” which is either refreshing honesty or a testament to how robust the underlying approach already was. The real win is speed: they claim faster inference and fewer parameters than SD-based depth models, with four scales to match your GPU budget.
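To make the layer distinction concrete, here is a small sketch of the two ways to choose four encoder blocks to decode from. The 12-block depth and the even-spacing formula are illustrative assumptions, and both function names are hypothetical; the indices the repository actually uses for each model size may differ.

```python
def last_n_layers(depth: int, n: int = 4) -> list[int]:
    """Indices of the final n transformer blocks (the V1 behavior described above)."""
    return list(range(depth - n, depth))


def evenly_spaced_layers(depth: int, n: int = 4) -> list[int]:
    """Indices of n blocks spread across the encoder, ending at the last block.

    Assumed spacing rule for illustration only, not the repository's own table.
    """
    return [(i + 1) * depth // n - 1 for i in range(n)]


# A 12-block encoder, chosen purely for illustration.
print(last_n_layers(12))         # [8, 9, 10, 11]
print(evenly_spaced_layers(12))  # [2, 5, 8, 11]
```

The first selection only sees the deepest features, while the second mixes shallower and deeper blocks, which is the practice V2 adopts.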

Key highlights

  • Four checkpoints: 24.8M, 97.5M, 335.3M, and 1.3B parameters (Giant “coming soon”)
  • Video script included; larger models give better temporal consistency
  • Metric depth fine-tuning available in a separate subdirectory
  • Apple Core ML, TensorRT, ONNX, and Android ports already exist in the community
  • Small model is Apache-2.0; Base/Large/Giant are CC-BY-NC-4.0 (non-commercial)
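Size and license interact when choosing a checkpoint, so a tiny selection helper may make the trade-off easier to see. This is a sketch only: the `Checkpoint` class and `pick_checkpoint` function are hypothetical, the parameter counts and licenses are taken from the highlights above, and 1.3B is written as 1300 million.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    name: str
    params_m: float  # parameter count in millions
    license: str
    released: bool


# Figures and licenses as listed in the highlights above.
CHECKPOINTS = [
    Checkpoint("Small", 24.8, "Apache-2.0", True),
    Checkpoint("Base", 97.5, "CC-BY-NC-4.0", True),
    Checkpoint("Large", 335.3, "CC-BY-NC-4.0", True),
    Checkpoint("Giant", 1300.0, "CC-BY-NC-4.0", False),  # "coming soon"
]


def pick_checkpoint(max_params_m: float, commercial: bool) -> Optional[Checkpoint]:
    """Return the largest released checkpoint within budget and license limits."""
    candidates = [
        c for c in CHECKPOINTS
        if c.released
        and c.params_m <= max_params_m
        and (not commercial or c.license == "Apache-2.0")
    ]
    return max(candidates, key=lambda c: c.params_m, default=None)


print(pick_checkpoint(400, commercial=False).name)  # Large
print(pick_checkpoint(400, commercial=True).name)   # Small
```

For commercial work the answer is always Small regardless of budget, which is the practical consequence of the non-commercial license on the larger models.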

Caveats

  • The 1.3B Giant model is still unreleased
  • Hugging Face Transformers integration exists, but its predictions differ slightly from the native path because the native code upsamples with OpenCV while Transformers uses Pillow

Verdict

Grab this if you need off-the-shelf depth maps without orchestrating a diffusion pipeline. Skip it if you need guaranteed metric accuracy out of the box; that requires fine-tuning or LiDAR prompting (see their Prompt Depth Anything follow-up).
