LiheYoung/Depth-Anything

Depth estimation from a single photo, trained on 62 million unlabeled images

A foundation model that learns depth from unlabeled data at internet scale, then beats specialized models on standard benchmarks.

8.1k stars · Python · Computer Vision
Velocity (7d): +9.3 ★ / day · Trend: steady

What it does

Depth Anything turns a single 2D image into a depth map, telling you roughly how far each pixel is from the camera. It comes in three sizes (Small, Base, Large) and runs on images or video via a simple CLI or a Gradio demo. The project also ships fine-tuned variants for metric depth (actual distances in meters) and a retrained ControlNet for depth-conditioned image generation.
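To make "depth map" concrete: a relative depth map is a 2D grid of numbers, one per pixel, and it is commonly rescaled to 0-255 so it can be viewed as a grayscale image. The sketch below is a plain-Python illustration with made-up values. `to_grayscale` is a hypothetical helper, not something the repository ships, and whether larger values mean nearer or farther depends on the model's output convention.

```python
def to_grayscale(depth):
    """Rescale a 2D grid of relative depth values to integers in 0..255."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no contrast, so return an all-zero image.
        return [[0 for _ in row] for row in depth]
    scale = 255.0 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in depth]


# Toy 2x2 "depth map" (invented values, for illustration only).
toy_depth = [
    [0.0, 1.0],
    [3.0, 5.0],
]
gray = to_grayscale(toy_depth)
```

Because the output is only relative, the rescaled values convey ordering and contrast, not distances in meters.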

The interesting bit

The model was trained on 1.5 million labeled images plus 62 million unlabeled ones, roughly 41 times as many unlabeled as labeled. The authors claim that this data scale, not architectural wizardry, is what makes it generalize. The Small model (24.8M parameters) reportedly outperforms MiDaS v3.1 BEiT-L-512 (345M parameters) on KITTI, NYUv2, and several other benchmarks despite being about 14x smaller. The Large model is further ahead still.
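The two ratios quoted above follow directly from the stated counts. A quick check, using only the numbers given in this summary:

```python
# Figures as quoted in the summary above.
labeled_images = 1.5e6
unlabeled_images = 62e6
small_params = 24.8e6   # Depth Anything Small
midas_params = 345e6    # MiDaS v3.1 BEiT-L-512

unlabeled_ratio = unlabeled_images / labeled_images
size_ratio = midas_params / small_params

print(f"unlabeled : labeled = {unlabeled_ratio:.1f}x")
print(f"MiDaS : Small       = {size_ratio:.1f}x")
```

So "40x" and "14x" are both rounded: the exact quotients are about 41.3 and 13.9.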

Key highlights

  • Three model sizes, with reported inference times ranging from 3 ms (Small, TensorRT on an RTX 4090) to 20 ms (Large, V100)
  • Relative depth out of the box; metric depth via fine-tuning on NYUv2 or KITTI
  • Encoder can be repurposed for downstream tasks — 86.2 mIoU on Cityscapes semantic segmentation
  • Hugging Face transformers integration: depth prediction in 3 lines of code
  • Community ports to ONNX, TensorRT, ComfyUI, and Stable Diffusion WebUI ControlNet
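
The Hugging Face route mentioned in the list looks roughly like the following. This is a sketch, assuming `transformers`, `torch`, and `Pillow` are installed and that the checkpoint name `LiheYoung/depth-anything-small-hf` is still published on the Hub. `estimate_depth` and its arguments are hypothetical names used for illustration. The first call downloads the model weights.

```python
def estimate_depth(image_path, model_id="LiheYoung/depth-anything-small-hf"):
    """Return a PIL image holding the predicted relative depth map."""
    # Imported lazily so this module loads even without the heavy dependencies.
    from PIL import Image
    from transformers import pipeline

    pipe = pipeline(task="depth-estimation", model=model_id)
    image = Image.open(image_path)
    # The depth-estimation pipeline returns a dict; "depth" is a PIL image.
    return pipe(image)["depth"]
```

Calling `estimate_depth("photo.jpg").save("photo_depth.png")` (with a path of your own) would write the depth map to disk. The output is relative depth, so it is suited to visualization or conditioning rather than measuring distances.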

Caveats

  • The README notes that the V100/A100 timing numbers exclude pre- and post-processing, while the RTX 4090 TensorRT numbers include them, so the two sets of figures are not directly comparable
  • The project now points users to Depth Anything V2 as the latest version, so this repo is effectively the legacy release
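
The timing caveat is easier to see with a small harness that reports model-only and end-to-end latency side by side. The sketch below is illustrative only: `time_stages` and the three stand-in stages are hypothetical, and no real model is run.

```python
import time


def time_stages(preprocess, model, postprocess, raw_input, runs=20):
    """Return average seconds per run for each stage, plus two summaries."""
    totals = {"pre": 0.0, "model": 0.0, "post": 0.0}
    for _ in range(runs):
        t0 = time.perf_counter()
        x = preprocess(raw_input)
        t1 = time.perf_counter()
        y = model(x)
        t2 = time.perf_counter()
        postprocess(y)
        t3 = time.perf_counter()
        totals["pre"] += t1 - t0
        totals["model"] += t2 - t1
        totals["post"] += t3 - t2
    avg = {name: total / runs for name, total in totals.items()}
    avg["model_only"] = avg["model"]
    avg["end_to_end"] = avg["pre"] + avg["model"] + avg["post"]
    return avg


# Stand-in stages (hypothetical). Real code would resize and normalize the
# image, run the network, then resize or colorize the predicted depth.
def fake_pre(img):
    return [v / 255.0 for v in img]


def fake_model(x):
    return [v * 2.0 for v in x]


def fake_post(depth):
    return [round(v * 255) for v in depth]


timings = time_stages(fake_pre, fake_model, fake_post, list(range(256)) * 100)
```

Comparing one system's `model_only` figure against another's `end_to_end` figure mixes two different measurements. On a GPU, execution is asynchronous, so a real benchmark would also need to synchronize the device before reading the clock.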

Verdict

Worth a look if you need drop-in depth estimation for robotics, 3D reconstruction, or generative pipelines. Skip it if you have already migrated to V2 or need guaranteed metric accuracy without fine-tuning.
