isl-org/DPT

A Vision Transformer architecture for dense prediction tasks like depth estimation and semantic segmentation.

2.3k stars · Python · Computer Vision · ML Frameworks
Velocity (7d): +1.2 ★ / day · Trend: steady

DPT provides pre-trained Vision Transformer models for monocular depth estimation and semantic segmentation. The models combine convolutional layers with transformer encoders and output dense, per-pixel predictions. The repository includes inference code and downloadable model weights for both tasks.
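
To make "dense pixel-level predictions" concrete, here is a minimal sketch of how one value per transformer patch token could be laid back out on the image grid and expanded to one value per pixel. It is an illustration only, not the repository's code: the function names (`tokens_to_grid`, `upsample_nearest`, `dense_prediction`) are hypothetical, tokens are assumed to arrive in row-major order, and nearest-neighbour repetition stands in for the learned convolutional layers a real model would use.

```python
from typing import List


def tokens_to_grid(tokens: List[float], height: int, width: int, patch: int) -> List[List[float]]:
    """Arrange one value per patch token into a (height/patch) x (width/patch) grid."""
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    rows, cols = height // patch, width // patch
    if len(tokens) != rows * cols:
        raise ValueError(f"expected {rows * cols} tokens, got {len(tokens)}")
    # Assumption: tokens are ordered row by row (raster order).
    return [tokens[r * cols:(r + 1) * cols] for r in range(rows)]


def upsample_nearest(grid: List[List[float]], factor: int) -> List[List[float]]:
    """Repeat every cell factor x factor times (nearest-neighbour upsampling)."""
    out: List[List[float]] = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        # Copy the widened row so the output rows are independent lists.
        out.extend(list(wide) for _ in range(factor))
    return out


def dense_prediction(tokens: List[float], height: int, width: int, patch: int) -> List[List[float]]:
    """Turn per-patch values into a per-pixel map with the same size as the image."""
    return upsample_nearest(tokens_to_grid(tokens, height, width, patch), patch)


if __name__ == "__main__":
    # A 4x4 "image" split into 2x2 patches gives four tokens.
    for row in dense_prediction([0.1, 0.2, 0.3, 0.4], height=4, width=4, patch=2):
        print(row)
```

The output has exactly one value per input pixel, which is the property that distinguishes dense prediction from image-level classification.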
