isl-org/DPT
A Vision Transformer architecture for dense prediction tasks like depth estimation and semantic segmentation.

Velocity (7d): +1.2 ★/day · Trend: steady
DPT provides pre-trained Vision Transformer models for monocular depth estimation and semantic segmentation. The models combine convolutional layers with transformer encoders and output dense predictions, one per pixel. The repository includes inference code and downloadable model weights for both tasks.
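
To make the idea of a dense, per-pixel output concrete, here is a minimal sketch of how such predictions are typically post-processed. It does not use the repository's code or weights: the arrays are random stand-ins for model output, and the function names `normalize_depth` and `logits_to_labels` are hypothetical helpers introduced only for illustration. The assumed shapes are (H, W) for a depth map and (C, H, W) for segmentation class scores.

```python
import numpy as np


def normalize_depth(depth: np.ndarray, bits: int = 8) -> np.ndarray:
    """Rescale a (H, W) float depth map to unsigned integers for viewing.

    Hypothetical helper: the smallest value maps to 0 and the largest
    to 2**bits - 1. Only bits=8 and bits=16 are handled here.
    """
    dtype = np.uint8 if bits == 8 else np.uint16
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-12:
        # A constant map has no range to stretch; return all zeros.
        return np.zeros(depth.shape, dtype=dtype)
    max_val = 2 ** bits - 1
    scaled = (depth - d_min) / (d_max - d_min) * max_val
    return scaled.round().astype(dtype)


def logits_to_labels(logits: np.ndarray) -> np.ndarray:
    """Turn (C, H, W) per-class scores into a (H, W) map of class indices."""
    return logits.argmax(axis=0)


# Random stand-ins for model output (assumption: not real DPT predictions).
rng = np.random.default_rng(0)
fake_depth = rng.random((4, 6)).astype(np.float32)        # one value per pixel
fake_logits = rng.normal(size=(3, 4, 6)).astype(np.float32)  # 3 classes per pixel

depth_image = normalize_depth(fake_depth)
label_map = logits_to_labels(fake_logits)
```

Both results keep the spatial size of the input: `depth_image` holds one intensity per pixel and `label_map` holds one class index per pixel, which is what distinguishes dense prediction from whole-image classification.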