ShirAmir/dino-vit-features
Official implementation for extracting deep ViT features from DINO and using them as dense visual descriptors for co-segmentation and semantic correspondence.

The repository provides a PyTorch implementation for using pre-trained DINO Vision Transformer features as dense patch descriptors. It extracts features from a self-supervised ViT model and applies them to vision tasks including co-segmentation, part segmentation, and point correspondence. Rather than relying on task-specific learned components, the approach applies lightweight methods such as clustering or binning directly to the deep ViT features.
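To make the idea of clustering dense patch descriptors concrete, here is a minimal sketch. It is not the repository's code or API: the descriptors are synthetic stand-ins for ViT patch features, and `farthest_point_init`, `kmeans`, the 8x8 patch grid, and the 16-dimensional descriptor size are all hypothetical choices made for illustration. It uses a plain NumPy k-means so that it runs without PyTorch or model weights.

```python
import numpy as np


def farthest_point_init(x, k):
    """Pick k starting centroids: row 0, then repeatedly the row farthest from those chosen."""
    chosen = [0]
    min_d = ((x - x[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(min_d.argmax())
        chosen.append(nxt)
        min_d = np.minimum(min_d, ((x - x[nxt]) ** 2).sum(axis=1))
    return x[chosen].copy()


def kmeans(x, k, iters=10):
    """Plain k-means over the rows of x; returns one cluster label per row."""
    centroids = farthest_point_init(x, k)
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


# Synthetic stand-in for dense descriptors: one vector per patch on an 8x8 grid.
# (Assumption: in practice these would come from a ViT, one descriptor per patch.)
rng = np.random.default_rng(0)
grid_h, grid_w, dim = 8, 8, 16
base = np.zeros((grid_h, grid_w, dim))
base[:, : grid_w // 2, 0] = 1.0  # left half plays the role of "background"
base[:, grid_w // 2 :, 1] = 1.0  # right half plays the role of "object"
descriptors = (base + 0.02 * rng.standard_normal(base.shape)).reshape(-1, dim)

# L2-normalize so Euclidean distance behaves like cosine distance.
descriptors /= np.linalg.norm(descriptors, axis=1, keepdims=True)

# Cluster the patches and fold the labels back into the patch grid.
label_map = kmeans(descriptors, k=2).reshape(grid_h, grid_w)
print(label_map)
```

The resulting `label_map` is a coarse, patch-resolution segmentation: patches with similar descriptors share a label. No component is trained for the task; all the work is done by the descriptors and a generic clustering step.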