facebookresearch/dinov3

Meta's vision model that sees forests and street markets without fine-tuning

DINOv3 is a family of self-supervised vision backbones designed to produce high-quality dense features for everything from semantic segmentation to satellite canopy mapping, often beating task-specialized models out of the box.

10.6k stars · Jupyter Notebook · Computer Vision · ML Frameworks
Velocity (7d): +35 ★/day · Trend: steady

What it does

DINOv3 provides pretrained Vision Transformer (ViT) and ConvNeXt backbones that output dense, high-resolution image features. The models range from 21M to 6.7B parameters (the largest is the one marketed as "7B") and are trained on either web-scale data (LVD-1689M) or satellite imagery (SAT-493M). Meta ships reference PyTorch code plus adapters for linear probing on tasks like semantic segmentation (ADE20K), depth estimation (NYUv2-Depth), and canopy height mapping.
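To make "dense features" concrete, here is a minimal sketch of how a ViT-style backbone turns an image into a grid of patch tokens. It assumes a patch size of 16 and a token sequence laid out as one CLS token, then four register tokens, then the patch tokens in row-major order; those values and the helper names (`token_layout`, `patch_index`) are illustrative assumptions, not taken from the page above, so check the model config before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenLayout:
    """Shape of the token sequence a ViT-style backbone emits for one image."""
    grid_h: int      # number of patch rows
    grid_w: int      # number of patch columns
    num_prefix: int  # non-patch tokens at the start (CLS + registers)

    @property
    def num_patches(self) -> int:
        return self.grid_h * self.grid_w

    @property
    def seq_len(self) -> int:
        return self.num_prefix + self.num_patches


def token_layout(height: int, width: int,
                 patch_size: int = 16,       # assumed patch size
                 num_registers: int = 4      # assumed register-token count
                 ) -> TokenLayout:
    """Compute the patch grid for an image whose sides are multiples of patch_size."""
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of the patch size")
    return TokenLayout(height // patch_size, width // patch_size, 1 + num_registers)


def patch_index(layout: TokenLayout, row: int, col: int) -> int:
    """Sequence index of the patch at (row, col), assuming row-major patch order."""
    if not (0 <= row < layout.grid_h and 0 <= col < layout.grid_w):
        raise IndexError("patch coordinates outside the grid")
    return layout.num_prefix + row * layout.grid_w + col


layout = token_layout(224, 224)
print(layout.grid_h, layout.grid_w, layout.seq_len)
```

Under these assumptions, dropping the first `num_prefix` tokens and reshaping the rest to `(grid_h, grid_w, channels)` gives the dense feature map that a segmentation or depth head would consume.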

The interesting bit

The pitch is “without fine-tuning”: these are foundation models in the original sense, meant to be used as frozen feature extractors with only a lightweight head trained on top. The CHMv2 release is a nice flex: a 7B-parameter ViT pretrained on satellite data, repurposed for global forest canopy height mapping, with weights on Hugging Face and integration into the Transformers library.
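The frozen-extractor idea can be sketched without the real weights. In the example below, `frozen_backbone` is a hypothetical stand-in (a fixed random projection followed by `tanh`), not DINOv3 itself, and the data is synthetic. The only point is the workflow: the backbone is never updated, and only a small linear head is fit on its features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pretrained weights: created once and never updated afterwards.
FROZEN_W = rng.normal(size=(8, 16))


def frozen_backbone(x: np.ndarray) -> np.ndarray:
    """Hypothetical frozen feature extractor: fixed projection plus tanh."""
    return np.tanh(x @ FROZEN_W)


def fit_linear_probe(features: np.ndarray, labels: np.ndarray,
                     num_classes: int, ridge: float = 1e-3) -> np.ndarray:
    """Fit a linear head (with bias) on frozen features via ridge regression."""
    x = np.hstack([features, np.ones((len(features), 1))])  # append bias column
    y = np.eye(num_classes)[labels]                          # one-hot targets
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ y)


def predict(weights: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Class prediction: highest score of the linear head."""
    x = np.hstack([features, np.ones((len(features), 1))])
    return (x @ weights).argmax(axis=1)


def make_toy_data(n_per_class: int):
    """Two well-separated synthetic classes standing in for labelled images."""
    x0 = rng.normal(-2.0, 0.5, size=(n_per_class, 8))
    x1 = rng.normal(2.0, 0.5, size=(n_per_class, 8))
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)]).astype(int)
    return x, y


train_x, train_y = make_toy_data(100)
test_x, test_y = make_toy_data(50)

# Only the probe is trained; the backbone is used as-is on both splits.
probe = fit_linear_probe(frozen_backbone(train_x), train_y, num_classes=2)
accuracy = float((predict(probe, frozen_backbone(test_x)) == test_y).mean())
print(f"held-out accuracy: {accuracy:.2f}")
```

With a real backbone the shape of the procedure is the same: compute features once with gradients disabled, then train only the head, which is why linear probing is cheap compared with full fine-tuning.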

Key highlights

  • ViT and ConvNeXt variants from tiny (21M) to 7B parameters; the 7B ViT is trained with self-supervision and the smaller models are distilled from it
  • Two pretraining domains: general web images (LVD-1689M) and satellite imagery (SAT-493M)
  • Supported by PyTorch Hub, Hugging Face Transformers (≥4.56.0), and timm (≥1.0.20)
  • Released task code: linear segmentation, depth estimation, and canopy height inference
  • Model weights require an access request via Meta’s download portal; wget recommended over browser downloads
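Since the integrations above depend on minimum library versions, a quick environment check can save a confusing import error later. The sketch below uses only the standard library; the helper names (`parse_release`, `meets_minimum`, `check_environment`) are illustrative, and the comparison looks only at the leading numeric part of a version string, so a pre-release such as `4.56.0.dev0` is treated like `4.56.0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions quoted in the highlights above.
MINIMUMS = {"transformers": "4.56.0", "timm": "1.0.20"}


def parse_release(ver: str) -> tuple:
    """Extract the leading numeric release, e.g. '4.57.0.dev0' -> (4, 57, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numerically, so '1.0.9' is correctly older than '1.0.20'."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so '4.56' compares equal to '4.56.0'
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS) -> dict:
    """Report, per package, whether it is installed and recent enough."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            report[name] = f"ok ({installed})"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

For stricter handling of pre-release and post-release tags, the third-party `packaging` library's `Version` class is the usual choice; it is left out here to keep the sketch dependency-free.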

Caveats

  • Weight downloads are gated behind a request form, not directly fetchable
  • The README is heavy on model release announcements and light on training methodology or architecture details

Verdict

Worth a look if you need strong frozen visual features and don’t want to fine-tune a CLIP variant. Skip if you were hoping for an ungated, drop-in replacement with freely downloadable weights; the access friction is real.
