facebookresearch/ImageBind
Multi-modal embedding model from Meta AI that aligns images, text, audio, depth, thermal, and IMU data into a unified embedding space.

Velocity (7d): +7.7 ★/day · Trend: steady
ImageBind is a PyTorch implementation of a multi-modal foundation model that learns a single joint embedding space across six modalities: images, text, audio, depth, thermal, and IMU data. Because every modality lands in the same space, the model supports emergent zero-shot classification across modalities, cross-modal retrieval, composition of modalities through embedding arithmetic, and cross-modal detection. The work was published as a CVPR 2023 highlighted paper, and the repository is released with pretrained checkpoints.
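To make the idea of a joint embedding space concrete, here is a minimal sketch of how zero-shot classification, embedding arithmetic, and cross-modal retrieval reduce to cosine similarity once all modalities share one space. It does not call ImageBind: the 3-dimensional vectors are made-up stand-ins for real embeddings, and the helper names (`zero_shot_classify`, `compose`, `retrieve`) are hypothetical, chosen only for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn similarity scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(query, label_embeddings):
    """Pick the label whose embedding is most similar to the query embedding."""
    labels = list(label_embeddings)
    q = normalize(query)
    scores = [dot(q, normalize(label_embeddings[name])) for name in labels]
    best = labels[max(range(len(labels)), key=lambda i: scores[i])]
    return best, dict(zip(labels, softmax(scores)))

def compose(a, b):
    """Embedding arithmetic: add two unit vectors and renormalize the sum."""
    return normalize([x + y for x, y in zip(normalize(a), normalize(b))])

def retrieve(query, gallery):
    """Return the key of the gallery item closest to the query embedding."""
    q = normalize(query)
    return max(gallery, key=lambda key: dot(q, normalize(gallery[key])))

# Toy embeddings (assumed values, NOT produced by ImageBind).
# Axes loosely mean: dog-like, car-like, beach-like.
text_labels = {
    "dog": [1.0, 0.1, 0.0],
    "car": [0.0, 1.0, 0.1],
    "beach": [0.1, 0.0, 1.0],
}
audio_bark = [0.9, 0.2, 0.1]  # stand-in for an audio clip embedding

# Zero-shot classification: audio query scored against text label embeddings.
best_label, probs = zero_shot_classify(audio_bark, text_labels)

# Composition + retrieval: image of a dog combined with audio of waves.
image_dog = [1.0, 0.0, 0.0]
audio_waves = [0.0, 0.0, 1.0]
image_gallery = {
    "dog_on_beach": [0.7, 0.0, 0.7],
    "dog_indoors": [1.0, 0.0, 0.0],
    "empty_beach": [0.0, 0.0, 1.0],
}
match = retrieve(compose(image_dog, audio_waves), image_gallery)

print(best_label, match)
```

With real embeddings, the same comparisons would run on the model's output vectors for each modality; the toy vectors here only show why a shared space makes an audio query comparable to text labels or an image gallery.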