Is ImageBind open source?

Yes — facebookresearch/ImageBind is an open-source project tracked on heatdrop.

What language is ImageBind written in?

facebookresearch/ImageBind is primarily written in Python.

How popular is ImageBind?

facebookresearch/ImageBind has 9.1k stars on GitHub.

Where can I find ImageBind?

facebookresearch/ImageBind is on GitHub at https://github.com/facebookresearch/ImageBind.

← all repositories

facebookresearch/ImageBind

Cross-Modal Search by Sound, Heat, or Motion

ImageBind crams images, text, audio, depth, thermal, and IMU data into a single vector space for zero-shot cross-modal retrieval and arithmetic.

★9.1k stars Python Image · Video · Audio Language Models ML Frameworks

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does

ImageBind is Meta’s PyTorch implementation of a CVPR 2023 highlighted paper that learns one shared embedding across six input types: vision, text, audio, depth, thermal, and IMU data. A photo of a dog, its bark, and the text “A dog” all map to nearby vectors, which means the model can perform zero-shot cross-modal retrieval and detection without task-specific fine-tuning. The repository includes a pretrained imagebind_huge checkpoint and data loaders for each modality.

The interesting bit

The README claims the model enables “emergent” cross-modal applications out-of-the-box—audio-to-image retrieval or adding vectors across modalities—without additional fine-tuning. Whether that robustly holds on messy real-world data is left as an exercise for the reader.

Key highlights

Joint embedding for six modalities: images, text, audio, depth, thermal, and IMU data.
Pretrained imagebind_huge checkpoint with reported zero-shot scores (ImageNet-1k 77.7, ESC audio 66.9, K400 50.0, etc.).
Emergent capabilities: cross-modal retrieval, arithmetic on embeddings, and cross-modal detection/generation.
Runs fully offline after downloading the checkpoint; no external API dependency.
Code and weights are under CC-BY-NC 4.0.

Caveats

Strictly non-commercial license (CC-BY-NC 4.0) rules out most product use.
The README is vague on training compute, dataset scale, and how well the emergent behaviors generalize beyond the reported benchmarks.

Verdict

Grab it if you are prototyping multimodal research tools or need to align sensory streams without building custom fusion layers. Pass if you need a commercial license or if your stack requires modalities outside the six supported.

Frequently asked

What is facebookresearch/ImageBind?: ImageBind crams images, text, audio, depth, thermal, and IMU data into a single vector space for zero-shot cross-modal retrieval and arithmetic.
Is ImageBind open source?: Yes — facebookresearch/ImageBind is an open-source project tracked on heatdrop.
What language is ImageBind written in?: facebookresearch/ImageBind is primarily written in Python.
How popular is ImageBind?: facebookresearch/ImageBind has 9.1k stars on GitHub.
Where can I find ImageBind?: facebookresearch/ImageBind is on GitHub at https://github.com/facebookresearch/ImageBind.