facebookresearch/multimodal
A PyTorch library for training multimodal and vision-language models at scale.

Velocity (7d): +1.1 ★/day · Trend: steady
TorchMultimodal provides modular, composable building blocks (fusion layers, loss functions, and datasets) for building multimodal models. It includes implementations of established models such as ALBEF and BLIP-2, with pretrained weights. Researchers can use the library to replicate published models and as a foundation for multimodal research that combines content understanding and generative capabilities.