
facebookresearch/multimodal

A PyTorch library for training multimodal and vision-language models at scale.

1.7k stars · Python · ML Frameworks · Language Models
multimodal
Velocity · 7d: +1.1 ★ / day
Trend: steady

TorchMultimodal provides modular and composable building blocks including fusion layers, loss functions, and datasets for building multimodal models. It includes implementations of canonical state-of-the-art models like ALBEF and BLIP-2 with pretrained weights. The library enables researchers to replicate published models and serves as a foundation for future multimodal research combining content understanding and generative capabilities.
