
salesforce/LAVIS

A one-stop deep learning library for building, training, and evaluating vision-language and multimodal AI models.

11.2k stars · Jupyter Notebook · ML Frameworks · Image · Video · Audio
Velocity · 7d: +8.1 ★ / day
Trend: steady

LAVIS is a comprehensive library for language-vision intelligence developed by Salesforce AI Research. It provides unified interfaces for multimodal models and tasks, including vision-language transformers, image captioning, visual question answering, and cross-modality frameworks built on frozen LLMs. The library supports multiple modalities (image, video, audio, 3D) through reusable components for datasets, models, training pipelines, and evaluation benchmarks.
