EvolvingLMMs-Lab/Otter
Multi-modal LLM combining vision and language capabilities with instruction-following and in-context learning.

Velocity (7d): +2.9 ★/day
Trend: steady
Otter is an open-source multi-modal foundation model built on OpenFlamingo, the open-source reproduction of DeepMind’s Flamingo. It is trained on the MIMIC-IT dataset and designed for instruction following and in-context learning across vision-language tasks. The project publishes model checkpoints on Hugging Face and supports image and video understanding through specialized variants.
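
To make "in-context learning" concrete, here is a minimal sketch of how a few-shot prompt for a Flamingo-style model might be assembled: each demonstration pairs an image placeholder with an instruction and its answer, and the final query is left open for the model to complete. The `<image>` and `<|endofchunk|>` markers follow the OpenFlamingo convention. The `User:` / `GPT:<answer>` wording, the `Example` class, and the `build_in_context_prompt` helper are assumptions made for illustration, so check the exact template against the repository before relying on it.

```python
from dataclasses import dataclass
from typing import List

# Special markers in the OpenFlamingo style; the exact template Otter
# expects is an assumption here and should be verified against the repo.
IMAGE_TOKEN = "<image>"
END_OF_CHUNK = "<|endofchunk|>"


@dataclass
class Example:
    """One in-context demonstration (hypothetical helper type)."""
    instruction: str
    response: str


def build_in_context_prompt(examples: List[Example], query: str) -> str:
    """Interleave answered demonstrations, then append the open query.

    Each demonstration and the final query is preceded by one image
    placeholder, so the caller must supply len(examples) + 1 images.
    """
    parts = [
        f"{IMAGE_TOKEN}User: {ex.instruction} GPT:<answer> {ex.response}{END_OF_CHUNK}"
        for ex in examples
    ]
    # The final turn has no answer; the model is expected to continue from here.
    parts.append(f"{IMAGE_TOKEN}User: {query} GPT:<answer>")
    return "".join(parts)


if __name__ == "__main__":
    demos = [Example("What is in the image?", "A cat sleeping on a sofa.")]
    print(build_in_context_prompt(demos, "What is in the image?"))
```

The sketch only builds the text side of the input. Image preprocessing and the model call are left out because they depend on the specific checkpoint and variant.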