X-PLUG/mPLUG-Owl
A family of multi-modal large language models that process images, video, and text for visual recognition and dialogue tasks.

Velocity (7 days): +2.2 ★/day · Trend: → steady
mPLUG-Owl is a modular family of multi-modal LLMs for image and video understanding. The series spans three generations: mPLUG-Owl, mPLUG-Owl2 (CVPR 2024 Highlight), and mPLUG-Owl3, which targets comprehension of long image sequences. The models are implemented in PyTorch and distributed via Hugging Face, and the repository includes training and evaluation code.