PKU-YuanGroup/MoE-LLaVA
A multi-modal large language model that uses a Mixture-of-Experts architecture to handle vision-language tasks efficiently.

Velocity (7d): +2.6 ★/day · Trend: steady
MoE-LLaVA is a vision-language model that applies Mixture-of-Experts (MoE) techniques to improve efficiency and performance on multi-modal inputs. The project implements sparse activation: a router selects only a subset of the expert networks for each token in a forward pass, so model capacity can grow without a proportional increase in compute cost. It provides training code, pre-trained checkpoints, and interactive demos via HuggingFace and Replicate.
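The sparse activation idea can be illustrated with a minimal top-k routing sketch. This is not MoE-LLaVA's actual code: the function names (`top_k_route`, `moe_layer`, `make_expert`), the toy experts, and the router weights are all hypothetical, and it uses plain Python rather than a deep learning framework. It assumes a common scheme in which a router scores every expert, only the k highest-scoring experts run, and their outputs are mixed with renormalized weights.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-probability experts for one token.

    Returns (expert_indices, weights), with weights renormalized to sum to 1.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return chosen, [probs[i] / norm for i in chosen]

def moe_layer(token, experts, router_weights, k=2):
    """Run one token through a sparse MoE layer (illustrative only).

    Only the k selected experts are evaluated; the rest are skipped,
    which is where the compute saving comes from.
    """
    # One router logit per expert: dot product of that expert's row with the token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    chosen, weights = top_k_route(logits, k)
    out = [0.0] * len(token)
    for idx, wt in zip(chosen, weights):
        y = experts[idx](token)  # evaluate a selected expert
        out = [o + wt * v for o, v in zip(out, y)]
    return out, chosen

def make_expert(scale):
    """Hypothetical toy expert: scales its input (a real expert would be an MLP)."""
    return lambda x: [scale * v for v in x]

# Four toy experts, of which only two are evaluated for this token.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
token = [2.0, 1.0]
output, used = moe_layer(token, experts, router_weights, k=2)
print("experts used:", used)
print("output:", output)
```

With four experts and k=2, half of the expert computation is skipped for this token, while all four experts' parameters remain available to the router.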