
BAAI-DCAI/Bunny

A family of lightweight multimodal models combining vision encoders with language backbones like Llama-3 and Phi-3.

Velocity (7d): +1.2 ★/day · Trend: steady

Bunny is a family of multimodal language models that pair plug-and-play vision encoders (EVA-CLIP, SigLIP) with language backbones (Llama-3-8B, Phi-3-mini, StableLM-2, Qwen1.5, and others). The models accept image input at resolutions up to 1152x1152 and aim to compete with larger multimodal LLMs (MLLMs) at a smaller parameter count. To compensate for the smaller model size, the training data is curated from a broad range of sources.
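To give a feel for what a 1152x1152 input means for a vision encoder, here is a minimal arithmetic sketch. It assumes the image is covered by square tiles at a hypothetical native encoder resolution of 384 pixels. The description above does not say how Bunny handles high-resolution input, so the tiling approach, the 384 default, and the names `TileGrid` and `tile_grid` are illustrative assumptions and not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileGrid:
    """Grid of square crops covering an image (illustrative helper)."""
    rows: int
    cols: int

    @property
    def count(self) -> int:
        return self.rows * self.cols


def tile_grid(height: int, width: int, tile: int = 384) -> TileGrid:
    """Return how many tile x tile crops are needed to cover the image.

    `tile=384` is an assumed encoder resolution, not a value taken
    from the Bunny description.
    """
    if min(height, width, tile) <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: a partial tile at the edge still needs a full crop.
    rows = -(-height // tile)
    cols = -(-width // tile)
    return TileGrid(rows, cols)


grid = tile_grid(1152, 1152)
print(grid.rows, grid.cols, grid.count)
```

Under these assumptions a 1152x1152 image maps to a 3x3 grid, so the encoder would process nine crops per image. The count grows with the square of the resolution, which is one reason high-resolution support is a notable cost for a small model.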
