
JIA-Lab-research/MGM

A multi-modality vision-language model supporting 2B to 34B parameter LLMs with image understanding, reasoning, and generation capabilities.

Velocity (7d): +4.1 ★/day · Trend: steady

Mini-Gemini (MGM) is a vision-language model framework that extends large language models with image understanding and generation capabilities. It supports a range of dense and Mixture of Experts (MoE) LLMs from 2B to 34B parameters, covering image comprehension, visual reasoning, and image generation tasks. Built on the LLaVA architecture, it provides model weights, training code, and inference support through HuggingFace integration.
