
UCSB-AI/MiniGPT-5

A multimodal LLM that jointly generates coherent text and images using generative vokens as bridging tokens.

MiniGPT-5
Velocity (7d): +0.9 ★ / day
Trend: steady

MiniGPT-5 is an interleaved vision-and-language generation model. It introduces generative vokens, bridging tokens that keep its text and image output coherent. Its two-stage training strategy does not require comprehensive image descriptions, which makes training more efficient. It also applies classifier-free guidance to make the vokens more effective for image generation.
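For readers unfamiliar with classifier-free guidance, the sketch below shows the generic blending rule it relies on: a conditional and an unconditional prediction are combined, with a scale that controls how strongly the condition steers the result. This is not taken from the MiniGPT-5 code. The function name, the `guidance_scale` parameter, and the toy values are assumptions made for illustration, and how the repository actually conditions on vokens is not shown on this page.

```python
from typing import Sequence


def classifier_free_guidance(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    guidance_scale: float,
) -> list[float]:
    """Blend unconditional and conditional noise predictions.

    A scale of 0 returns the unconditional prediction, 1 returns the
    conditional one, and values above 1 extrapolate past the conditional
    prediction, pushing the sample harder toward the condition.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy numbers standing in for a denoiser's two outputs (hypothetical values).
uncond = [0.0, 1.0, -1.0]
cond = [1.0, 1.0, 0.0]
guided = classifier_free_guidance(uncond, cond, guidance_scale=2.0)
```

In a real image generator the two predictions would be tensors produced by the same denoising network, once with the conditioning features and once with them dropped or replaced by a null input.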
