UCSB-AI/MiniGPT-5
A multimodal LLM that jointly generates coherent text and images using generative vokens as bridging tokens.

Velocity (7d): +0.9 ★/day · Trend: steady
MiniGPT-5 is an interleaved vision-and-language generation model built around generative vokens, special tokens that bridge the language model's text output and image generation so the two stay coherent. It is trained in two stages and does not require comprehensive image descriptions, so training does not depend on detailed captions. Classifier-free guidance is applied to make the vokens more effective as conditioning for image generation.
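
As a rough illustration of the classifier-free guidance step mentioned above, the sketch below shows the standard way a conditional and an unconditional noise prediction are combined. It is not taken from the MiniGPT-5 code: the function name `cfg_combine`, the plain-list inputs, and the `guidance_scale` parameter are assumptions made for illustration. In MiniGPT-5 the conditional prediction would presumably be the one conditioned on voken features.

```python
from typing import List, Sequence


def cfg_combine(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Combine two noise predictions with classifier-free guidance.

    Standard formulation: eps = eps_uncond + s * (eps_cond - eps_uncond).
    Hypothetical helper for illustration, not the MiniGPT-5 API.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    # s = 0 ignores the condition, s = 1 uses the conditional prediction
    # as-is, and s > 1 pushes further toward the condition.
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]
    cond = [1.0, 3.0]
    print(cfg_combine(uncond, cond, 2.0))
```

A larger `guidance_scale` makes the output follow the conditioning more strongly, usually at some cost in sample diversity.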