TencentQQGYLab/ELLA
ELLA equips diffusion models with large language models to improve semantic alignment in text-to-image generation.

Velocity (7-day): +1.6 ★/day · Trend: steady
ELLA is a research project that pairs text-to-image diffusion models with large language models (LLMs) to improve semantic alignment in generated images. It draws on an LLM's language understanding so that the diffusion model can interpret and follow complex text prompts more faithfully. The repository also includes EMMA, a related technique that lets text-to-image models accept multi-modal prompts.