TonyLianLong/LLM-groundedDiffusion
Research project that uses LLMs as prompt parsers to enhance Stable Diffusion's ability to understand and generate images from complex text descriptions.

LLM-grounded Diffusion (LMD) enhances text-to-image generation by using a Large Language Model to parse user prompts into structured intermediate representations (such as image layouts) before feeding them to Stable Diffusion. This approach improves the model’s ability to handle complex, compositional, and spatially specific prompts. The project includes training code and evaluation benchmarks, and it has been integrated into the Hugging Face diffusers library as LMD+.
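
To make the idea of a structured intermediate representation concrete, here is a minimal sketch of what parsing an LLM reply into an image layout could look like. It is an illustration only, not the project's actual code: the reply format, the `Box` class, the `parse_layout` function, and the 512-pixel canvas are all assumptions made for this example.

```python
import ast
from dataclasses import dataclass


@dataclass
class Box:
    """One object in the layout: a phrase plus a bounding box in pixels."""
    phrase: str
    x: int
    y: int
    width: int
    height: int


def parse_layout(llm_response: str, canvas: int = 512):
    """Parse a hypothetical LLM reply into boxes and a background prompt.

    Assumed reply format (illustrative, not the project's real format):
        Objects: [('a gray cat', [60, 200, 180, 220]), ...]
        Background prompt: a living room
    """
    boxes, background = [], ""
    for line in llm_response.splitlines():
        line = line.strip()
        if line.startswith("Objects:"):
            # literal_eval only accepts Python literals, so it is safe on untrusted text.
            raw = ast.literal_eval(line[len("Objects:"):].strip())
            for phrase, (x, y, w, h) in raw:
                # Reject boxes that are empty or fall outside the square canvas.
                if x < 0 or y < 0 or w <= 0 or h <= 0 or x + w > canvas or y + h > canvas:
                    raise ValueError(f"box for {phrase!r} is outside the canvas")
                boxes.append(Box(phrase, x, y, w, h))
        elif line.startswith("Background prompt:"):
            background = line[len("Background prompt:"):].strip()
    return boxes, background


# Hypothetical LLM reply for the prompt "a gray cat to the left of an orange dog".
response = """Objects: [('a gray cat', [60, 200, 180, 220]), ('an orange dog', [280, 180, 200, 260])]
Background prompt: a living room"""

boxes, background = parse_layout(response)
for box in boxes:
    print(box)
print("background:", background)
```

In a pipeline of this kind, the parsed boxes and background prompt would then condition the diffusion model, so that each phrase is generated inside its assigned region. The validation step matters because an LLM can return boxes that do not fit the canvas.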