YangLing0818/RPG-DiffusionMaster
A training-free text-to-image diffusion framework that leverages multimodal LLMs to recaption prompts and plan regional generation for high-quality image synthesis.

This repository implements RPG (Recaption, Plan and Generate), a paradigm that pairs a multimodal LLM (GPT-4, Gemini-Pro, or miniGPT-4) with complementary regional diffusion to achieve state-of-the-art text-to-image generation and editing. The MLLM acts as both prompt recaptioner and region planner: it rewrites the user prompt into detailed subprompts and assigns each one to a region of the image. Because this planning step is separate from the diffusion step, the framework can be composed with arbitrary MLLM architectures and diffusion backbones. Generating the image region by region also lets it support high-resolution synthesis.
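
To make the region-planning idea concrete, here is a minimal sketch of how a planner's output could be turned into pixel boxes for regional generation. It is an illustration under stated assumptions, not the repository's actual code: the plan format (rows with relative weights, each split into weighted columns), the `Region` class, and the `plan_to_regions` function are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One planned region: a subprompt and its pixel box (hypothetical type)."""
    subprompt: str
    box: tuple  # (left, top, right, bottom) in pixels


def plan_to_regions(rows, width, height):
    """Convert a weighted row/column plan into pixel boxes.

    `rows` is an assumed format: a list of (row_weight, columns) pairs,
    where `columns` is a list of (col_weight, subprompt) pairs.
    Weights are relative, so (1, ...) and (2, ...) split the height 1:2.
    """
    regions = []
    total_row = sum(weight for weight, _ in rows)
    acc_row = 0.0
    top = 0
    for row_weight, cols in rows:
        # Accumulate weights so rounding never leaves gaps between rows.
        acc_row += row_weight
        bottom = round(height * acc_row / total_row)
        total_col = sum(weight for weight, _ in cols)
        acc_col = 0.0
        left = 0
        for col_weight, subprompt in cols:
            acc_col += col_weight
            right = round(width * acc_col / total_col)
            regions.append(Region(subprompt, (left, top, right, bottom)))
            left = right
        top = bottom
    return regions


# Example plan: a sky strip on top, then two subjects side by side.
plan = [
    (1, [(1, "a clear blue sky at dawn")]),
    (2, [(1, "a ginger cat sitting on grass"),
         (1, "a black dog lying on grass")]),
]
for region in plan_to_regions(plan, width=1024, height=768):
    print(region.box, region.subprompt)
```

In a pipeline of this shape, each box would be handed to the diffusion backbone together with its subprompt, and the per-region results would then be merged into one image. How the actual repository encodes its plan and merges regions is not described here.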