
YangLing0818/RPG-DiffusionMaster

A training-free text-to-image diffusion framework that leverages multimodal LLMs to recaption prompts and plan regional generation for high-quality image synthesis.

1.8k stars · Jupyter Notebook · Image · Video · Audio · Language Models
Velocity (7d): +2.1 ★ / day · Trend: steady

This repository implements RPG (Recaption, Plan and Generate), a training-free paradigm that pairs a multimodal LLM (GPT-4, Gemini-Pro, or miniGPT-4) with complementary regional diffusion for text-to-image generation and editing, with the stated aim of state-of-the-art results. The MLLM, not the diffusion model, acts as the prompt recaptioner and region planner, and the diffusion model then synthesizes the image region by region. The framework is designed to work with different MLLM architectures and diffusion backbones, and its regional approach also supports high-resolution image generation.
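To make the region-planning idea concrete, here is a minimal sketch of how a planner's output might be turned into pixel boxes. It is an illustration only, not the repository's code: the `Region` class, the `split_columns` function, and the weight-based column layout are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One planned region: a sub-prompt and its pixel box (hypothetical type)."""
    prompt: str
    box: tuple  # (left, top, right, bottom)


def split_columns(width, height, weights, subprompts):
    """Split a canvas into side-by-side columns sized by relative weights.

    Assumption: the planner returns one weight per sub-prompt. Real planners
    may produce nested rows and columns instead.
    """
    if len(weights) != len(subprompts):
        raise ValueError("need exactly one weight per sub-prompt")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be non-empty and positive")

    total = sum(weights)
    regions = []
    left = 0
    cumulative = 0
    for weight, prompt in zip(weights, subprompts):
        cumulative += weight
        # Use cumulative weights so rounding never leaves a gap or overlap.
        right = round(width * cumulative / total)
        regions.append(Region(prompt, (left, 0, right, height)))
        left = right
    return regions


if __name__ == "__main__":
    plan = split_columns(
        1024, 512, [1, 2, 1],
        ["a red fox", "a snowy forest", "a wooden cabin"],
    )
    for region in plan:
        print(region.box, region.prompt)
```

Each box could then be handed to a regional diffusion step together with its sub-prompt. How the regions are actually denoised and blended is outside the scope of this sketch.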
