songweige/rich-text-to-image
A diffusion model system that uses rich text formatting—font size, color, style, footnotes—to control text-to-image generation.

Velocity (7d): +0.7 ★/day · Trend: steady
This research project enables fine-grained control over AI image generation by using the formatting information in rich text documents. It extends Stable Diffusion and SD-XL with explicit token reweighting, precise color rendering, local style control, and detailed region synthesis. The project includes a Hugging Face demo and an Automatic1111 WebUI extension for practical use.
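To make the mapping from formatting to controls concrete, here is a minimal sketch of how rich-text spans might be turned into per-span generation controls. It is not the project's actual API: the `Span` class, its field names, `spans_to_controls`, and the linear font-size-to-weight rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One run of rich text. All field names here are hypothetical."""
    text: str
    font_size: Optional[float] = None  # larger size -> higher token weight
    color: Optional[str] = None        # hex string such as "#ff8800"
    style: Optional[str] = None        # e.g. a font name standing in for an art style
    footnote: Optional[str] = None     # longer description for this region


def hex_to_rgb(color: str) -> tuple:
    """Convert a "#rrggbb" string to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def spans_to_controls(spans, base_font_size=12.0):
    """Return the plain prompt plus one control dict per formatted span.

    The weight rule (font size divided by a base size) is an assumed
    stand-in for whatever reweighting scheme a real system uses.
    """
    plain_prompt = "".join(s.text for s in spans)
    controls = []
    for s in spans:
        control = {}
        if s.font_size is not None:
            control["weight"] = s.font_size / base_font_size
        if s.color is not None:
            control["rgb"] = hex_to_rgb(s.color)
        if s.style is not None:
            control["style"] = s.style
        if s.footnote is not None:
            control["detail"] = s.footnote
        if control:  # unformatted spans contribute only to the plain prompt
            control["text"] = s.text
            controls.append(control)
    return plain_prompt, controls


spans = [
    Span("a "),
    Span("church", font_size=24.0, color="#ff8800"),
    Span(" in a garden"),
]
prompt, controls = spans_to_controls(spans)
```

In this sketch the plain prompt would go to the base diffusion model unchanged, while each control dict describes what should apply only to the region associated with that span.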