
songweige/rich-text-to-image

A diffusion model system that uses rich text formatting—font size, color, style, footnotes—to control text-to-image generation.

Velocity (7d): +0.7 ★/day · Trend: steady

This research project enables fine-grained control over AI image generation by leveraging formatting information from rich text documents. It extends Stable Diffusion and SD-XL with capabilities for explicit token reweighting, precise color rendering, local style control, and detailed region synthesis. The project includes a HuggingFace demo and an Automatic1111 WebUI extension for practical use.
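As a rough illustration of how formatting could be turned into generation controls, the sketch below reads a list of text spans and derives a per-span weight from font size, plus optional color and style hints. The input shape (a Quill-Delta-like `ops` list), the attribute names, `BASE_SIZE_PX`, and `parse_spans` are all assumptions made for this example; this is not the project's actual code or API.

```python
import json

# Hypothetical rich-text input: text spans with optional formatting attributes.
# The "ops"/"insert"/"attributes" layout is an assumption for this sketch.
RICH_TEXT = json.dumps({
    "ops": [
        {"insert": "a garden with "},
        {"insert": "roses", "attributes": {"color": "#ff0000", "size": "32px"}},
        {"insert": " beside a "},
        {"insert": "cottage", "attributes": {"font": "watercolor"}},
    ]
})

BASE_SIZE_PX = 16.0  # hypothetical default font size, mapped to weight 1.0


def parse_spans(rich_text_json):
    """Return the plain prompt and one control record per formatted span."""
    ops = json.loads(rich_text_json)["ops"]
    prompt = "".join(op["insert"] for op in ops)
    controls = []
    for op in ops:
        attrs = op.get("attributes")
        if not attrs:
            continue  # unformatted text contributes only to the plain prompt
        size = attrs.get("size")
        # Larger font size -> larger token weight (illustrative mapping only).
        weight = float(size.rstrip("px")) / BASE_SIZE_PX if size else 1.0
        controls.append({
            "text": op["insert"],
            "weight": weight,
            "color": attrs.get("color"),
            "style": attrs.get("font"),
        })
    return prompt, controls


prompt, controls = parse_spans(RICH_TEXT)
print(prompt)
for control in controls:
    print(control)
```

A real pipeline would then have to map each record onto the diffusion model (for example, onto the tokens of the matching span); that step is outside the scope of this sketch.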
