dome272/Paella
A text-to-image diffusion model that generates high-fidelity images in fewer than 10 sampling steps.

Paella is a text-to-image diffusion model that generates high-fidelity images in fewer than 10 sampling steps. It operates on a compressed, quantized latent space and is conditioned on CLIP embeddings, which keeps inference under 500 ms per image. Beyond text-conditional generation, the model supports latent interpolation and image manipulation tasks such as inpainting, outpainting, and structural editing.
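The description above does not spell out the network, codebook size, or noise schedule, so the following is only a toy sketch of what few-step sampling over a quantized latent grid can look like. It assumes the latent is a grid of discrete token ids. Sampling starts from random tokens. At each step a model predicts every token, and a shrinking fraction of positions is re-noised with random ids. `toy_denoiser`, `sample`, and all sizes are hypothetical stand-ins, not Paella's code or API. A real model would be a trained network conditioned on CLIP embeddings.

```python
import math
import random

# Hypothetical sizes for illustration only (not Paella's actual configuration).
VOCAB_SIZE = 16   # number of codebook entries in the quantized latent space
GRID = 4          # the latent is a GRID x GRID grid of token ids
STEPS = 8         # fewer than 10 sampling steps

def toy_denoiser(tokens, cond):
    """Hypothetical stand-in for the learned model: per-position logits.

    It strongly favours token id `cond % VOCAB_SIZE` everywhere, so the
    sampler converges to a constant grid. A real model would use `tokens`
    and a CLIP-based condition to predict each position.
    """
    target = cond % VOCAB_SIZE
    return [[50.0 if k == target else 0.0 for k in range(VOCAB_SIZE)] for _ in tokens]

def softmax_sample(logits, rng):
    """Draw one token id from a softmax over `logits`."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]

def sample(cond, steps=STEPS, seed=0):
    """Few-step iterative sampling over a discrete token grid (toy sketch)."""
    rng = random.Random(seed)
    n = GRID * GRID
    # Start from pure noise: uniformly random token ids.
    tokens = [rng.randrange(VOCAB_SIZE) for _ in range(n)]
    for t in range(steps):
        logits = toy_denoiser(tokens, cond)
        pred = [softmax_sample(row, rng) for row in logits]
        # Re-noise a shrinking fraction of positions; on the last step
        # noise_level is 0, so every prediction is kept.
        noise_level = 1.0 - (t + 1) / steps
        tokens = [
            rng.randrange(VOCAB_SIZE) if rng.random() < noise_level else p
            for p in pred
        ]
    return tokens

if __name__ == "__main__":
    print(sample(cond=7))
```

The number of model calls equals `steps`, so keeping it below 10 bounds inference cost directly. The re-noising schedule here is a simple linear ramp chosen for readability, not a claim about Paella's schedule.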