ai-forever/Kandinsky-2
A multilingual text-to-image latent diffusion model for generating, editing, and manipulating images from text descriptions.

Velocity (7d): +2.1 ★/day · Trend: steady
Kandinsky 2 is a multilingual latent diffusion model that generates images from text prompts. It combines an XLM-RoBERTa text encoder, a CLIP image encoder (ViT-bigG), and a latent diffusion U-Net. The model supports text-to-image generation, image-to-image translation, inpainting, outpainting, and ControlNet-guided generation.
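As a rough illustration of the text-to-image path, the sketch below goes through the Hugging Face `diffusers` library rather than this repository's own package. It assumes the `kandinsky-community/kandinsky-2-2-decoder` checkpoint and that `torch` and `diffusers` are installed. `Text2ImgRequest` and `generate` are hypothetical helper names introduced here for illustration and are not part of either library.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Text2ImgRequest:
    """Hypothetical container for generation settings (illustration only)."""
    prompt: str
    negative_prompt: str = ""
    height: int = 768
    width: int = 768
    num_inference_steps: int = 50

    def __post_init__(self):
        # Reject obviously unusable requests before loading any weights.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.height <= 0 or self.width <= 0:
            raise ValueError("height and width must be positive")


def generate(request, model_id="kandinsky-community/kandinsky-2-2-decoder"):
    """Run text-to-image generation and return a PIL image.

    Assumption: the diffusers route is used instead of the repository's
    own package. Heavy imports are deferred so that building a request
    does not require torch or a GPU.
    """
    import torch
    from diffusers import AutoPipelineForText2Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # The auto pipeline resolves to a combined prior + decoder pipeline.
    pipe = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)

    # The dataclass fields are passed straight through as keyword arguments.
    return pipe(**asdict(request)).images[0]


# Example (downloads several GB of weights on first use):
#   image = generate(Text2ImgRequest(prompt="a red cat, 4k photo"))
#   image.save("cat.png")
```

Because the text encoder is multilingual, the same call should accept prompts in languages other than English. The other modes listed above (image-to-image, inpainting, ControlNet) use different pipelines and are not covered by this sketch.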