omriav/blended-diffusion
A diffusion model system that edits natural images based on text descriptions using CLIP guidance and spatially blended denoising.

Blended Diffusion performs local, region-based edits on natural images, given a natural language description and a region-of-interest (ROI) mask. The method combines a pretrained CLIP model, which steers the edit toward the user-provided text, with a denoising diffusion probabilistic model (DDPM), which generates natural-looking results. To fuse the edited region seamlessly with the unchanged parts of the image, it spatially blends noised versions of the input image with the text-guided diffusion latents at a progression of noise levels.
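
The spatial blending described above can be illustrated with a minimal NumPy sketch. This is not the repository's code: the function names `noise_to_level` and `blend_step`, the `alpha_bar` parameter (the cumulative noise-schedule product of a standard DDPM), and the array shapes are assumptions chosen for illustration. The CLIP-guided denoising that would produce `x_fg` is omitted.

```python
import numpy as np


def noise_to_level(x0, alpha_bar, rng):
    """Forward-diffuse a clean image x0 to the noise level set by alpha_bar.

    Assumes the standard DDPM forward process:
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, I).
    """
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def blend_step(x_fg, x0, mask, alpha_bar, rng):
    """Blend one denoising step (hypothetical helper).

    x_fg      : text-guided latent produced by the guided denoising step
    x0        : original input image
    mask      : ROI mask, 1 inside the edited region and 0 outside
    alpha_bar : cumulative noise-schedule product for the current level
    """
    # Background: the input image noised to the same level as x_fg.
    x_bg = noise_to_level(x0, alpha_bar, rng)
    # Keep the guided latent inside the mask and the noised input outside it.
    return mask * x_fg + (1.0 - mask) * x_bg


# Usage: a 4x4 RGB image with a 2x2 region of interest in the centre.
rng = np.random.default_rng(0)
x0 = np.zeros((4, 4, 3))
x_fg = np.ones((4, 4, 3))
mask = np.zeros((4, 4, 1))
mask[1:3, 1:3] = 1.0
x_next = blend_step(x_fg, x0, mask, alpha_bar=0.5, rng=rng)
```

In a full sampling loop, a step like this would run after every guided denoising step, with `alpha_bar` following the noise schedule. Both sides of the blend then sit at the same noise level, which is what lets the later denoising steps smooth the seam between them.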