garibida/cross-image-attention
A zero-shot appearance transfer method that uses cross-image attention in Stable Diffusion to combine the structure of one image with the appearance of another.

This is the official implementation of a SIGGRAPH 2024 research paper on cross-image appearance transfer. The method leverages the self-attention layers of a pre-trained text-to-image diffusion model to establish semantic correspondences between two images, transferring visual appearance without any training or optimization. During the denoising process, it combines queries from a structure image with keys and values from an appearance image, generating a new image that retains the structure of one input while adopting the appearance of the other.
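
The query/key/value swap described above can be illustrated with a minimal sketch. This is not the repository's code: the function names (`attention`, `cross_image_attention`) and the toy feature vectors are hypothetical, and plain Python lists stand in for the feature maps that a real diffusion model's self-attention layers would produce. It only shows the core idea, assuming standard scaled dot-product attention: each structure token is rebuilt as a weighted mix of appearance values, weighted by how well its query matches the appearance keys.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        # Similarity of this query to every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def cross_image_attention(q_struct, k_app, v_app):
    """Hypothetical helper: structure queries attend to appearance keys/values."""
    return attention(q_struct, k_app, v_app)


# Toy example: two structure tokens, two appearance tokens.
q_struct = [[10.0, 0.0], [0.0, 10.0]]        # queries from the structure image
k_app = [[1.0, 0.0], [0.0, 1.0]]             # keys from the appearance image
v_app = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # values from the appearance image

mixed = cross_image_attention(q_struct, k_app, v_app)
```

The output has one vector per structure token, so the spatial layout follows the structure image, while every output vector is a convex combination of appearance values, so the content it carries comes from the appearance image.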