tencent-ailab/IP-Adapter
A 22M-parameter adapter enabling pretrained text-to-image diffusion models to generate images from image prompts.

IP-Adapter is a lightweight adapter that adds image prompt capability to pretrained text-to-image diffusion models. It uses a decoupled cross-attention mechanism: image prompt features are processed in their own cross-attention layers, separate from the text features, so image and text prompts can be combined for multimodal generation. The adapter achieves performance comparable to fine-tuned image prompt models while remaining lightweight, and it generalizes to custom models and to existing controllable generation tools.
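
The decoupled cross-attention idea can be illustrated with a minimal NumPy sketch. This is not the repository's implementation: it assumes single-head attention, random weights, and arbitrary small dimensions, and the names `decoupled_cross_attention`, `params`, and `scale` are hypothetical, chosen only for illustration. The point it shows is that the same query attends to text features and image features through separate key/value projections, and the two results are summed.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention for 2-D arrays (tokens x dim).
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v

def decoupled_cross_attention(x, text_feats, image_feats, params, scale=1.0):
    # One shared query projection; separate key/value projections per modality.
    q = x @ params["w_q"]
    text_out = attention(q, text_feats @ params["w_k_text"],
                         text_feats @ params["w_v_text"])
    image_out = attention(q, image_feats @ params["w_k_image"],
                          image_feats @ params["w_v_image"])
    # `scale` (an assumed knob) weights the image prompt's contribution.
    return text_out + scale * image_out

# Toy sizes, chosen arbitrarily for the sketch.
rng = np.random.default_rng(0)
d_model, d_ctx = 8, 6
shapes = {
    "w_q": (d_model, d_model),
    "w_k_text": (d_ctx, d_model), "w_v_text": (d_ctx, d_model),
    "w_k_image": (d_ctx, d_model), "w_v_image": (d_ctx, d_model),
}
params = {name: rng.normal(size=shape) for name, shape in shapes.items()}

x = rng.normal(size=(4, d_model))          # latent (query) tokens
text_feats = rng.normal(size=(5, d_ctx))   # text prompt tokens
image_feats = rng.normal(size=(3, d_ctx))  # image prompt tokens

out = decoupled_cross_attention(x, text_feats, image_feats, params, scale=0.5)
```

In this sketch, setting `scale=0.0` reduces the layer to ordinary text-only cross-attention, which is one way to see why the image branch can be added without altering the text pathway of the base model.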