google/break-a-scene
A diffusion-based system that extracts multiple concepts from a single image and recombines them under text guidance.

Break-A-Scene is the official implementation of a SIGGRAPH Asia 2023 paper. Given a single image and segmentation masks for the concepts it contains, the method learns a distinct token for each concept. Those tokens can then be used in natural-language prompts to re-synthesize individual concepts, or combinations of them, in new contexts. Supported applications include image variations, background extraction, and local editing by example. The implementation is written in PyTorch and builds on diffusion models.
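
To make the idea of per-concept tokens more concrete, here is a minimal sketch in plain Python of how learned tokens might be combined into prompts, and how a subset of concepts could be paired with the union of their masks during training. Everything named here is an illustrative assumption rather than the repository's actual API: the `<asset0>`-style token names, the "a photo of ..." prompt template, and the helper functions are all hypothetical. Pairing a sampled subset with its union mask is likewise only one plausible way to use the masks.

```python
import random


def placeholder_tokens(num_concepts):
    """Return one hypothetical placeholder token per masked concept."""
    return [f"<asset{i}>" for i in range(num_concepts)]


def union_mask(masks):
    """Pixel-wise OR of equally sized binary masks (each a list of rows of 0/1)."""
    return [[int(any(pixels)) for pixels in zip(*rows)] for rows in zip(*masks)]


def sample_training_example(masks, rng):
    """Pick a random non-empty subset of concepts.

    Returns the prompt naming the chosen tokens and the union of their masks,
    which a training loop could use to limit the loss to those regions
    (an assumption made for this sketch).
    """
    tokens = placeholder_tokens(len(masks))
    k = rng.randint(1, len(masks))
    chosen = sorted(rng.sample(range(len(masks)), k))
    prompt = "a photo of " + " and ".join(tokens[i] for i in chosen)
    return prompt, union_mask([masks[i] for i in chosen])


def compose_prompt(tokens, context):
    """Build an inference-time prompt that places learned concepts in a new context."""
    return f"a photo of {' and '.join(tokens)} {context}"


if __name__ == "__main__":
    # Two toy 2x2 masks standing in for two segmented concepts.
    masks = [
        [[1, 0], [0, 0]],
        [[0, 0], [0, 1]],
    ]
    print(sample_training_example(masks, random.Random(0)))
    print(compose_prompt(placeholder_tokens(2), "on a beach"))
```

At inference time, a prompt built this way would be passed to the fine-tuned diffusion model like any other text prompt. Dropping a token from the prompt is what would let a single concept be re-synthesized on its own.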