Shilin-LU/TF-ICON

Drop a cartoon cat into an oil painting, no retraining required

TF-ICON uses a weirdly empty "exceptional prompt" to trick Stable Diffusion into inverting and compositing images across visual domains without touching a single weight.

Velocity (7d): +0.8 ★/day · Trend: steady

What it does

TF-ICON takes a foreground object and a background image—say, a sketch of a bird and a photorealistic forest—and composites them so the object looks like it belongs. It works across domains (cartoon, oil painting, photorealistic) and crucially needs no fine-tuning, no custom dataset, no per-instance optimization. You provide images, masks, and a location; it handles the blending.
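To make the inputs concrete: the mask says which foreground pixels belong to the object, and the location says where they land in the background. Below is a minimal NumPy sketch of the naive cut-and-paste those inputs describe. This is the kind of visibly mismatched composite TF-ICON is meant to blend properly, not the repo's method. It assumes H×W×C arrays and a location given as a top-left pixel offset, and naive_paste is a hypothetical helper, not code from the repo.

```python
import numpy as np


def naive_paste(background, foreground, mask, top, left):
    """Hypothetical helper: copy masked foreground pixels into background at (top, left).

    background: (H, W, C) array; foreground: (h, w, C) array; mask: (h, w) boolean-like.
    Returns a new array; the input background is left untouched.
    """
    out = background.copy()
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    if top < 0 or left < 0 or top + h > out.shape[0] or left + w > out.shape[1]:
        raise ValueError("foreground does not fit at the requested location")
    # `region` is a view into `out`, so writing through it edits the copy in place.
    region = out[top:top + h, left:left + w]
    region[mask] = foreground[mask]
    return out
```

Nothing here adapts the object's style or lighting to the background; a hard-edged paste like this is exactly the cross-domain seam a training-free diffusion compositor has to smooth over.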

The interesting bit

The trick is the “exceptional prompt,” which the authors describe as containing no information. Despite being empty, it helps the text-conditioned model invert real images (turn them back into latent representations) more accurately, and the paper reports it beating existing inversion methods. That inverted latent then becomes the clay for cross-domain sculpting. It’s a bit like using silence as a tuning fork.
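For readers new to inversion, here is a toy sketch of generic deterministic DDIM inversion, a standard technique, written with NumPy and a stand-in noise predictor. It is not TF-ICON's code: the repo runs inversion through Stable Diffusion with the exceptional prompt as conditioning and may use a different solver. The names ddim_step, invert, sample, and the alpha_bars schedule are illustrative assumptions.

```python
import numpy as np


def ddim_step(x, eps, a_from, a_to):
    """Move x from noise level a_from to a_to (cumulative alpha-bar values), DDIM with eta=0."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps


def invert(x0, eps_model, alpha_bars):
    """Map a clean latent toward noise by stepping from alpha_bars[0] to alpha_bars[-1]."""
    x = x0
    for i in range(len(alpha_bars) - 1):
        # Approximation: the noise is predicted at the current (less noisy) point.
        eps = eps_model(x, i)
        x = ddim_step(x, eps, alpha_bars[i], alpha_bars[i + 1])
    return x


def sample(xT, eps_model, alpha_bars):
    """Run the same deterministic path back from noise to a clean latent."""
    x = xT
    for i in range(len(alpha_bars) - 1, 0, -1):
        eps = eps_model(x, i)
        x = ddim_step(x, eps, alpha_bars[i], alpha_bars[i - 1])
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bars = np.linspace(1.0, 0.05, 10)  # toy schedule: index 0 is (almost) clean
    x0 = rng.standard_normal((4, 4))
    fixed_eps = rng.standard_normal((4, 4))
    model = lambda x, t: fixed_eps  # stand-in predictor that ignores its inputs
    rec = sample(invert(x0, model, alpha_bars), model, alpha_bars)
    print("exact round trip:", np.allclose(rec, x0))
```

With a predictor that ignores its input, the round trip is exact, because each inversion step and its matching sampling step use the same noise estimate. A real text-conditioned model predicts slightly different noise in the two directions, and those small mismatches accumulate into reconstruction error. That error is what inversion methods, TF-ICON's exceptional prompt among them, try to keep small.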

Key highlights

  • Built on Stable Diffusion 2.1; needs ~20–23 GB VRAM
  • Two modes: “cross” for mismatched visual domains, “same” for photorealistic composites
  • Per the paper, TF-ICON beats prior compositing baselines across visual domains, and the exceptional prompt beats prior inversion methods on CelebA-HQ, COCO, and ImageNet
  • Ships with sample inputs and a full test benchmark via OneDrive
  • ICCV 2023; the code is a set of straightforward inference scripts, not a library
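Given the ~20–23 GB VRAM figure, it is worth checking a machine before downloading Stable Diffusion weights. A minimal sketch, assuming PyTorch is installed: has_enough_vram is a hypothetical helper, and its 20 GB default (decimal gigabytes) simply mirrors the rough figure above rather than anything the repo enforces.

```python
def has_enough_vram(required_gb: float = 20.0, device: int = 0) -> bool:
    """Hypothetical helper: True if CUDA device `device` reports >= required_gb GB total memory.

    Uses decimal gigabytes (10**9 bytes). Checks total memory, not what is currently free.
    """
    try:
        import torch  # imported lazily so the check returns False instead of crashing without PyTorch
    except ImportError:
        return False
    if not torch.cuda.is_available() or device >= torch.cuda.device_count():
        return False
    total_bytes = torch.cuda.get_device_properties(device).total_memory
    return total_bytes / 1e9 >= required_gb


if __name__ == "__main__":
    print("enough VRAM for TF-ICON:", has_enough_vram())
```

Total memory is an upper bound: other processes (including a desktop session) may already hold part of it, so a card right at the threshold can still run out.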

Caveats

  • The README doesn’t quantify “outperforms” with numbers; you’ll need the paper for actual metrics
  • No diffusers integration or Gradio demo yet—just CLI scripts
  • Foreground resolution “should not be too small,” but no hard threshold given

Verdict

Worth a look if you’re doing research in diffusion-based editing or need cross-domain compositing without training infrastructure. Skip it if you want a polished product API or have less than 20 GB of GPU memory.
