ChenyangQiQi/FateZero

Zero-shot video editing by stealing your own attention maps

FateZero edits real-world videos with text prompts using pretrained diffusion models, no per-video training required.

1.2k stars · Jupyter Notebook · Image · Video · Audio · Creative · Design
Velocity (7d): +1.0 ★ / day · Trend: steady

What it does

FateZero performs text-driven edits on real videos (style transfers, attribute swaps, even shape changes) using a pretrained Stable Diffusion model; shape edits additionally rely on Tune-A-Video checkpoints. You provide a source video and a text prompt, and it returns an edited version with (claimed) temporal consistency. No retraining, no manual masks.

The interesting bit

The trick is attention-map recycling. During DDIM inversion, FateZero captures the intermediate self- and cross-attention maps, then fuses them back in during denoising to preserve the source video's structure and motion. It also blends the self-attention maps using a mask derived from the source prompt's cross-attention features, which keeps semantics of the source video from leaking into the edit. A spatial-temporal attention tweak in the UNet aims to keep frames from drifting apart.
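To make the recycling and blending steps concrete, here is a minimal pure-Python sketch. It is not FateZero's code: the `AttentionStore` class, the layer name `"up.0.attn1"`, the 0.5 threshold, and applying the mask one query position (row) at a time are illustrative assumptions, and real attention maps are multi-head, multi-frame tensors rather than nested lists. It also assumes that edited regions keep the current denoising step's self-attention while everything outside the mask reuses the map stored during inversion; the cross-attention fusion step is not shown.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # rows = query positions, columns = key positions


class AttentionStore:
    """Hypothetical cache of attention maps recorded during DDIM inversion,
    keyed by (timestep, layer name) so denoising can look them up later."""

    def __init__(self) -> None:
        self._maps: Dict[Tuple[int, str], Matrix] = {}

    def save(self, t: int, layer: str, attn: Matrix) -> None:
        # Copy rows so later in-place edits cannot corrupt the stored map.
        self._maps[(t, layer)] = [list(row) for row in attn]

    def load(self, t: int, layer: str) -> Matrix:
        return self._maps[(t, layer)]


def blend_mask(word_cross_attn: List[float], threshold: float) -> List[float]:
    """Binary spatial mask from the source prompt's cross-attention for the
    edited word: normalize by the peak value, then apply a step function."""
    peak = max(word_cross_attn)
    if peak <= 0.0:
        return [0.0] * len(word_cross_attn)
    return [1.0 if v / peak >= threshold else 0.0 for v in word_cross_attn]


def fuse_self_attention(src: Matrix, tgt: Matrix, mask: List[float]) -> Matrix:
    """Inside the mask keep the current (target) self-attention row; outside
    it fall back to the stored source row. Assumed row-wise application."""
    return [
        [m * b + (1.0 - m) * a for a, b in zip(src_row, tgt_row)]
        for src_row, tgt_row, m in zip(src, tgt, mask)
    ]


# Toy example: 4 spatial positions, one (hypothetical) layer, one timestep.
store = AttentionStore()
source_self = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
store.save(t=10, layer="up.0.attn1", attn=source_self)  # during inversion

# Later, during denoising with the target prompt:
target_self = [[0.25] * 4 for _ in range(4)]  # toy values for the edit's map
squirrel_attn = [0.9, 0.1, 0.05, 0.7]  # source cross-attention for "squirrel"
mask = blend_mask(squirrel_attn, threshold=0.5)
fused = fuse_self_attention(store.load(10, "up.0.attn1"), target_self, mask)
print(mask)   # [1.0, 0.0, 0.0, 1.0]
print(fused)
```

In this toy run, positions 0 and 3 (where "squirrel" receives strong cross-attention) take the edit's attention rows, while positions 1 and 2 keep the source rows, which is the intuition behind preserving structure outside the edited region.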

Key highlights

  • Three editing modes: style transfer, local attribute editing (e.g., “squirrel, carrot → rabbit, eggplant”), and shape editing via Tune-A-Video checkpoints
  • Zero-shot: no per-prompt training, no user-provided masks
  • Ships with a Colab notebook and a Hugging Face Space for quick experiments
  • Low-resource configs that target a single 16 GB GPU, instead of the default ~100 GB of CPU RAM plus ~12 GB of GPU memory (8 frames on an RTX 3090)
  • ICCV 2023 Oral; code and data released for paper reproduction

Caveats

  • Memory appetite is real: default settings want ~100 GB of CPU RAM, and the “low-cost” config still needs a 16 GB GPU
  • Shape editing requires separate Tune-A-Video checkpoints (~10 GB downloads)
  • Full data + checkpoints run to ~100 GB; setup involves conda, xformers (noted as “not stable”), and manual model placement
  • Todo list still has “time & memory optimization” unchecked

Verdict

Worth a look if you’re researching diffusion-based video editing or need a zero-shot baseline to beat. Practitioners should budget for hardware and patience: this is research code with research-code ergonomics, not a product.
