One model, many video edits: swap, move, expand, animate
VACE unifies video generation and editing tasks so you don't need a separate tool for every tweak.

What it does VACE is a video creation and editing framework built on top of Wan2.1 and LTX-Video. It handles reference-to-video generation, video-to-video editing, and masked video-to-video editing through a single pipeline. The project ships with preprocessors, inference scripts, and Gradio demos for tasks like depth editing, inpainting, and object swapping.
The interesting bit Rather than chaining disparate models, VACE lets you compose operations—Move-Anything, Swap-Anything, Animate-Anything—through one interface. It exposes both an end-to-end CLI that hides the plumbing and separate preprocessing / inference stages when you need control over masks, reference images, or bounding boxes.
Key highlights
- Supports Wan2.1 (1.3B and 14B) and LTX-Video (0.9) backends with Apache-2.0 and RAIL-M licensing respectively
- Multi-GPU inference via
torchrunwith FSDP and sequence parallelism for the 14B model at 720p - Preprocessing pipeline converts raw inputs into
src_video,src_mask, andsrc_ref_imageswith task-specific annotators - Gradio demos for both preprocessing and model inference
- ICCV 2025 acceptance; models and benchmark data hosted on HuggingFace and ModelScope
Caveats
- LTX-Video and English Wan2.1 users need prompt extension for full performance, and the README warns that extended prompts may drift from the actual video content during editing tasks
- LTX-Video and Wan2.1 dependencies may conflict; the README explicitly flags this
- Optimal results require specific resolution ranges; arbitrary input resolutions are supported but not recommended
Verdict Worth a look if you’re building video editing workflows and want a unified interface rather than a grab bag of separate tools. Skip if you need a lightweight, dependency-free solution—this is a heavy research codebase with multiple large model backends.