ali-vilab/videocomposer
A diffusion-based model that generates videos with controllable spatial and temporal patterns from text, sketches, reference videos, or handcrafted motions.

VideoComposer is a video diffusion model for compositional video synthesis with flexible motion control. Users can steer the spatial and temporal patterns of a generated video through several input forms, including text descriptions, sketch sequences, reference videos, and handcrafted motions. The official implementation includes pretrained models, training code, and a Gradio UI, with released variants such as I2VGen-XL for higher-quality output.