showlab/Tune-A-Video
A method for one-shot fine-tuning of text-to-image diffusion models to enable text-to-video generation.

Velocity (7d): +3.5 ★/day · Trend: steady
Tune-A-Video adapts a pre-trained text-to-image diffusion model to video generation by fine-tuning it on a single video-text pair. It extends the model's 2D spatial attention across the temporal dimension using a sparse spatio-temporal attention mechanism. The method was published at ICCV 2023 and supports applications such as video editing and text-driven video synthesis.
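
As a rough illustration of the sparse attention idea, the sketch below assumes the pattern in which each frame's queries attend only to tokens from the first frame and the immediately preceding frame. It is a minimal pure-Python toy, not the repository's implementation: the function names (`sparse_causal_context`, `sparse_causal_attention`) are hypothetical, and the learned query/key/value projections of a real attention layer are omitted.

```python
import math


def sparse_causal_context(num_frames):
    """For each frame i, return the pair of frame indices it reads keys/values from.

    Assumed pattern: (first frame, previous frame); frame 0 falls back to itself.
    """
    return [(0, max(i - 1, 0)) for i in range(num_frames)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_causal_attention(frames):
    """Toy attention over a video.

    frames: list of frames, each a list of token vectors (lists of floats).
    Tokens are used directly as queries, keys and values (no learned projections).
    """
    dim = len(frames[0][0])
    output = []
    for i, (first, prev) in enumerate(sparse_causal_context(len(frames))):
        # Keys/values come from two frames only, instead of every frame.
        context = frames[first] + frames[prev]
        new_frame = []
        for query in frames[i]:
            scores = [
                sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                for key in context
            ]
            weights = softmax(scores)
            new_frame.append(
                [sum(w * value[d] for w, value in zip(weights, context)) for d in range(dim)]
            )
        output.append(new_frame)
    return output


# Three frames, one 2-dimensional token each.
video = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]]]
attended = sparse_causal_attention(video)
```

Because every frame looks at a fixed two-frame context, the cost grows linearly with the number of frames rather than quadratically, which is the usual motivation for a sparse pattern of this kind.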