Francis-Rings/StableAvatar
A video diffusion model that generates infinite-length avatar videos from a reference image and audio input.

Velocity (7d): +4.1 ★/day · Trend: steady
StableAvatar is an end-to-end video diffusion transformer that synthesizes high-quality, infinite-length avatar videos driven by audio. Conditioned on a reference image and an audio track, it generates audio-synchronized talking-head videos without post-processing. The diffusion transformer handles both visual generation and temporal consistency across long video sequences.