
bytedance/LatentSync

End-to-end lip-sync method using audio-conditioned latent diffusion models built on Stable Diffusion.

Velocity (7d): +11 ★ / day
Trend: steady

LatentSync automatically synchronizes lip movements in a video to a given audio track. It uses Whisper to convert the audio into embeddings, which are injected into a U-Net through cross-attention, and it generates frames in Stable Diffusion’s latent space. The U-Net input is the noised latents concatenated with the reference frames and the masked frames. During training, a one-step method estimates the clean latents from the predicted noise. The model supports both Chinese and English video content, and recent versions improve temporal consistency.
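
The one-step estimate of clean latents from predicted noise can be illustrated with a minimal sketch. It assumes the standard DDPM forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, which the summary above does not spell out; the function names, the toy 8-element latent, and the value of `alpha_bar_t` are hypothetical and are not taken from the LatentSync code base.

```python
import math
import random


def add_noise(x0, eps, alpha_bar_t):
    """Forward diffusion (assumed DDPM form): x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]


def estimate_clean_latents(x_t, eps_pred, alpha_bar_t):
    """One-step estimate: x0_hat = (x_t - sqrt(1 - a) * eps_pred) / sqrt(a)."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(x - b * e) / a for x, e in zip(x_t, eps_pred)]


rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(8)]   # toy stand-in for a flattened latent
eps = [rng.gauss(0.0, 1.0) for _ in range(8)]  # noise added by the forward process
alpha_bar_t = 0.5  # hypothetical cumulative noise-schedule value at step t

x_t = add_noise(x0, eps, alpha_bar_t)

# With a perfect noise prediction (eps_pred == eps), the clean latent is recovered
# in a single step, without running the full iterative denoising loop.
x0_hat = estimate_clean_latents(x_t, eps, alpha_bar_t)
max_err = max(abs(u - v) for u, v in zip(x0, x0_hat))
```

In practice the predicted noise comes from the U-Net and is only approximate, so the estimate is noisier at steps where `alpha_bar_t` is small. The appeal of a one-step estimate is that it yields a clean-latent approximation at every training step at the cost of a single forward pass.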
