jdh-algo/JoyVASA
A diffusion-based method for generating talking portrait and animal videos from audio, producing facial dynamics and head motion.

Velocity (7d): +1.5 ★/day · Trend: steady
JoyVASA is a diffusion-based method for audio-driven facial animation that generates talking-head video from audio input. It uses a decoupled facial representation in a two-stage pipeline: the first stage extracts disentangled facial representations, and the second generates facial dynamics and head motion from audio. The method works on both human portraits and animal images, producing lip-sync and head movement.
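The two-stage data flow described above can be illustrated with a toy sketch. This is not JoyVASA's code or API: every function name, array shape, and the fixed linear "denoiser" below are hypothetical, and the refinement loop is a simple stand-in for a trained diffusion model, not a real diffusion sampler. It only shows how a static representation and an audio-conditioned motion sequence might be kept separate and then recombined.

```python
import numpy as np


def split_representation(keypoints):
    """Stage 1 (toy): split keypoints of shape (frames, points, 3) into a
    static part (mean over frames) and per-frame motion (the residual)."""
    static = keypoints.mean(axis=0)
    motion = keypoints - static
    return static, motion


def generate_motion(audio_feats, n_points, steps=10, seed=0):
    """Stage 2 (toy): start from noise and iteratively refine toward an
    audio-conditioned target. Hypothetical stand-in for a trained model."""
    rng = np.random.default_rng(seed)
    frames, dim = audio_feats.shape
    # Assumption: a fixed random linear map plays the role of the denoiser.
    proj = rng.standard_normal((dim, n_points * 3)) * 0.01
    target = (audio_feats @ proj).reshape(frames, n_points, 3)
    x = rng.standard_normal((frames, n_points, 3))
    for _ in range(steps):
        x = x + 0.5 * (target - x)  # move halfway toward the prediction
    return x


def animate(static, motion):
    """Recombine the static representation with a motion sequence."""
    return static[None] + motion


# Usage: 5 audio frames with 4 features each, 3 facial keypoints.
audio = np.ones((5, 4))
static = np.zeros((3, 3))
frames_out = animate(static, generate_motion(audio, n_points=3))
```

Because the motion sequence is generated without reference to the static part, the same generated motion could in principle be applied to a different static representation, which is the point of decoupling the two.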