HumanAIGC/EMO

An audio-driven portrait video generation system using a diffusion model to create expressive talking-head videos from audio input.

Velocity (7d): +9.2 ★ / day · Trend: steady

EMO is a diffusion-based approach that synthesizes expressive portrait videos directly from audio input, without requiring explicit 3D representations or intermediate landmarks. The model generates talking-head videos with natural facial expressions, head movements, and synchronized lip motion by learning audio-visual correspondences through a weak-conditioning framework. It was published at ECCV 2024 by researchers from Alibaba Group’s Institute for Intelligent Computing.
