
jdh-algo/JoyHallo

JoyHallo is a deep learning model that generates talking face videos from audio input, specifically optimized for Mandarin speech.

Velocity (7d): +0.8 ★ / day
Trend: steady

JoyHallo is an audio-driven video generation model for creating Mandarin talking-head videos. It uses a Chinese wav2vec2 model to extract audio features and a semi-decoupled structure to capture the relationships among lip, expression, and pose features. The model was trained on 29 hours of Mandarin speech video collected from JD Health employees, covering diverse speaking styles and medical topics.
