
Fantasy-AMAP/fantasy-talking

A diffusion-transformer system that generates realistic talking portrait videos from audio input by synthesizing coherent facial motion.

1.6k stars · Python · Image · Video · Audio
Velocity (7d): +3.7 ★ / day · Trend: steady

FantasyTalking generates photorealistic talking-head videos conditioned on audio. It uses a diffusion transformer (Wan2.1) as the base generative model and Wav2Vec for audio encoding. The system synthesizes coherent facial motion, including lip movements, expressions, and head poses, to produce natural talking portraits. The work was published at ACM MM 2025.
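Any audio-driven video model has to line up audio features with video frames in time. The sketch below illustrates that alignment in general terms and is not taken from the FantasyTalking code. It assumes a wav2vec 2.0-style encoder that emits roughly 50 feature vectors per second (a 20 ms stride at 16 kHz) and a hypothetical 25 fps output video. The function name and both default rates are illustrative choices.

```python
import math
from typing import List, Tuple


def audio_span_per_frame(
    num_audio_tokens: int,
    audio_rate_hz: float = 50.0,  # assumption: wav2vec 2.0-style 20 ms stride
    video_fps: float = 25.0,      # assumption: hypothetical output frame rate
) -> List[Tuple[int, int]]:
    """Map each video frame to the half-open [start, end) range of audio
    feature indices that overlap it in time."""
    if num_audio_tokens < 0 or audio_rate_hz <= 0 or video_fps <= 0:
        raise ValueError("token count must be >= 0 and rates must be > 0")

    tokens_per_frame = audio_rate_hz / video_fps
    # Only emit frames that are fully covered by the available audio.
    num_frames = math.floor(num_audio_tokens / tokens_per_frame)

    spans = []
    for i in range(num_frames):
        start = math.floor(i * tokens_per_frame)
        # Guarantee at least one audio token per frame, never past the end.
        end = min(num_audio_tokens,
                  max(start + 1, math.floor((i + 1) * tokens_per_frame)))
        spans.append((start, end))
    return spans


if __name__ == "__main__":
    # 2 seconds of audio features at 50 Hz, rendered at 25 fps.
    spans = audio_span_per_frame(100)
    print(len(spans), spans[:3])
```

Under these assumed rates each video frame covers two audio feature vectors. A real system may instead use a wider context window around each frame, or let attention learn the alignment.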
