
CyberAgentAILab/TANGO

A diffusion model that synthesizes realistic gesture videos from speech audio through hierarchical audio-motion embedding.

1.2k stars · Python · Image · Video · Audio
Velocity (7d): +2.0 ★/day · Trend: steady

TANGO generates co-speech gesture videos by mapping audio features to body motion with a hierarchical audio-motion embedding and diffusion interpolation. Given speech input, the model produces matching gesture video, enabling reenactment of a speaker with realistic body language synchronized to the audio.
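As a rough illustration of the matching idea, the sketch below scores candidate motion clips against a speech clip in a shared embedding space and picks the best one. Everything here is an assumption rather than TANGO's code: the function names, the two-level (clip-level plus frame-level) score, and the fixed weight are hypothetical, the embeddings are toy vectors instead of outputs of the model's encoders, and the diffusion interpolation step is not modeled at all.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_pool(frames: List[Vector]) -> List[float]:
    """Average per-frame embeddings into a single clip-level embedding."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def match_score(audio_frames: List[Vector], motion_frames: List[Vector],
                clip_weight: float = 0.5) -> float:
    """Hypothetical two-level score: weighted clip-level plus frame-level similarity.

    Assumes both clips have the same number of frames and were already
    embedded into a shared audio-motion space by some encoder (not shown).
    """
    clip_sim = cosine(mean_pool(audio_frames), mean_pool(motion_frames))
    frame_sim = sum(cosine(a, m) for a, m in zip(audio_frames, motion_frames)) / len(audio_frames)
    return clip_weight * clip_sim + (1 - clip_weight) * frame_sim


def retrieve(audio_frames: List[Vector], candidates: List[List[Vector]]) -> int:
    """Return the index of the candidate motion clip with the highest score."""
    scores = [match_score(audio_frames, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.9, 0.1]]          # toy 2-D audio embeddings, 2 frames
    candidates = [
        [[0.0, 1.0], [0.1, 0.9]],             # motion clip pointing away from the audio
        [[1.0, 0.1], [0.8, 0.2]],             # motion clip aligned with the audio
    ]
    print(retrieve(audio, candidates))        # prints 1
```

Mixing a pooled clip score with a per-frame score is one simple way to reward both overall agreement and moment-to-moment alignment; the real model's hierarchy and weighting may differ.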
