michaelzhang-ai/Text2Video
A deep-learning system that synthesizes talking-head videos from text input using a phoneme-pose dictionary and GAN-based generation.

Velocity (7d): +0.2 ★/day · Trend: steady
This repository implements a text-driven talking-head video synthesis method published at ICASSP 2022. The method builds a phoneme-pose dictionary and trains a generative adversarial network (GAN) to generate video from interpolated phoneme poses. Compared with audio-driven approaches, it needs only a fraction of the training data and offers more flexibility, with faster preprocessing, training, and inference.
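To make the idea of "interpolated phoneme poses" concrete, here is a minimal sketch under stated assumptions. It is not the repository's code. It assumes that phonemes arrive with start and end times (for example from a forced aligner or a TTS front end), that the dictionary maps each phoneme to a single key pose stored as a flat list of keypoint coordinates, and that key poses are placed at phoneme midpoints and linearly interpolated per video frame. The names `interpolate_poses` and `pose_dict`, the toy pose values, and the 25 fps default are all hypothetical. The actual dictionary format and interpolation scheme in Text2Video may differ.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical pose representation: flattened keypoint coordinates.
Pose = List[float]


def interpolate_poses(
    phonemes: Sequence[Tuple[str, float, float]],  # (phoneme, start_sec, end_sec), time-ordered
    pose_dict: Dict[str, Pose],                    # hypothetical phoneme -> key pose lookup
    fps: int = 25,
) -> List[Pose]:
    """Return one pose per frame by linearly interpolating between key poses
    placed at the temporal midpoint of each phoneme (an assumed scheme)."""
    if not phonemes:
        return []
    # Key frames: (time, pose) at each phoneme's midpoint.
    keys = [((start + end) / 2.0, pose_dict[p]) for p, start, end in phonemes]
    n_frames = int(round(phonemes[-1][2] * fps))
    frames: List[Pose] = []
    for i in range(n_frames):
        t = i / fps
        # Hold the first/last key pose outside the range covered by key frames.
        if t <= keys[0][0]:
            frames.append(list(keys[0][1]))
            continue
        if t >= keys[-1][0]:
            frames.append(list(keys[-1][1]))
            continue
        # Find the pair of key frames surrounding t and blend them linearly.
        for (t0, p0), (t1, p1) in zip(keys, keys[1:]):
            if t0 <= t <= t1:
                w = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
                frames.append([(1.0 - w) * a + w * b for a, b in zip(p0, p1)])
                break
    return frames


if __name__ == "__main__":
    # Toy 2-D "poses" for the word "hello" (values are made up for illustration).
    pose_dict = {"HH": [0.0, 0.0], "AH": [1.0, 2.0], "L": [0.5, 0.5], "OW": [0.0, 1.0]}
    phonemes = [("HH", 0.0, 0.08), ("AH", 0.08, 0.2), ("L", 0.2, 0.28), ("OW", 0.28, 0.4)]
    frames = interpolate_poses(phonemes, pose_dict, fps=25)
    print(len(frames))  # 10
```

In a pipeline like the one described, per-frame poses of this kind would be what the GAN generator conditions on to render video frames; that rendering stage is not sketched here. Linear interpolation is chosen only for simplicity; smoother schemes (for example spline interpolation) are equally plausible.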