
michaelzhang-ai/Text2Video

A deep-learning system that synthesizes talking-head videos from text input using a phoneme-pose dictionary and GAN-based generation.

441 stars Python Image · Video · Audio
Text2Video
Velocity (7d): +0.2 ★/day · Trend: steady

This repository implements a text-driven video synthesis system for talking-head generation published at ICASSP 2022. The method builds a phoneme-pose dictionary and trains a generative adversarial network (GAN) to produce video from interpolated phoneme poses. It requires only a fraction of the training data needed by audio-driven approaches, offering more flexibility and faster preprocessing, training, and inference.
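To make the idea of "interpolated phoneme poses" concrete, here is a minimal sketch of a dictionary lookup followed by interpolation between key poses. Everything in it is an assumption for illustration: the `PHONEME_POSES` table, its two-dimensional pose vectors, the function names, and the use of plain linear interpolation are hypothetical and are not taken from the repository, whose actual pose representation and interpolation scheme may differ.

```python
from typing import Dict, List, Sequence

Pose = List[float]

# Hypothetical toy dictionary: each phoneme maps to one key pose vector.
# The values are made up; a real entry would hold facial landmarks or similar.
PHONEME_POSES: Dict[str, Pose] = {
    "HH": [0.2, 0.1],
    "AH": [0.8, 0.5],
    "L": [0.4, 0.3],
    "OW": [0.6, 0.9],
}


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Pose:
    """Linearly blend pose a into pose b; t=0 gives a, t=1 gives b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def phonemes_to_pose_sequence(
    phonemes: Sequence[str],
    steps_between: int = 4,
    table: Dict[str, Pose] = PHONEME_POSES,
) -> List[Pose]:
    """Look up a key pose per phoneme and fill in the frames between them.

    Assumption: linear interpolation stands in for whatever scheme the
    real system uses.
    """
    if steps_between < 1:
        raise ValueError("steps_between must be at least 1")
    keys = [table[p] for p in phonemes]  # raises KeyError on unknown phonemes
    if not keys:
        return []
    frames: List[Pose] = []
    for start, end in zip(keys, keys[1:]):
        # Emit the start pose plus the intermediate poses, but not the end
        # pose, which opens the next segment.
        for i in range(steps_between):
            frames.append(lerp(start, end, i / steps_between))
    frames.append(list(keys[-1]))  # close the sequence on the final key pose
    return frames


if __name__ == "__main__":
    for frame in phonemes_to_pose_sequence(["HH", "AH", "L", "OW"], steps_between=2):
        print(frame)
```

In a pipeline of the kind described above, a pose sequence like this would be the conditioning input to the GAN that renders video frames; that generation step is outside the scope of this sketch.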
