EvelynFan/FaceFormer
A Transformer-based neural network that synthesizes realistic 3D facial motions from speech audio.

Velocity (7d): +0.6 ★ / day
Trend: steady
FaceFormer is an end-to-end Transformer architecture that autoregressively generates sequences of 3D facial meshes from audio input. Given a neutral face template and raw audio, it produces accurate lip movements and facial expressions. The implementation is in PyTorch and includes pretrained models for the VOCASET and BIWI datasets.
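
The autoregressive generation described above can be illustrated with a minimal, dependency-free sketch. It is not the repository's code: `toy_step` is a hypothetical stand-in for the Transformer decoder, the audio is reduced to one number per frame, and the sketch assumes each frame is predicted as an offset added to the neutral template.

```python
from typing import Callable, List, Sequence

Vector = List[float]  # flattened vertex coordinates (x0, y0, z0, x1, ...)


def toy_step(past_offsets: Sequence[Vector], audio_feature: float, size: int) -> Vector:
    """Hypothetical stand-in for one decoder step (not FaceFormer's model).

    Blends the previous offset with the current audio feature. A real model
    would attend over all past frames and the encoded audio instead.
    """
    previous = past_offsets[-1] if past_offsets else [0.0] * size
    return [0.5 * p + 0.1 * audio_feature for p in previous]


def generate_meshes(
    template: Vector,
    audio_features: Sequence[float],
    step: Callable[[Sequence[Vector], float, int], Vector] = toy_step,
) -> List[Vector]:
    """Generate one mesh per audio frame, feeding past outputs back in."""
    offsets: List[Vector] = []
    meshes: List[Vector] = []
    for feature in audio_features:
        # Each prediction is conditioned on the offsets generated so far.
        offset = step(offsets, feature, len(template))
        offsets.append(offset)
        # Assumption: the output frame is the neutral template plus the offset.
        meshes.append([t + o for t, o in zip(template, offset)])
    return meshes
```

Calling `generate_meshes([1.0, 2.0, 3.0], [1.0, 0.0])` returns two frames, each the same length as the template. The second frame depends on the first through the fed-back offset, which is the defining property of autoregressive decoding.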