Plachtaa/VALL-E-X
An open-source reproduction of Microsoft's VALL-E X zero-shot multilingual text-to-speech synthesis model with voice cloning capabilities.

Velocity (7d): +7.6 ★/day · Trend: steady
This repository provides a trained implementation of VALL-E X, a zero-shot TTS model that can synthesize speech in multiple languages from a 3-second enrollment recording while preserving the speaker’s voice characteristics and emotional tone. A GPT-style autoregressive decoder predicts the audio tokens, and an EnCodec or Vocos neural codec decoder converts them into a waveform.
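To make the two-stage idea concrete, here is a minimal toy sketch of autoregressive token generation followed by codec decoding. It is not the repository's API: `toy_logits`, `generate_codes`, `toy_codec_decode`, and the tiny `CODEBOOK_SIZE` are all hypothetical stand-ins, and the "model" is just a seeded random scorer. The sketch only illustrates the control flow, assuming the enrollment recording has already been turned into prompt codes.

```python
import math
import random

CODEBOOK_SIZE = 16      # toy value (assumption); real codecs use much larger codebooks
EOS = CODEBOOK_SIZE     # extra token id that ends generation


def toy_logits(text_tokens, audio_tokens):
    """Hypothetical stand-in for the decoder: one score per candidate next token."""
    rng = random.Random(len(audio_tokens) * 1000 + sum(text_tokens) + sum(audio_tokens))
    logits = [rng.uniform(-1.0, 1.0) for _ in range(CODEBOOK_SIZE + 1)]
    # Make the end-of-sequence token more likely as the output grows.
    logits[EOS] += 0.2 * len(audio_tokens) - 3.0
    return logits


def generate_codes(text_tokens, prompt_codes, max_new=50, temperature=1.0, seed=0):
    """Sample codec tokens one at a time, each conditioned on everything so far."""
    rng = random.Random(seed)
    codes = list(prompt_codes)  # codes from the enrollment audio act as a prefix
    for _ in range(max_new):
        logits = toy_logits(text_tokens, codes)
        weights = [math.exp(score / temperature) for score in logits]
        next_id = rng.choices(range(CODEBOOK_SIZE + 1), weights=weights)[0]
        if next_id == EOS:
            break
        codes.append(next_id)
    return codes[len(prompt_codes):]  # return only the newly generated codes


def toy_codec_decode(codes, samples_per_code=4):
    """Hypothetical stand-in for a codec decoder: each code becomes a few samples."""
    wave = []
    for code in codes:
        level = code / (CODEBOOK_SIZE - 1) * 2.0 - 1.0  # map code id to [-1, 1]
        wave.extend([level] * samples_per_code)
    return wave


if __name__ == "__main__":
    new_codes = generate_codes(text_tokens=[1, 2, 3], prompt_codes=[4, 5, 6], max_new=20)
    waveform = toy_codec_decode(new_codes)
    print(len(new_codes), "codes ->", len(waveform), "samples")
```

In a real system the scorer would be a trained transformer and the decoder a learned neural codec; the loop structure (prompt prefix, token-by-token sampling, stop token, then waveform decoding) is the part this sketch is meant to show.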