
Plachtaa/VALL-E-X

An open-source reproduction of Microsoft's VALL-E X zero-shot multilingual text-to-speech synthesis model with voice cloning capabilities.

7.9k stars · Python · Image · Video · Audio
Velocity (7d): +7.6 ★ / day · Trend: steady

This repository provides a trained implementation of VALL-E X, a zero-shot TTS model that synthesizes speech in multiple languages from a 3-second enrollment recording while preserving the speaker’s voice characteristics and emotional tone. A GPT-style autoregressive decoder generates the audio representation, and an EnCodec or Vocos neural codec decoder converts it to a waveform.
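
The two-stage flow described above (autoregressive generation of discrete audio tokens, then codec decoding to a waveform) can be illustrated with a toy sketch. This is not the repository's API: every name here (`toy_next_token_logits`, `generate_codec_tokens`, `toy_codec_decode`) is hypothetical, the "model" is a seeded random stand-in instead of a trained network, and the codebook size and frame length are assumed values chosen only for illustration.

```python
import math
import random

CODEBOOK_SIZE = 1024   # assumed number of discrete codec tokens
EOS = CODEBOOK_SIZE    # hypothetical end-of-sequence token id
FRAME_SAMPLES = 320    # assumed waveform samples per codec frame


def toy_next_token_logits(text_tokens, audio_tokens):
    """Stand-in for a trained decoder: pseudo-logits derived from the context."""
    seed = len(audio_tokens) * 7919 + sum(text_tokens) + sum(audio_tokens)
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(CODEBOOK_SIZE + 1)]


def generate_codec_tokens(text_tokens, prompt_tokens, max_frames=50):
    """Greedy autoregressive loop: each new token is conditioned on the text,
    the enrollment prompt tokens, and everything generated so far."""
    audio_tokens = list(prompt_tokens)
    for _ in range(max_frames):
        logits = toy_next_token_logits(text_tokens, audio_tokens)
        next_token = max(range(len(logits)), key=logits.__getitem__)
        if next_token == EOS:
            break
        audio_tokens.append(next_token)
    # Return only the newly generated frames, not the prompt.
    return audio_tokens[len(prompt_tokens):]


def toy_codec_decode(tokens, sample_rate=24000):
    """Stand-in for a codec decoder: maps each token to one frame of a sine tone."""
    wave = []
    for token in tokens:
        freq = 100.0 + token  # arbitrary token-to-frequency mapping
        for n in range(FRAME_SAMPLES):
            wave.append(math.sin(2.0 * math.pi * freq * n / sample_rate))
    return wave


if __name__ == "__main__":
    text = [1, 2, 3]      # hypothetical text/phoneme token ids
    prompt = [10, 20]     # hypothetical tokens from the enrollment recording
    tokens = generate_codec_tokens(text, prompt)
    audio = toy_codec_decode(tokens)
    print(len(tokens), "frames,", len(audio), "samples")
```

In a real system the logits would come from a trained network and the decoder would be a neural codec; the sketch only shows how the prompt tokens condition generation and how the token sequence is handed to a separate decoding stage.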
