enhuiz/vall-e
An unofficial PyTorch implementation of VALL-E, an audio language model for text-to-speech synthesis.

Velocity (7d): +2.4 ★/day · Trend: steady
This repository provides a PyTorch reimplementation of VALL-E, a neural audio codec language model for text-to-speech. It uses the EnCodec tokenizer for audio quantization and supports training both autoregressive (AR) and non-autoregressive (NAR) model variants. The implementation includes data preparation tools for audio quantization and phoneme generation, along with DeepSpeed-based training infrastructure.
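To make "audio quantization" and the AR/NAR split concrete, here is a minimal sketch of residual vector quantization (RVQ), the scheme EnCodec uses to turn each audio frame into a stack of discrete codes. It is a toy in pure Python, not this repository's code: the codebooks, the frames, and the names `nearest`, `rvq_encode`, and `rvq_decode` are all hypothetical. The split shown at the end (first codebook to the AR model, remaining codebooks to the NAR model) follows the VALL-E paper's description and is assumed, not confirmed, to match this implementation.

```python
# Toy residual vector quantization (RVQ) in pure Python.
# All codebooks and frames below are made up for illustration;
# EnCodec learns its codebooks and operates on encoder outputs.

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(codebooks, vec):
    """Quantize vec in stages; each stage encodes the previous stage's residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes

def rvq_decode(codebooks, codes):
    """Reconstruct a vector by summing the selected entry of every codebook."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Two hypothetical codebooks: a coarse one, then a fine one for the residual.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]

# Three hypothetical 2-D "frames" standing in for encoder outputs.
frames = [[1.2, 0.9], [-0.9, 1.3], [0.1, 0.2]]
tokens = [rvq_encode(codebooks, f) for f in frames]

# Assumed split, per the VALL-E paper: the AR model predicts the first
# codebook's token sequence, the NAR model predicts the remaining ones.
ar_stream = [t[0] for t in tokens]
nar_streams = [[t[k] for t in tokens] for k in range(1, len(codebooks))]

print(tokens)       # [[1, 1], [2, 2], [0, 2]]
print(ar_stream)    # [1, 2, 0]
print(nar_streams)  # [[1, 2, 2]]
```

Each later codebook only refines what earlier ones left over, which is why a model can plausibly generate the coarse stream token by token and fill in the finer streams in parallel afterwards.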