Is vall-e open source?

Yes — enhuiz/vall-e is open source, released under the MIT license.

What language is vall-e written in?

enhuiz/vall-e is primarily written in Python.

How popular is vall-e?

enhuiz/vall-e has 3k stars on GitHub.

Where can I find vall-e?

enhuiz/vall-e is on GitHub at https://github.com/enhuiz/vall-e.

enhuiz/vall-e

VALL-E’s unofficial PyTorch port is training-only for now

An unofficial PyTorch rebuild of Microsoft’s zero-shot speech synthesizer that hands you the training code but leaves the pretrained weights as homework.

★ 3k stars · Python · Image · Video · Audio

What it does

This is an independent PyTorch implementation of VALL-E, the neural codec language model that synthesizes speech from text using a short audio prompt. It follows the paper's two-stage design: an autoregressive (AR) model generates the tokens of the first EnCodec quantizer, a non-autoregressive (NAR) model fills in the remaining quantizers, and the tokens are then decoded back to audio. Training expects paired WAV and normalized text files: the audio is quantized via EnCodec, phonemes are extracted from the text, and both the AR and NAR models are trained before synthesis.
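To make the AR and NAR split concrete, here is a minimal sketch in plain Python of how training examples could be cut from a grid of codec tokens. It is an illustration under stated assumptions, not the repository's code: the names ar_example and nar_example are hypothetical, the token values are made up, and the text and audio-prompt conditioning that a real model needs is left out.

```python
from typing import List, Tuple

# codes[level][frame]: one row of token ids per quantizer level (toy values).
Codes = List[List[int]]


def ar_example(codes: Codes) -> Tuple[List[int], List[int]]:
    """Next-token inputs and targets for the AR model, which only sees level 0."""
    first = codes[0]
    return first[:-1], first[1:]


def nar_example(codes: Codes, level: int) -> Tuple[Codes, List[int]]:
    """Inputs and targets for the NAR model at a given level (1 or higher).

    The model sees every level below `level` and predicts all frames of
    `level` in one pass, without stepping through time.
    """
    if not 1 <= level < len(codes):
        raise ValueError("level must be in [1, number of quantizers)")
    return codes[:level], codes[level]


codes = [
    [11, 12, 13, 14],  # level 0: handled by the AR model
    [21, 22, 23, 24],  # level 1: handled by the NAR model
    [31, 32, 33, 34],  # level 2: handled by the NAR model
]
ar_inputs, ar_targets = ar_example(codes)
nar_inputs, nar_targets = nar_example(codes, level=2)
```

The AR example shifts one row by a single frame, while the NAR example stacks whole rows, which is why only the first quantizer has to be generated token by token.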

The interesting bit

The project leans heavily on DeepSpeed for distributed training and implements details like AdaLN and sample-wise quantization level sampling for the NAR model, which suggests the author is following the paper’s architecture rather than simplifying it. That said, it is strictly a training framework right now: the provided Colab demo overfits a single utterance, and the author explicitly warns it is “not usable.”
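The two NAR details named above can be sketched in a few lines of plain Python. This is a simplified illustration, not the repository's implementation: sample_levels, ada_layer_norm, and LEVEL_PARAMS are hypothetical names, and the per-level scale and shift values are invented where a real model would learn them.

```python
import math
import random
from typing import List, Sequence


def sample_levels(batch_size: int, n_quantizers: int, rng: random.Random) -> List[int]:
    """Draw one NAR target level per sample, from 1 to n_quantizers - 1."""
    return [rng.randint(1, n_quantizers - 1) for _ in range(batch_size)]


def ada_layer_norm(
    x: Sequence[float], scale: float, shift: float, eps: float = 1e-5
) -> List[float]:
    """Normalize x to zero mean and unit variance, then apply a
    level-dependent scale and shift (the 'adaptive' part)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [scale * (v - mean) * inv_std + shift for v in x]


# Hypothetical (scale, shift) per level; a trained model would learn these,
# for example by projecting a level embedding.
LEVEL_PARAMS = {1: (1.0, 0.0), 2: (0.5, 0.1), 3: (2.0, -0.2)}

rng = random.Random(0)
levels = sample_levels(batch_size=4, n_quantizers=4, rng=rng)
hidden = [[0.0, 1.0, 2.0, 3.0] for _ in levels]
outputs = [ada_layer_norm(h, *LEVEL_PARAMS[lvl]) for h, lvl in zip(hidden, levels)]
```

Sampling a level per sample, rather than one level per batch, lets a single batch cover several quantizer levels, and the level-dependent normalization is what tells the shared network which level it is currently predicting.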

Key highlights

Autoregressive and non-autoregressive models are both implemented and trainable
Built atop Meta’s EnCodec tokenizer for neural audio compression
Includes CLI tools for quantization, phoneme extraction, training, export, and synthesis
DeepSpeed integration for the trainer
EnCodec is licensed under CC-BY-NC 4.0, so audio quantization and decoding done with it are subject to non-commercial terms
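Because the tools expect paired WAV and normalized text files, a quick pairing check before quantization and phoneme extraction can save a failed run. The sketch below uses only the Python standard library. The find_pairs helper is hypothetical and not part of the repository, and the ".normalized.txt" suffix is an assumption about the naming convention that should be checked against the project's README.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: text files sit next to the audio and share its stem.
TEXT_SUFFIX = ".normalized.txt"


def find_pairs(data_dir: Path) -> Tuple[List[Tuple[Path, Path]], List[Path]]:
    """Return (wav, text) pairs plus the WAV files with no matching text file."""
    pairs: List[Tuple[Path, Path]] = []
    missing: List[Path] = []
    for wav in sorted(data_dir.glob("*.wav")):
        text = wav.with_name(wav.stem + TEXT_SUFFIX)
        if text.is_file():
            pairs.append((wav, text))
        else:
            missing.append(wav)
    return pairs, missing
```

Calling find_pairs on a dataset folder and inspecting the second return value shows which utterances would be left without a transcript.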

Caveats

No pretrained checkpoint or public demos yet; the unchecked TODO item and Colab warning make it clear this is bring-your-own-weights
Only tested on Python 3.10.7, and DeepSpeed’s GPU requirements mean hardware compatibility is narrow
The provided Colab example overfits a single test utterance and is explicitly labeled unusable

Verdict

Researchers and TTS hackers who want to reproduce VALL-E from the ground up will find a complete training scaffold here. If you are looking for a ready-to-use zero-shot voice cloner, this is not it—yet.
