Is tacotron2 open source?

Yes. NVIDIA/tacotron2 is open source, released under the BSD-3-Clause license.

What language is tacotron2 written in?

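GitHub lists Jupyter Notebook as the primary language of NVIDIA/tacotron2 because of the bundled inference notebook; the model and training code are Python, built on PyTorch.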

How popular is tacotron2?

NVIDIA/tacotron2 has 5.3k stars on GitHub.

Where can I find tacotron2?

NVIDIA/tacotron2 is on GitHub at https://github.com/NVIDIA/tacotron2.

NVIDIA/tacotron2

Tacotron 2 without the WaveNet bottleneck

NVIDIA's PyTorch implementation of Tacotron 2 leaves out the WaveNet vocoder in favor of separate, faster vocoders and adds distributed, mixed-precision training to make speech synthesis practical on NVIDIA GPUs.

What it does

This is NVIDIA’s PyTorch implementation of the Tacotron 2 text-to-speech model, which converts text into mel spectrograms. It deliberately omits WaveNet, expecting you to pair it with a separate mel-to-audio decoder such as NVIDIA’s WaveGlow or nv-wavenet. It trains on the LJSpeech dataset and ships with pre-trained checkpoints for both the spectrogram predictor and a compatible vocoder.

The interesting bit

The value is mostly engineering hygiene. NVIDIA added distributed multi-GPU training and automatic mixed precision via Apex, which is what turns a research model into something practical to train on NVIDIA hardware. Because the spectrogram predictor and the vocoder are decoupled, you can swap vocoders without retraining the spectrogram network, provided both were trained on the same mel-spectrogram representation.
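
One way to guard the "same mel representation" requirement is to compare the spectrogram settings of the two models before wiring them together. The following is a minimal sketch, not code from the repository: the helper `mel_config_mismatches` is hypothetical, and the key names and values are illustrative assumptions, so check them against the configurations your own checkpoints were trained with.

```python
from typing import Any, Dict, List, Tuple

# Settings that define how audio is turned into a mel spectrogram.
# These key names are assumptions for illustration; use the names
# found in your own acoustic-model and vocoder configurations.
MEL_KEYS = (
    "sampling_rate",
    "filter_length",
    "hop_length",
    "win_length",
    "n_mel_channels",
    "mel_fmin",
    "mel_fmax",
)


def mel_config_mismatches(
    acoustic: Dict[str, Any], vocoder: Dict[str, Any]
) -> List[Tuple[str, Any, Any]]:
    """Return (key, acoustic_value, vocoder_value) for every setting that differs.

    A key missing from either config is reported as a mismatch (value None).
    """
    mismatches = []
    for key in MEL_KEYS:
        a, v = acoustic.get(key), vocoder.get(key)
        if a is None or v is None or a != v:
            mismatches.append((key, a, v))
    return mismatches


# Illustrative values only (assumed, not read from any checkpoint).
acoustic_cfg = {
    "sampling_rate": 22050, "filter_length": 1024, "hop_length": 256,
    "win_length": 1024, "n_mel_channels": 80, "mel_fmin": 0.0, "mel_fmax": 8000.0,
}
vocoder_cfg = dict(acoustic_cfg, mel_fmax=11025.0)  # deliberately different

for key, a, v in mel_config_mismatches(acoustic_cfg, vocoder_cfg):
    print(f"{key}: acoustic model uses {a}, vocoder uses {v}")
```

An empty result means the listed settings agree; it does not prove the two models share every preprocessing detail, such as normalization or log compression.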

Key highlights

Pre-trained Tacotron 2 and WaveGlow models available for immediate inference
Distributed training and automatic mixed precision through NVIDIA Apex
Warm-start from published checkpoints with dataset-dependent text embeddings ignored by default
Jupyter notebook included for demo inference
LJSpeech dataset support out of the box

Caveats

NVIDIA GPU and CUDA/cuDNN are mandatory; there is no CPU fallback
Tacotron 2 and your mel decoder must be trained on the same mel-spectrogram representation; otherwise the decoder will not produce usable audio
Dataset-dependent text embedding layers are ignored by default during warm-start, which may surprise you if you expected the pretrained embedding to carry over
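
The warm-start behavior in the last caveat amounts to dropping selected entries from the checkpoint before loading it, so that those layers keep their fresh initialization. Here is a minimal sketch of that idea using plain dictionaries in place of real tensors; the function name `filter_warm_start_state` and the layer names are hypothetical, and the repository's actual code may differ in detail.

```python
from typing import Any, Dict, Iterable


def filter_warm_start_state(
    checkpoint_state: Dict[str, Any],
    model_state: Dict[str, Any],
    ignore_layers: Iterable[str],
) -> Dict[str, Any]:
    """Merge checkpoint weights into a model state, skipping ignored layers.

    Ignored layers keep the values already in `model_state`, i.e. the
    model's fresh initialization.
    """
    ignored = set(ignore_layers)
    merged = dict(model_state)  # start from the freshly initialized weights
    for name, value in checkpoint_state.items():
        if name in ignored:
            continue  # dataset-dependent layer: keep the fresh initialization
        if name in merged:
            merged[name] = value  # reuse the pretrained weight
    return merged


# Toy stand-ins for state dicts; real ones map names to tensors.
# The layer names here are illustrative assumptions.
checkpoint = {"embedding.weight": "pretrained-emb", "decoder.weight": "pretrained-dec"}
fresh_model = {"embedding.weight": "random-emb", "decoder.weight": "random-dec"}

state = filter_warm_start_state(checkpoint, fresh_model, ["embedding.weight"])
print(state)
```

With a PyTorch model, the same pattern would take `model.state_dict()` as the model state and pass the merged result to `model.load_state_dict(...)`. Passing an empty ignore list keeps the pretrained embedding, which is usually what you want when fine-tuning on the same symbol set.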

Verdict

Grab it if you need a battle-tested TTS front end on NVIDIA hardware and do not mind wiring up your own vocoder. Look elsewhere if you want a turnkey, end-to-end speech synthesizer: this is only half the pipeline.
