spring-media/TransformerTTS

Speech synthesis that skips the queue

A non-autoregressive Transformer TTS implementation that generates spectrograms in one forward pass instead of token-by-token.

★1.2k stars Python Image · Video · Audio

What it does

TransformerTTS generates mel spectrograms from text using a non-autoregressive Transformer, then hands off to a separate vocoder (MelGAN, HiFiGAN, or Griffin-Lim) to produce actual audio. It is built in TensorFlow 2 and ships with a pre-trained LJSpeech model you can run from a one-liner CLI or a Python script.

The interesting bit

The project ditches autoregressive generation entirely: no token-by-token mel decoding, no attention collapse on long sentences. Instead it uses a dedicated “Aligner” model to extract per-token durations, then predicts all spectrogram frames in parallel. Pitch and speed are exposed as controllable parameters, which is the kind of affordance you usually sacrifice for speed.
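To make the "durations, then everything in parallel" idea concrete, here is a minimal NumPy sketch of the length-regulation step that non-autoregressive TTS models typically use. It is not taken from the TransformerTTS code base: the function name `length_regulate` and the `speed` argument are hypothetical, and the real model works on TensorFlow tensors rather than NumPy arrays.

```python
import numpy as np

def length_regulate(token_states, durations, speed=1.0):
    """Repeat each token's hidden state once per predicted mel frame.

    token_states: (num_tokens, dim) array, one vector per input token.
    durations:    (num_tokens,) predicted frame count for each token.
    speed:        hypothetical control; >1 shortens the output, <1 lengthens it.
    """
    token_states = np.asarray(token_states)
    scaled = np.asarray(durations, dtype=float) / speed
    frames = np.maximum(np.rint(scaled), 0).astype(int)
    # np.repeat expands the time axis in one call, with no per-frame loop.
    return np.repeat(token_states, frames, axis=0)

# Three tokens with illustrative (made-up) hidden states and durations.
states = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
durations = [2, 4, 2]

normal = length_regulate(states, durations)           # 8 frames
fast = length_regulate(states, durations, speed=2.0)  # 4 frames
```

Because the output length is fixed by the durations before any frame is decoded, changing speed amounts to rescaling those durations, which is why this kind of control comes cheaply in a duration-based model.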

Key highlights

One-shot inference: the forward model generates the full spectrogram in a single pass
Pre-trained LJSpeech model with weights at 5K-step intervals from 60K to 100K
Compatible with MelGAN and HiFiGAN vocoders; older WaveRNN support was dropped in late 2020
Duration extraction uses Dijkstra’s algorithm, which is either elegant or overkill depending on your worldview
Includes a Colab notebook for trying synthesis without installing anything
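The Dijkstra-based duration extraction mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: it assumes an attention matrix with mel frames as rows and text tokens as columns, assumes each step moves one frame forward while staying on the same token or advancing to the next, and uses a hypothetical function name `durations_from_attention`.

```python
import heapq
import numpy as np

def durations_from_attention(attention):
    """Count frames per token along the cheapest monotonic path.

    attention: (num_frames, num_tokens) array with values in [0, 1].
    Node cost is -log(attention), so strongly attended cells are cheap.
    """
    attention = np.asarray(attention, dtype=float)
    num_frames, num_tokens = attention.shape
    if num_frames < num_tokens:
        raise ValueError("need at least one frame per token")
    # Clipping keeps every cost finite and non-negative, as Dijkstra requires.
    cost = -np.log(np.clip(attention, 1e-8, 1.0))

    start, goal = (0, 0), (num_frames - 1, num_tokens - 1)
    best = {start: float(cost[start])}
    parent = {}
    heap = [(best[start], start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            break
        if dist > best[node]:
            continue  # stale heap entry
        t, n = node
        # Assumed moves: stay on the same token, or advance to the next one.
        for nxt in ((t + 1, n), (t + 1, n + 1)):
            if nxt[0] >= num_frames or nxt[1] >= num_tokens:
                continue
            new_dist = dist + float(cost[nxt])
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                parent[nxt] = node
                heapq.heappush(heap, (new_dist, nxt))

    # Walk back from the goal, counting one frame for each visited cell.
    durations = np.zeros(num_tokens, dtype=int)
    node = goal
    while True:
        durations[node[1]] += 1
        if node == start:
            break
        node = parent[node]
    return durations

# Toy attention map (made-up values): 5 frames over 3 tokens.
toy_attention = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.10, 0.80],
    [0.05, 0.05, 0.90],
]
toy_durations = durations_from_attention(toy_attention)
```

Under these assumptions every token receives at least one frame and the durations sum to the number of frames. With moves restricted this way the graph is acyclic, so plain dynamic programming would find the same path; Dijkstra is simply the framing the project's description uses.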

Caveats

The pre-trained API requires checking out a specific commit (493be634...); drift from that and things may break
Training is a multi-step pipeline (train the Aligner → extract durations → train the TTS model), so “quick fine-tuning” is not really in the cards
Only LJSpeech pre-trained weights are provided; other voices mean training from scratch

Verdict

Worth a look if you need controllable, parallel TTS and can live within the LJSpeech voice or train your own. If you want plug-and-play multilingual voices or a single-stage training loop, this is not your repo.

Frequently asked

What is spring-media/TransformerTTS? A non-autoregressive Transformer TTS implementation that generates spectrograms in one forward pass instead of token-by-token.
Is TransformerTTS open source? Yes. spring-media/TransformerTTS is an open-source project tracked on heatdrop.
What language is TransformerTTS written in? spring-media/TransformerTTS is primarily written in Python.
How popular is TransformerTTS? spring-media/TransformerTTS has 1.2k stars on GitHub.
Where can I find TransformerTTS? spring-media/TransformerTTS is on GitHub at https://github.com/spring-media/TransformerTTS.