Kyubyong/dc_tts

A TTS model that learns from Nick Offerman and Kate Winslet

Kyubyong's DC-TTS implementation tests whether a convolution-based speech synthesizer can train on tiny, quirky datasets—not just standard benchmarks.

★ 1.2k stars · Python · Image · Video · Audio

What it does

This is a TensorFlow implementation of DC-TTS, a text-to-speech system built entirely on deep convolutional networks with guided attention. It converts text to mel spectrograms (Text2Mel), then to linear spectrograms (SSRN), and finally to audio. The repo includes training pipelines, synthesis scripts, and pretrained models for the LJ Speech dataset.
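
To make the data flow concrete, the sketch below tracks only array shapes through the three stages. It runs no network at all. The function names and every constant (80 mel bins, FFT size 1024, hop of 256 samples, time reduction of 4) are illustrative assumptions, not values read from the repository.

```python
# Shape-level sketch of the three stages; no neural network is run here.
# All constants are illustrative assumptions, not values taken from the repo.
from typing import Tuple

N_MELS = 80          # assumed number of mel bins
N_FFT = 1024         # assumed FFT size -> 1 + N_FFT // 2 linear-frequency bins
HOP_LENGTH = 256     # assumed hop between STFT frames, in samples
TIME_REDUCTION = 4   # assumed factor by which Text2Mel's time axis is coarsened


def text2mel_shape(n_coarse_frames: int) -> Tuple[int, int]:
    """Stage 1 (Text2Mel): a coarse mel spectrogram, (coarse frames, mel bins)."""
    return (n_coarse_frames, N_MELS)


def ssrn_shape(mel_shape: Tuple[int, int]) -> Tuple[int, int]:
    """Stage 2 (SSRN): upsample in time and map mel bins to linear-frequency bins."""
    coarse_frames, _ = mel_shape
    return (coarse_frames * TIME_REDUCTION, 1 + N_FFT // 2)


def vocoder_samples(mag_shape: Tuple[int, int]) -> int:
    """Stage 3: approximate waveform length after inverting the spectrogram."""
    frames, _ = mag_shape
    return frames * HOP_LENGTH


mel = text2mel_shape(50)
mag = ssrn_shape(mel)
samples = vocoder_samples(mag)
print(mel, mag, samples)
```

The point of the split is visible in the shapes: Text2Mel only has to predict a short, low-resolution sequence, and SSRN restores the time and frequency detail afterwards.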

The interesting bit

The author didn’t just replicate the paper; he stress-tested it. Nick Offerman’s 18 hours of audiobooks and Kate Winslet’s 5-hour audiobook join the standard LJ Speech benchmark, plus a Korean dataset. The goal: see whether the model still learns when data is scarce and the voices are, shall we say, characterful. He also had to deviate from the paper: adding layer normalization and dropout, neither of which the paper mentions, and decaying the learning rate instead of keeping it fixed.
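
Two of those deviations are easy to illustrate in isolation. The sketch below is a minimal pure-Python rendering of layer normalization over one feature vector and of a warmup-then-inverse-square-root learning-rate schedule. The schedule shape, the base rate of 0.001, and the 4000 warmup steps are assumptions chosen for illustration; the repository's exact schedule and constants may differ.

```python
import math
from typing import List


def layer_norm(features: List[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-8) -> List[float]:
    """Normalize one feature vector to zero mean and unit variance, then scale and shift."""
    mean = sum(features) / len(features)
    var = sum((x - mean) ** 2 for x in features) / len(features)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in features]


def decayed_lr(step: int, base_lr: float = 0.001, warmup_steps: int = 4000) -> float:
    """Assumed schedule: linear warmup, then decay with the inverse square root of the step."""
    step = max(step, 1)  # avoid 0 ** -0.5
    return base_lr * warmup_steps ** 0.5 * min(step * warmup_steps ** -1.5, step ** -0.5)


normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
print(decayed_lr(1000), decayed_lr(4000), decayed_lr(16000))
```

With these assumed defaults the rate peaks at `base_lr` when `step == warmup_steps` and has halved by four times that many steps.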

Key highlights

Purely convolutional architecture, no recurrent layers—faster than Tacotron per the author
Guided attention mechanism that reportedly locks alignment early (monotonic attention plots “almost from the beginning”)
Trained on four distinct datasets: LJ Speech (24h), Nick Offerman (18h), Kate Winslet (5h), and Korean KSS (12h)
Generated samples at multiple training steps posted to SoundCloud for direct comparison
Pretrained LJ model available via Dropbox; Harvard Sentences included for quick synthesis tests
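
The guided attention highlight can be sketched as a penalty matrix that is near zero along the diagonal and grows away from it, so attention that strays from a roughly monotonic alignment is punished. The formula below follows the form given in the DC-TTS paper, W[n][t] = 1 - exp(-(n/N - t/T)^2 / (2g^2)); the default g = 0.2 and the function names are assumptions for illustration, and this is not the repository's TensorFlow code.

```python
import math
from typing import List


def guided_attention_weights(n_text: int, n_frames: int, g: float = 0.2) -> List[List[float]]:
    """Penalty W[n][t] = 1 - exp(-(n/N - t/T)^2 / (2 g^2)); zero where n/N == t/T."""
    return [
        [1.0 - math.exp(-((n / n_text - t / n_frames) ** 2) / (2.0 * g * g))
         for t in range(n_frames)]
        for n in range(n_text)
    ]


def guided_attention_loss(attention: List[List[float]], g: float = 0.2) -> float:
    """Mean of attention * penalty; small when attention mass sits near the diagonal."""
    n_text, n_frames = len(attention), len(attention[0])
    weights = guided_attention_weights(n_text, n_frames, g)
    total = sum(attention[n][t] * weights[n][t]
                for n in range(n_text) for t in range(n_frames))
    return total / (n_text * n_frames)


size = 8
# Attention concentrated on the diagonal (monotonic alignment).
diagonal = [[1.0 if n == t else 0.0 for t in range(size)] for n in range(size)]
# Attention concentrated on the anti-diagonal (text read backwards).
reversed_diag = [[1.0 if n == size - 1 - t else 0.0 for t in range(size)] for n in range(size)]

loss_diagonal = guided_attention_loss(diagonal)
loss_reversed = guided_attention_loss(reversed_diag)
print(loss_diagonal, loss_reversed)
```

Adding this term to the training loss pushes the model toward diagonal alignments from the first steps, which is consistent with the early-locking behavior the author reports.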

Caveats

Requires TensorFlow ≥ 1.3; the tf.contrib.layers.layer_norm API has changed since that release, and tf.contrib was removed entirely in TensorFlow 2.x, so the code needs a 1.x environment or porting
The author couldn’t replicate the paper’s “trained within a day” claim, so expect training to take longer than the paper suggests
Simultaneous training of Text2Mel and SSRN failed; the two-stage pipeline is mandatory, not optional
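
Given the version caveat, one defensive option is to check the installed TensorFlow version before running anything. The sketch below is a hypothetical helper, not part of the repository: it accepts versions from 1.3 up to but excluding 2.0, the upper bound being an assumption based on tf.contrib's absence from TensorFlow 2.x.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.15.2' or '2.0.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported_tf(version: str) -> bool:
    """True for 1.3 <= version < 2.0 (assumed range; tf.contrib is gone in 2.x)."""
    return (1, 3) <= parse_major_minor(version) < (2, 0)


for candidate in ["1.2.1", "1.3.0", "1.15.2", "2.4.0"]:
    print(candidate, is_supported_tf(candidate))
```

In practice the string passed in would be `tf.__version__`, checked once at the top of a training or synthesis script.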

Verdict

Worth a look if you’re studying TTS architectures or need a convolutional baseline to compare against newer transformer-based models. Skip it if you want production-ready, maintained code: this is a research sandbox from the TensorFlow 1.x era, and the author treats it as such.

Frequently asked

What is Kyubyong/dc_tts? Kyubyong's DC-TTS implementation tests whether a convolution-based speech synthesizer can train on tiny, quirky datasets, not just standard benchmarks.
Is dc_tts open source? Yes. Kyubyong/dc_tts is open source, released under the Apache-2.0 license.
What language is dc_tts written in? Kyubyong/dc_tts is primarily written in Python.
How popular is dc_tts? Kyubyong/dc_tts has 1.2k stars on GitHub.
Where can I find dc_tts? Kyubyong/dc_tts is on GitHub at https://github.com/Kyubyong/dc_tts.