Is naturalspeech2-pytorch open source?

Yes — lucidrains/naturalspeech2-pytorch is open source, released under the MIT license.

What language is naturalspeech2-pytorch written in?

lucidrains/naturalspeech2-pytorch is primarily written in Python.

How popular is naturalspeech2-pytorch?

lucidrains/naturalspeech2-pytorch has 1.3k stars on GitHub.

Where can I find naturalspeech2-pytorch?

lucidrains/naturalspeech2-pytorch is on GitHub at https://github.com/lucidrains/naturalspeech2-pytorch.


lucidrains/naturalspeech2-pytorch

Zero-shot voice cloning via latent diffusion, now in PyTorch

This repo re-implements the NaturalSpeech 2 paper in PyTorch, trading its score-based SDE for a denoising diffusion model that synthesizes speech and singing from text, prompts, and pitch.

★1.3k stars Python Image · Video · Audio


What it does

Implements NaturalSpeech 2, a zero-shot text-to-speech and singing synthesis system. It pairs a neural audio codec that compresses sound into continuous latent vectors with a non-autoregressive latent diffusion model. The codebase also bundles phoneme, pitch, duration, and speech-prompt encoders so generation can be conditioned on text, prosody, and short audio clips.
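To make the "diffusion over continuous latents" idea concrete, here is a minimal sketch of the forward noising step that a denoising diffusion model is trained to undo. It is not taken from the repository: the linear beta schedule, the function names, and the four-value `latent` frame are all assumptions chosen for illustration, and plain Python lists stand in for the tensors a real codec would produce.

```python
import math
import random

def linear_beta_schedule(timesteps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances (an assumed schedule, not the repo's)."""
    if timesteps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noise_latent(latent, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in latent]
    noised = [signal_scale * x + noise_scale * e for x, e in zip(latent, eps)]
    return noised, eps

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
latent = [0.5, -1.0, 0.25, 2.0]  # hypothetical codec latent frame
noised, eps = noise_latent(latent, t=500, alpha_bars=alpha_bars, rng=rng)
```

During training, a network would be asked to recover `eps` (or the clean latent) from `noised`, the timestep, and the conditioning signals. Because the latents are continuous, no quantization step is needed before adding Gaussian noise.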

The interesting bit

The author deliberately deviates from the original paper by replacing its score-based SDE with a denoising diffusion model, and has already added classifier-free guidance even though the paper omitted it. The roadmap also cites FlashAttention and GLU variants, suggesting this is shaping up to be a modernized port rather than a literal translation.
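For readers unfamiliar with classifier-free guidance, the sketch below shows one common formulation: the model is evaluated once with conditioning and once without, and the two predictions are combined by extrapolating away from the unconditional one. The function name, the `cond_scale` parameter, and the toy values are assumptions for illustration; the source does not confirm which formulation the repository uses.

```python
def classifier_free_guidance(cond_pred, uncond_pred, cond_scale):
    """Extrapolate from the unconditional prediction toward the conditional one.

    cond_scale = 0 returns the unconditional prediction, 1 returns the
    conditional prediction, and values above 1 push further toward the
    conditioning signal.
    """
    return [u + cond_scale * (c - u) for c, u in zip(cond_pred, uncond_pred)]

# Toy predictions standing in for denoiser outputs (hypothetical values).
cond_pred = [1.0, 2.0, -1.0]
uncond_pred = [0.5, 2.0, 0.0]
guided = classifier_free_guidance(cond_pred, uncond_pred, cond_scale=3.0)
```

Using this at sampling time usually requires that the conditioning was randomly dropped during training, so that the same network learns both the conditional and the unconditional prediction.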

Key highlights

Zero-shot synthesis of voice and singing from text and audio prompts
Continuous latent audio codec representations instead of discrete tokens
Built-in conditioning on phonemes, pitch, duration, and speech prompts
Includes a Trainer class for training loops
Explicitly work-in-progress with architectural todos still open

Caveats

Marked “wip”; several core features remain unchecked, including self-conditioning and automatic audio slicing for prompts
The author openly notes uncertainty around pyworld pitch extraction and encodec audio curtailment, with a todo to “consult phd student in TTS field”
No pretrained weights, evaluation metrics, or audio samples appear in the README

Verdict

Worth watching if you research neural TTS or want a hackable diffusion-based audio generator. If you need a production-ready voice clone today, the unchecked todos and missing polish mean this is still a construction site.
