
lucidrains/naturalspeech2-pytorch

A PyTorch implementation of NaturalSpeech 2 for zero-shot text-to-speech and singing synthesis using latent diffusion.

1.3k stars · Python · Image · Video · Audio
Velocity (7d): +1.2 ★ / day · Trend: steady

The repository provides a PyTorch implementation of Microsoft's NaturalSpeech 2 for zero-shot neural text-to-speech and singing synthesis. A neural audio codec encodes raw audio into continuous latent vectors, and a latent diffusion model generates those latents non-autoregressively. The implementation includes a phoneme encoder, duration and pitch predictors, and a speech prompt encoder that conditions generation on a reference clip for zero-shot voice cloning.
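As a rough illustration of the latent diffusion step, the sketch below applies a generic DDPM-style variance-preserving forward process (cosine noise schedule) to a whole sequence of codec latent frames at once, then inverts it given a noise estimate, which is what a trained denoiser would supply. The schedule, the function names `cosine_alpha_bar`, `noise_latents`, and `predict_z0`, and the list-of-frames representation are all hypothetical choices for illustration. They are not this repository's API, and NaturalSpeech 2 itself may formulate its diffusion process differently.

```python
import math
import random


def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal level alpha_bar(t) for t in [0, 1] (cosine schedule; assumed, not from the repo)."""
    def f(u: float) -> float:
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0.0)


def noise_latents(z0, t, rng):
    """Hypothetical forward process q(z_t | z_0): z_t = sqrt(ab) * z_0 + sqrt(1 - ab) * eps.

    z0 is a list of frames, each a list of floats (frames x latent_dim).
    Every frame is noised in one pass: there is no left-to-right dependency.
    """
    ab = cosine_alpha_bar(t)
    eps = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in z0]
    zt = [
        [math.sqrt(ab) * x + math.sqrt(1 - ab) * e for x, e in zip(frame, eps_frame)]
        for frame, eps_frame in zip(z0, eps)
    ]
    return zt, eps


def predict_z0(zt, eps_hat, t):
    """Recover z_0 from z_t given a noise estimate eps_hat (a denoiser's output in practice)."""
    ab = cosine_alpha_bar(t)
    return [
        [(x - math.sqrt(1 - ab) * e) / math.sqrt(ab) for x, e in zip(frame, eps_frame)]
        for frame, eps_frame in zip(zt, eps_hat)
    ]


rng = random.Random(0)
z0 = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(10)]  # 10 frames, latent dim 4
zt, eps = noise_latents(z0, 0.5, rng)
z0_rec = predict_z0(zt, eps, 0.5)  # exact noise given, so reconstruction matches z0
```

Passing the true noise makes the inversion exact; in a real model the noise (or the clean latent directly) is predicted by a network conditioned on the phoneme, duration, pitch, and speech-prompt features, and the reverse process runs over several timesteps while still updating all frames in parallel.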
