lucidrains/naturalspeech2-pytorch
A PyTorch implementation of Natural Speech 2 for zero-shot text-to-speech and singing synthesis using latent diffusion.

Velocity (7d): +1.2 ★/day · Trend: steady
The repository provides a PyTorch implementation of Microsoft's NaturalSpeech 2, a zero-shot neural TTS and singing synthesis system. A neural audio codec encodes raw audio into continuous latent vectors, and a latent diffusion model generates those latents non-autoregressively. The implementation includes phoneme, pitch, and duration encoders, along with speech prompt encoders for zero-shot voice cloning.
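To make the role of the duration encoder more concrete, here is a minimal sketch of length regulation, the step commonly used in non-autoregressive TTS to stretch phoneme-level features to frame-level length before a decoder or diffusion model consumes them. This is not code from the repository: the function name `length_regulate` and the toy inputs are hypothetical, and it is written in plain Python rather than PyTorch to keep it dependency-free.

```python
def length_regulate(phoneme_features, durations):
    """Expand phoneme-level features to frame level.

    Hypothetical helper for illustration only (not the repository's API).
    phoneme_features: list of feature vectors, one per phoneme.
    durations: list of non-negative ints, frames assigned to each phoneme.
    Returns a list with sum(durations) frame-level feature vectors.
    """
    if len(phoneme_features) != len(durations):
        raise ValueError("need exactly one duration per phoneme")

    frames = []
    for features, duration in zip(phoneme_features, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector once per frame so frames do not share storage.
        frames.extend(list(features) for _ in range(duration))
    return frames


# Toy example: three phonemes with 2-dimensional features.
phonemes = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 0, 3]  # the second phoneme is assigned zero frames
frames = length_regulate(phonemes, durations)
print(len(frames))  # 5 frames in total
```

In a real model the durations would come from a learned predictor and the features would be tensors, but the alignment idea is the same: each phoneme's representation is repeated for as many frames as it is predicted to last.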