
yl4579/StyleTTS2

A text-to-speech model achieving human-level synthesis using style diffusion and adversarial training with large speech language models.

Velocity (7d): +5.8 ★/day · Trend: steady

StyleTTS 2 is a deep learning text-to-speech (TTS) model. It models speaking style as a latent random variable and samples it with a diffusion model, so it can generate a style suited to the input text without requiring reference speech. Large pre-trained speech language models (such as WavLM) serve as discriminators, and adversarial training is combined with differentiable duration modeling so the system can be trained end to end. The model is reported to reach human-level quality on the single-speaker LJSpeech and multi-speaker VCTK datasets, and to support zero-shot speaker adaptation when trained on LibriTTS.
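
To make the "style as a latent variable sampled by diffusion" idea concrete, here is a toy sketch in Python. It is not StyleTTS 2's code. Every name in it (`text_to_mean`, `denoise`, `sample_style`, `STYLE_DIM`) is hypothetical, the trained denoising network is replaced by the closed-form denoiser of a Gaussian target, and the sampler is a plain Euler integration of a probability-flow ODE, which may differ from what the repository uses. It only illustrates that a style vector can be drawn from noise conditioned on text, with no reference audio involved.

```python
import numpy as np

STYLE_DIM = 8  # hypothetical style-vector size, chosen only for this toy


def text_to_mean(text: str) -> np.ndarray:
    """Hypothetical stand-in for a text encoder.

    Maps the text to the mean of a toy Gaussian style distribution.
    """
    seed = sum(ord(c) for c in text)
    return np.random.default_rng(seed).normal(size=STYLE_DIM)


def denoise(x: np.ndarray, sigma: float, mean: np.ndarray,
            data_std: float = 0.5) -> np.ndarray:
    """Closed-form denoiser for a N(mean, data_std^2 I) target.

    This replaces the trained network a real model would use (assumption).
    """
    w = data_std**2 / (data_std**2 + sigma**2)
    return mean + w * (x - mean)


def sample_style(text: str, steps: int = 50, sigma_max: float = 10.0,
                 sigma_min: float = 1e-3, seed: int = 0) -> np.ndarray:
    """Draw one style vector for `text` by integrating from noise to data."""
    rng = np.random.default_rng(seed)
    mean = text_to_mean(text)
    # Decreasing noise levels, ending exactly at zero.
    sigmas = np.append(np.geomspace(sigma_max, sigma_min, steps), 0.0)
    x = rng.normal(size=STYLE_DIM) * sigma_max  # start from pure noise
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, s_cur, mean)) / s_cur  # ODE slope dx/dsigma
        x = x + (s_next - s_cur) * d               # Euler step
    return x


if __name__ == "__main__":
    a = sample_style("Hello there.", seed=0)
    b = sample_style("Hello there.", seed=1)
    print(a.shape, np.allclose(a, b))
```

Different seeds give different style vectors for the same text, which mirrors the point of treating style as a random variable: the same sentence can be rendered in more than one plausible style.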
