lucidrains/e2-tts-pytorch
A PyTorch implementation of E2-TTS, a fully non-autoregressive, zero-shot text-to-speech model.

Velocity (7d): +0.7 ★/day · Trend: steady
Implements E2-TTS using a multistream transformer architecture for joint text and audio conditioning. The model generates speech non-autoregressively, so it avoids the token-by-token generation bottleneck of autoregressive TTS systems. It supports zero-shot voice cloning without requiring an explicit text-to-audio alignment. The repository includes a duration predictor module and working end-to-end training code.
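To illustrate how text conditioning can work without explicit alignment, here is a minimal sketch. It assumes the E2-TTS-style approach of padding the character sequence with a filler token up to the number of audio frames, so that the text and audio streams have equal length and the transformer can learn the alignment itself. The names `FILLER_ID`, `text_to_ids`, and `pad_to_frames` are hypothetical and chosen for illustration; they are not this repository's API. At inference time, the frame count would presumably come from something like the duration predictor.

```python
from typing import List

FILLER_ID = 0  # hypothetical id reserved for the filler (padding) token


def text_to_ids(text: str) -> List[int]:
    """Map each character to an integer id.

    Uses the Unicode code point plus one, so that 0 stays free for the filler.
    """
    return [ord(ch) + 1 for ch in text]


def pad_to_frames(token_ids: List[int], n_frames: int,
                  filler_id: int = FILLER_ID) -> List[int]:
    """Right-pad the token ids with filler so the result has n_frames entries."""
    if len(token_ids) > n_frames:
        raise ValueError("text is longer than the audio frame count")
    return token_ids + [filler_id] * (n_frames - len(token_ids))


ids = text_to_ids("hi")
padded = pad_to_frames(ids, n_frames=6)
print(padded)  # [105, 106, 0, 0, 0, 0]
```

The padded sequence has one entry per audio frame, which is what lets a model consume text and audio side by side without a phoneme aligner or a forced-alignment step.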