shivammehta25/Matcha-TTS
A neural text-to-speech model that generates speech from text using conditional flow matching.

Velocity (7d): +1.3 ★/day · Trend: steady
Matcha-TTS is a non-autoregressive neural TTS system that uses conditional flow matching to synthesize speech from text. The model is trained with optimal-transport conditional flow matching (OT-CFM), a simulation-free objective that learns a vector field carrying noise samples to mel-spectrograms. At inference it numerically solves an ODE in a small number of steps, and a separate vocoder converts the resulting mel-spectrogram to a waveform. The system is designed to be fast, probabilistic, and memory-efficient while producing natural-sounding speech. The accompanying paper was published at ICASSP 2024.
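To make the flow-matching idea above concrete, here is a minimal NumPy sketch of the two pieces involved: building a training target along a straight noise-to-data path, and generating a sample by integrating a velocity field with fixed-step Euler. None of this is taken from the Matcha-TTS codebase. The function names, the `sigma_min` value, the choice of Euler as the solver, and the toy closed-form velocity field (standing in for a trained network) are all assumptions made for illustration.

```python
import numpy as np

SIGMA_MIN = 1e-4  # assumed small constant; not taken from the repository


def cfm_training_pair(x1, rng, sigma_min=SIGMA_MIN):
    """Sample (t, x_t, u_t) on a straight path from noise x0 to data x1."""
    x0 = rng.standard_normal(x1.shape)
    # One time value per batch item, broadcast over the remaining axes.
    t = rng.uniform(size=(x1.shape[0],) + (1,) * (x1.ndim - 1))
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    # Regression target: the time derivative of x_t along this path.
    u_t = x1 - (1.0 - sigma_min) * x0
    return t, x_t, u_t


def cfm_loss(predicted_velocity, u_t):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((predicted_velocity - u_t) ** 2))


def euler_solve(velocity_fn, x0, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x


def toy_velocity(target):
    """Hypothetical stand-in for a trained network: the exact field that
    transports any starting point to a single fixed target."""
    def v(x, t):
        return (target - x) / (1.0 - t)
    return v
```

In a real system the velocity field would be a neural network conditioned on the text, trained by minimizing `cfm_loss` on pairs from `cfm_training_pair`. Training needs no ODE simulation, and the number of Euler steps at inference trades speed against accuracy.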