zai-org/GLM-TTS
A text-to-speech synthesis system using large language models that supports zero-shot voice cloning and emotion control via multi-reward reinforcement learning.

Velocity (7d): +5.5 ★/day · Trend: steady
GLM-TTS is an LLM-based text-to-speech system with a two-stage architecture: an LLM generates a sequence of speech tokens, and a Flow model converts those tokens into an audio waveform. It uses multi-reward reinforcement learning to improve emotional expression and natural prosody control, supports zero-shot voice cloning from 3 to 10 seconds of prompt audio, and supports streaming inference.
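The two-stage flow with streaming can be illustrated with a toy sketch. This is not the GLM-TTS API: every name here (`generate_speech_tokens`, `tokens_to_waveform`, `synthesize_streaming`, `SAMPLES_PER_TOKEN`, `chunk_size`) is hypothetical, and both stages are trivial stand-ins for the real LLM and Flow model. It only shows the data flow: tokens are produced one at a time, and audio is emitted chunk by chunk instead of after the whole utterance.

```python
from typing import Iterator, List

# Hypothetical constant: audio samples produced per speech token.
SAMPLES_PER_TOKEN = 4


def generate_speech_tokens(text: str, prompt_tokens: List[int]) -> Iterator[int]:
    """Stage 1 stand-in: yield one discrete speech token per input character.

    A real LLM would condition on the prompt audio to clone the voice; here
    the prompt only shifts the token ids so the dependency is visible.
    """
    offset = sum(prompt_tokens) % 7
    for ch in text:
        yield (ord(ch) + offset) % 256


def tokens_to_waveform(tokens: List[int]) -> List[float]:
    """Stage 2 stand-in: map each token to a short run of samples in [-1, 1]."""
    samples: List[float] = []
    for tok in tokens:
        value = tok / 127.5 - 1.0
        samples.extend([value] * SAMPLES_PER_TOKEN)
    return samples


def synthesize_streaming(
    text: str, prompt_tokens: List[int], chunk_size: int = 8
) -> Iterator[List[float]]:
    """Yield audio chunks as soon as chunk_size tokens are available."""
    buffer: List[int] = []
    for tok in generate_speech_tokens(text, prompt_tokens):
        buffer.append(tok)
        if len(buffer) == chunk_size:
            yield tokens_to_waveform(buffer)
            buffer = []
    if buffer:
        # Flush the final, possibly shorter, chunk.
        yield tokens_to_waveform(buffer)
```

A caller would iterate over `synthesize_streaming(...)` and hand each chunk to an audio sink as it arrives, which is the property that makes streaming inference useful for low-latency playback.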