Zyphra/Zonos
An open-weight text-to-speech model, trained on more than 200k hours of multilingual speech, with speaker cloning and emotion control.

Zonos-v0.1 is an open-weight TTS model that generates natural-sounding speech from text prompts, conditioned on either a speaker embedding or a reference audio clip. It uses eSpeak-based text normalization, followed by DAC token prediction with a transformer or hybrid backbone. The model supports speaker cloning from short reference clips and fine-grained control over speaking rate, pitch, audio quality, and emotional expression, and it outputs audio natively at 44 kHz.
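To make the conditioning controls described above more concrete, here is a minimal sketch of how text, a speaker embedding, and the rate, pitch, and emotion settings might be bundled before generation. This is not the Zonos API: `Conditioning`, `build_conditioning`, `normalize_emotion`, the emotion labels, and the parameter names are all hypothetical, and the 44,100 Hz default is an assumption about what "44 kHz" means here.

```python
from dataclasses import dataclass, field

# Hypothetical emotion labels, chosen for illustration only.
EMOTIONS = ("happiness", "sadness", "anger", "neutral")


def normalize_emotion(weights):
    """Scale non-negative emotion weights so that they sum to 1."""
    values = [float(weights.get(name, 0.0)) for name in EMOTIONS]
    if any(v < 0 for v in values):
        raise ValueError("emotion weights must be non-negative")
    total = sum(values)
    if total == 0:
        # No preference given: fall back to a fully neutral delivery.
        return {name: (1.0 if name == "neutral" else 0.0) for name in EMOTIONS}
    return {name: v / total for name, v in zip(EMOTIONS, values)}


@dataclass
class Conditioning:
    """Hypothetical bundle of inputs a TTS model could be conditioned on."""
    text: str
    speaker_embedding: list
    speaking_rate: float = 1.0   # relative to a default pace (assumed unit)
    pitch_scale: float = 1.0     # relative to the speaker's natural pitch (assumed unit)
    emotion: dict = field(default_factory=dict)
    sample_rate_hz: int = 44_100  # assumed exact value behind "44 kHz"


def build_conditioning(text, speaker_embedding, speaking_rate=1.0,
                       pitch_scale=1.0, emotion=None):
    """Validate the controls and return a Conditioning object."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if speaking_rate <= 0 or pitch_scale <= 0:
        raise ValueError("speaking_rate and pitch_scale must be positive")
    return Conditioning(
        text=text,
        speaker_embedding=list(speaker_embedding),
        speaking_rate=speaking_rate,
        pitch_scale=pitch_scale,
        emotion=normalize_emotion(emotion or {}),
    )


if __name__ == "__main__":
    cond = build_conditioning(
        "Hello, world!",
        speaker_embedding=[0.1, 0.2, 0.3],  # placeholder values, not a real embedding
        speaking_rate=1.1,
        emotion={"happiness": 3, "neutral": 1},
    )
    print(cond.emotion)
```

Normalizing the emotion weights keeps the control independent of the scale the caller happens to use, so `{"happiness": 3, "neutral": 1}` and `{"happiness": 0.75, "neutral": 0.25}` describe the same request in this sketch.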