
Zyphra/Zonos

An open-weight text-to-speech model trained on 200k+ hours of multilingual speech with speaker cloning and emotion control capabilities.

7.2k stars · Python · Image · Video · Audio
Velocity (7d): +15 ★/day · Trend: steady

Zonos-v0.1 is an open-weight TTS model that generates natural speech from text prompts, conditioned on a speaker embedding or a reference audio clip. Text is normalized and phonemized with eSpeak, and a transformer or hybrid backbone then predicts DAC tokens. The model supports voice cloning from short reference clips and fine-grained control over speaking rate, pitch, audio quality, and emotional expression, and it outputs audio natively at 44 kHz.
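To give a rough sense of what DAC token prediction at a 44 kHz output rate implies, the sketch below estimates how many codec frames and tokens the backbone would need to predict for a clip of a given length. None of the numbers come from the description above: the 44.1 kHz sample rate, the 512-sample hop, and the 9 codebooks per frame are assumptions based on the commonly published 44.1 kHz DAC configuration. `CodecConfig` and `token_budget` are hypothetical names used only for illustration, not part of Zonos.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecConfig:
    """Assumed codec settings; not taken from the Zonos description."""
    sample_rate: int = 44_100  # assumption: "44 kHz" means 44.1 kHz
    hop_length: int = 512      # assumption: audio samples per codec frame
    n_codebooks: int = 9       # assumption: tokens emitted per frame


def token_budget(seconds: float, cfg: CodecConfig = CodecConfig()) -> tuple[int, int]:
    """Return (frames, tokens) needed to represent `seconds` of audio."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    samples = round(seconds * cfg.sample_rate)
    # A partial final frame still needs a full set of tokens, so round up.
    frames = math.ceil(samples / cfg.hop_length)
    return frames, frames * cfg.n_codebooks


if __name__ == "__main__":
    frames, tokens = token_budget(10.0)
    print(f"10 s of audio -> {frames} frames, {tokens} tokens")
```

Under these assumptions the frame rate is about 86 frames per second, so sequence length grows linearly with clip duration. If the codec used by a given checkpoint differs, only the three fields of `CodecConfig` need to change.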
