Camb-ai/MARS5-TTS

A neural text-to-speech model that generates speech with natural prosody from text and a short reference audio clip.

2.8k stars · Jupyter Notebook · Image · Video · Audio
Velocity (7d): +3.8 ★/day · Trend: steady

MARS5 is a speech synthesis model from CAMB.AI built on a two-stage pipeline: an autoregressive stage followed by a non-autoregressive stage. Given input text and a reference audio clip as short as 5 seconds, it generates speech, including prosodically challenging content such as sports commentary and anime. Its architecture includes an encoding of coarse speech features. The model is available on HuggingFace, along with Colab demo notebooks.
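The general shape of a two-stage autoregressive/non-autoregressive pipeline can be sketched in plain Python. This is a toy illustration under stated assumptions, not MARS5's code or API: `two_stage_synthesize`, `toy_ar_step`, and `toy_nar_refine` are hypothetical stand-ins for trained networks, and short integer lists stand in for text tokens, reference-audio features, and speech codes. The point is the control flow: the first stage emits coarse frames one at a time, each conditioned on the frames before it, while the second stage adds finer detail to every frame in a single parallel pass.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: real systems use trained networks, not these toy functions.
ArStep = Callable[[List[int], List[int], List[int]], int]
NarRefine = Callable[[List[int], List[int]], List[List[int]]]


def two_stage_synthesize(
    text_tokens: List[int],
    ref_tokens: List[int],
    ar_step: ArStep,
    nar_refine: NarRefine,
    n_frames: int,
) -> Tuple[List[int], List[List[int]]]:
    """Toy AR -> NAR pipeline (illustrative only, not the MARS5 implementation)."""
    coarse: List[int] = []
    for _ in range(n_frames):
        # Stage 1 (autoregressive): each coarse frame depends on all earlier frames.
        coarse.append(ar_step(text_tokens, ref_tokens, coarse))
    # Stage 2 (non-autoregressive): refine every frame at once from the full coarse sequence.
    fine = nar_refine(coarse, ref_tokens)
    return coarse, fine


def toy_ar_step(text: List[int], ref: List[int], prev: List[int]) -> int:
    # Hypothetical stand-in: mixes text, reference, and the previous frame into a code in [0, 1024).
    last = prev[-1] if prev else 0
    return (sum(text) + sum(ref) + len(prev) + last) % 1024


def toy_nar_refine(coarse: List[int], ref: List[int]) -> List[List[int]]:
    # Hypothetical stand-in: derives three finer codes per coarse frame, all frames in parallel.
    return [[(c + k * len(ref)) % 1024 for k in range(1, 4)] for c in coarse]


coarse, fine = two_stage_synthesize([1, 2, 3], [4, 5], toy_ar_step, toy_nar_refine, n_frames=3)
print(coarse)  # [15, 31, 48]
print(fine)    # [[17, 19, 21], [33, 35, 37], [50, 52, 54]]
```

In this sketch only the coarse stage pays the cost of sequential, frame-by-frame decoding; the refinement stage handles all frames in one call.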

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.