Is MARS5-TTS open source?

Yes — Camb-ai/MARS5-TTS is open source, released under the AGPL-3.0 license.

What language is MARS5-TTS written in?

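GitHub reports Jupyter Notebook as the primary language of Camb-ai/MARS5-TTS. The model is distributed via torch.hub, so in practice it is used from Python with PyTorch.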

How popular is MARS5-TTS?

Camb-ai/MARS5-TTS has 2.8k stars on GitHub.

Where can I find MARS5-TTS?

Camb-ai/MARS5-TTS is on GitHub at https://github.com/Camb-ai/MARS5-TTS.

Camb-ai/MARS5-TTS

Voice cloning that obeys your commas and capital letters

MARS5 is an English text-to-speech model that clones a voice from a few seconds of audio and treats punctuation and capitalization as literal prosody controls for expressive speech.

What it does

MARS5 is a two-stage English text-to-speech model from CAMB.AI. Feed it a text prompt plus a short reference audio clip, between one and twelve seconds long with roughly six seconds giving the best results, and it generates new speech in that speaker’s voice. The system offers two modes: a shallow clone, which needs no transcript and runs faster, and a deep clone, which requires the transcript of the reference clip and gives higher fidelity at the cost of longer inference.
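
The choice between the two modes and the reference-length guidance can be sketched as a small pre-flight check. This is only an illustration of the rules described above, not part of MARS5 itself: `ClonePlan`, `plan_clone`, and the constants are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits taken from the description above; the names are hypothetical.
MIN_REF_SECONDS = 1.0
MAX_REF_SECONDS = 12.0


@dataclass
class ClonePlan:
    deep_clone: bool                      # True -> deep clone, False -> shallow clone
    notes: List[str] = field(default_factory=list)


def plan_clone(ref_seconds: float, ref_transcript: Optional[str] = None) -> ClonePlan:
    """Pick shallow or deep cloning and flag reference clips outside 1-12 s."""
    notes: List[str] = []
    if not MIN_REF_SECONDS <= ref_seconds <= MAX_REF_SECONDS:
        notes.append(f"reference is {ref_seconds:.1f}s; expected 1-12s (about 6s works best)")
    # A deep clone is only possible when a non-empty reference transcript exists.
    has_transcript = bool(ref_transcript and ref_transcript.strip())
    return ClonePlan(deep_clone=has_transcript, notes=notes)
```

Calling `plan_clone(6.0, "transcript of the clip")` would select a deep clone, while omitting the transcript falls back to a shallow clone.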

The interesting bit

The model is trained on raw audio paired with byte-pair-encoded text, so it respects punctuation and capitalization as prosodic instructions: add a comma for a pause, capitalize a word for emphasis. Many TTS pipelines normalize punctuation and casing away before synthesis; MARS5 treats them like stage directions. Under the hood, an autoregressive transformer produces coarse EnCodec features, and a multinomial DDPM then refines the remaining codebook values before vocoding.
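
Because the steering happens in the prompt text, it can be scripted with ordinary string handling. The helpers below are a minimal sketch with hypothetical names (`emphasize`, `pause_after`); they only edit the text and make no claim about how strongly the model reacts to a given change.

```python
import re


def emphasize(text: str, word: str) -> str:
    """Upper-case every whole-word occurrence of `word` (case-insensitive match)."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", flags=re.IGNORECASE)
    return pattern.sub(lambda m: m.group(0).upper(), text)


def pause_after(text: str, word: str) -> str:
    """Insert a comma after the first occurrence of `word` not already followed by punctuation."""
    pattern = re.compile(rf"\b({re.escape(word)})\b(?![,.;:!?])", flags=re.IGNORECASE)
    return pattern.sub(r"\1,", text, count=1)


prompt = pause_after(emphasize("Well I never said that", "never"), "Well")
# prompt == "Well, I NEVER said that"
```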

Key highlights

Steer prosody with literal punctuation and capitalization: a comma inserts a pause, all-caps adds emphasis.
Two inference modes: shallow clone for fast results without a transcript, deep clone for better quality when you provide the reference transcript.
Roughly 1.2 billion total parameters split across a 750M-parameter autoregressive model and a 450M-parameter non-autoregressive DDPM.
The English open-source version is released under GNU AGPL 3.0; CAMB.AI also runs a commercial API covering 140+ languages.
Distributed via torch.hub and HuggingFace safetensors, with Docker images available.
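
For the torch.hub route, loading and a single synthesis call might look roughly like the sketch below. The hub coordinates (`Camb-ai/mars5-tts`, `mars5_english`), the returned model and config-class pair, the `deep_clone` config field, and the `tts(...)` call shape are recalled from the project README and should be treated as assumptions to verify against the repository. `load_mars5` and `synthesize` are hypothetical wrapper names.

```python
from typing import Any, Callable, Optional, Tuple

# Assumed hub coordinates, recalled from the project README; verify before use.
HUB_REPO = "Camb-ai/mars5-tts"
HUB_ENTRY = "mars5_english"


def load_mars5() -> Tuple[Any, Any]:
    """Fetch the model and its inference-config class (needs PyTorch and network access)."""
    import torch  # imported lazily so this file can be read without PyTorch installed
    return torch.hub.load(HUB_REPO, HUB_ENTRY, trust_repo=True)


def synthesize(
    text: str,
    ref_wav: Any,
    ref_transcript: Optional[str] = None,
    loader: Callable[[], Tuple[Any, Any]] = load_mars5,
) -> Any:
    """Generate speech for `text` in the voice of `ref_wav`.

    `ref_wav` is assumed to be a mono waveform tensor at the model's sample rate.
    A deep clone is requested only when a reference transcript is supplied.
    """
    model, config_class = loader()
    cfg = config_class(deep_clone=ref_transcript is not None)
    # Assumed return shape: (autoregressive codes, output audio).
    _, audio = model.tts(text, ref_wav, ref_transcript, cfg=cfg)
    return audio
```

The `loader` parameter exists so the call flow can be exercised with a stub before downloading any checkpoints.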

Caveats

The authors warn that inference stability and consistency are still rough, and long-form generation is not yet implemented.
No benchmark numbers on standard datasets are provided yet; the team lists both benchmarks and runtime profiling as pending contributions.
You need a GPU that can hold both checkpoints (750M + 450M parameters) at once and run inference with 750M active parameters.
Apple Silicon MPS users will hit CPU fallbacks for unsupported operators.
The open-source release is English-only.
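
As a rough guide to the GPU requirement, the memory needed just to hold the weights follows from the parameter counts quoted above. This back-of-the-envelope sketch covers weights only; activations, the codec, and the vocoder add to it. Whether the model runs correctly in half precision is not stated in the source, so the float16 row is purely hypothetical.

```python
# Parameter counts as quoted above (approximate).
AR_PARAMS = 750_000_000    # autoregressive model
NAR_PARAMS = 450_000_000   # non-autoregressive DDPM

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_gigabytes(dtype: str = "float32") -> float:
    """Memory for both checkpoints' weights in decimal gigabytes (weights only)."""
    total_params = AR_PARAMS + NAR_PARAMS
    return total_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_gigabytes(dtype):.1f} GB for weights alone")
```

In single precision this comes to about 4.8 GB before any inference overhead.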

Verdict

Worth a look if you need expressive, steerable English TTS and have the GPU memory to spare. Skip it if you need production-grade stability, long-form narration out of the box, or a lightweight edge deployment.
