Is MisoTTS open source?

Yes — MisoLabsAI/MisoTTS is an open-source project tracked on heatdrop.

What language is MisoTTS written in?

MisoLabsAI/MisoTTS is primarily written in Python.

How popular is MisoTTS?

MisoLabsAI/MisoTTS has 3.1k stars on GitHub and is currently cooling off.

Where can I find MisoTTS?

MisoLabsAI/MisoTTS is on GitHub at https://github.com/MisoLabsAI/MisoTTS.


MisoLabsAI/MisoTTS

An 8B-parameter voice in your GPU

MisoTTS brings Sesame-style conversational speech synthesis to local hardware, with a Llama backbone and a stubbornly English-only vocabulary.

★ 3.1k stars · Python · Image · Video · Audio


Velocity (7d): +0.0 ★/day · Trend: ↘ cooling

What it does

MisoTTS is an 8-billion-parameter text-to-speech model that generates conversational audio from text, optionally cloning a voice from a short audio prompt. It runs locally via a single Python script that downloads weights from Hugging Face on first run. The output is watermarked by default using Sony’s SilentCipher.

The interesting bit

The architecture borrows from Sesame’s CSM: a Llama-8B backbone handles interleaved text and audio tokens, while a separate 300M decoder predicts the 32 codebooks of each audio frame. This two-stage setup lets the big model focus on “what to say and how to emote” and the small model handle the acoustic details.
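The two-stage split described above can be sketched as a toy generation loop. This is a structural illustration only, not MisoTTS's code: `backbone_step`, `decoder_frame`, and `generate` are hypothetical names, both "models" are random stubs, and the codebook size of 2,048 entries is an assumption made for the example. Only the 32 codebooks per frame and the 2,048 maximum sequence length come from the listing.

```python
import random

NUM_CODEBOOKS = 32     # from the listing: 32 Mimi codebooks per audio frame
MAX_SEQ_LEN = 2048     # from the listing: maximum sequence length
CODEBOOK_SIZE = 2048   # assumption for illustration; not stated in the listing


def backbone_step(context, rng):
    """Stub for the large backbone: summarises the interleaved
    text/audio context as one hidden state (here, random numbers)."""
    return [rng.random() for _ in range(8)]


def decoder_frame(hidden, rng):
    """Stub for the small decoder: emits one code per codebook for a
    single audio frame. A real decoder would condition on `hidden`
    and on the codes already chosen for this frame."""
    return [rng.randrange(CODEBOOK_SIZE) for _ in range(NUM_CODEBOOKS)]


def generate(text_tokens, num_frames, seed=0):
    """Alternate backbone and decoder until enough frames exist or
    the context reaches the maximum sequence length."""
    rng = random.Random(seed)
    context = list(text_tokens)
    frames = []
    for _ in range(num_frames):
        if len(context) >= MAX_SEQ_LEN:
            break  # no room left in the backbone's context window
        hidden = backbone_step(context, rng)
        frame = decoder_frame(hidden, rng)
        frames.append(frame)
        context.append(tuple(frame))  # the frame is fed back as context
    return frames


frames = generate(text_tokens=[101, 102, 103], num_frames=5)
print(len(frames), "frames,", len(frames[0]), "codes per frame")
```

The point of the shape is that the expensive model runs once per frame, while the cheap model does the per-codebook work inside each frame.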

Key highlights

8B Llama backbone + 300M audio decoder, 32 Mimi codebooks, 2,048 max sequence length
Voice cloning from prompted audio with transcript alignment
Default inference in bfloat16; CUDA strongly recommended (VRAM requirements depend on precision)
Watermarking enabled by default via SilentCipher
English only — no multilingual support yet
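Since the listing gives parameter counts but no VRAM figures, a back-of-the-envelope estimate for the weights alone may help. The sketch below assumes exactly 8 billion backbone parameters and 300 million decoder parameters (the listing's rounded figures) and counts only weight storage; activations, the KV cache, the audio codec, and the watermarking model would add to this. `weight_memory_gib` is a hypothetical helper, not part of MisoTTS.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

BACKBONE_PARAMS = 8_000_000_000   # assumption: "8B" taken literally
DECODER_PARAMS = 300_000_000      # assumption: "300M" taken literally


def weight_memory_gib(num_params, dtype):
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


total_params = BACKBONE_PARAMS + DECODER_PARAMS
for dtype in BYTES_PER_PARAM:
    gib = weight_memory_gib(total_params, dtype)
    print(f"{dtype}: {gib:.2f} GiB for weights")
```

Under these assumptions the default bfloat16 weights need roughly 15.5 GiB, and float32 would double that, so treat the result as a lower bound rather than a hardware requirement.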

Caveats

The README warns about watermarking timeouts on first download and suggests rerunning the command
“Sufficient VRAM” is vague; no specific GPU requirements or benchmarks are listed
English-only support is explicit, so don’t expect Mandarin or Spanish out of the box

Verdict

Worth a look if you’re building voice agents or need local, emotive TTS without API calls. Skip it if you need multilingual support, are running on CPU-only hardware, or want detailed performance numbers before committing GPU time.
