nari-labs/dia

A 1.6B-parameter voice actor that laughs, coughs, and clears its throat

Dia generates multi-speaker dialogue with nonverbal sounds in a single pass, no post-processing required.

★ 19.3k stars · Python · Image · Video · Audio

What it does

Dia is a 1.6B-parameter TTS model that turns transcripts into spoken dialogue between two speakers, marked with [S1] and [S2] tags. It also handles nonverbal cues, such as (laughs), (sighs), (clears throat), and about fifteen others, baked directly into the generated audio. You can condition it on a short audio clip for voice cloning, or let it invent new voices each run.
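
To make the transcript format concrete, here is a minimal sketch of a helper that assembles a two-speaker script with [S1]/[S2] tags and inline nonverbal cues. The function name `build_transcript` and the single-line, space-joined layout are assumptions made for illustration; they are not part of Dia's own API.

```python
from typing import Iterable, Tuple

# Speaker tags as described above; Dia dialogue uses two speakers.
SPEAKER_TAGS = {1: "[S1]", 2: "[S2]"}


def build_transcript(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker, line) turns into one tagged transcript string.

    Hypothetical helper: it only formats text and does not call the model.
    """
    parts = []
    for speaker, line in turns:
        if speaker not in SPEAKER_TAGS:
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        text = " ".join(line.split())  # collapse stray whitespace and newlines
        if not text:
            raise ValueError("each turn needs some text")
        parts.append(f"{SPEAKER_TAGS[speaker]} {text}")
    return " ".join(parts)


script = build_transcript([
    (1, "Did you hear that? (laughs)"),
    (2, "I did. (sighs) Not again."),
])
print(script)
```

Nonverbal cues stay inline as parenthesised words, so the same string can be passed on to whichever generation entry point you use.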

The interesting bit

Most TTS systems generate clean speech and call it a day. Dia treats conversation as the native format, not a concatenation of monologues. The model was inspired by SoundStorm and Parakeet, and it runs at 2.1x real-time on an RTX 4090 in bfloat16, which is fast enough to iterate on scripts without brewing coffee between generations.

Key highlights

Single-pass dialogue generation with speaker tags and nonverbal sounds
Voice cloning via 5–10 second audio prompts (with transcript prepended)
Hugging Face Transformers integration; also runs standalone with pip or uv
~4.4 GB VRAM in mixed precision, ~7.9 GB in float32
Apache 2.0 license; weights hosted on Hugging Face
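
The voice-cloning highlight says the clip's transcript is prepended to the text to be generated. A rough sketch of that preparation step follows. The helper names `clip_length_ok` and `build_clone_prompt` are hypothetical, and the requirement that the clip transcript start with a speaker tag is an assumption, so check the project README for the exact expectations.

```python
def clip_length_ok(seconds: float) -> bool:
    """True if the reference clip falls in the 5-10 second range noted above."""
    return 5.0 <= seconds <= 10.0


def build_clone_prompt(clip_transcript: str, new_script: str) -> str:
    """Prepend the reference clip's transcript to the script to be generated.

    Hypothetical helper: it prepares text only and does not load or run Dia.
    """
    clip = clip_transcript.strip()
    script = new_script.strip()
    # Assumption: the clip transcript uses the same [S1]/[S2] tagging as the script.
    if not clip.startswith(("[S1]", "[S2]")):
        raise ValueError("clip transcript should start with a speaker tag")
    if not script:
        raise ValueError("new script is empty")
    return f"{clip} {script}"


prompt = build_clone_prompt(
    "[S1] This is the voice I would like to reuse.",
    "[S1] And this is the new line. [S2] Sounds good. (laughs)",
)
print(prompt)
```

The audio clip itself is supplied to the model separately; this sketch covers only the text side of the prompt.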

Caveats

English only; CPU support and quantization are on the TODO list
Short inputs (under about 5 s of resulting audio) sound unnatural; long inputs (over about 20 s) come out unnaturally fast
RTX 5000-series GPUs need torch 2.8 nightly (see issue #26)
No fixed default voice—speaker consistency requires seed locking or audio prompting
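
The input-length caveat can be turned into a rough pre-flight check, reading the 5 s and 20 s figures as the duration of the audio a script would produce. The sketch below estimates that duration from word count. The speaking rate of 2.5 words per second is an assumption chosen for illustration, not a figure from the project, and `estimate_seconds` and `length_check` are hypothetical names.

```python
import re

# Matches speaker tags like [S1] and parenthesised cues like (laughs),
# which are not counted as spoken words in this estimate.
_MARKUP = re.compile(r"\[S[12]\]|\([^)]*\)")


def estimate_seconds(transcript: str, words_per_second: float = 2.5) -> float:
    """Very rough duration estimate; the default speaking rate is an assumption."""
    spoken = _MARKUP.sub(" ", transcript)
    return len(spoken.split()) / words_per_second


def length_check(transcript: str, min_s: float = 5.0, max_s: float = 20.0) -> str:
    """Classify a script against the 5 s and 20 s limits listed in the caveats."""
    seconds = estimate_seconds(transcript)
    if seconds < min_s:
        return "too_short"
    if seconds > max_s:
        return "too_long"
    return "ok"


print(length_check("[S1] Hello there. (laughs)"))
```

A script flagged as too long can be split at speaker turns and generated in several passes, at the cost of having to keep voices consistent across them.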

Verdict

Worth a spin if you’re prototyping podcasts, games, or any project where two people need to sound like they’re actually talking. Skip it for now if you need non-English, CPU-only deployment, or production-grade voice consistency without prompt engineering.

Frequently asked

What is nari-labs/dia?: Dia generates multi-speaker dialogue with nonverbal sounds in a single pass, no post-processing required.
Is dia open source?: Yes — nari-labs/dia is open source, released under the Apache-2.0 license.
What language is dia written in?: nari-labs/dia is primarily written in Python.
How popular is dia?: nari-labs/dia has 19.3k stars on GitHub.
Where can I find dia?: nari-labs/dia is on GitHub at https://github.com/nari-labs/dia.