supertone-inc/supertonic

A 99M-parameter TTS that runs on your e-reader

Supertonic squeezes multilingual text-to-speech onto edge devices by shipping everything as optimized ONNX models with a small memory footprint.

11.4k stars · Swift · Image · Video · Audio
Velocity (7d): +56 ★ / day · Trend: steady

What it does

Supertonic is an on-device text-to-speech system that synthesizes 44.1 kHz WAV audio locally using ONNX Runtime. It supports 31 languages, requires no GPU, and targets everything from desktops to Raspberry Pis and e-readers. A Python SDK (pip install supertonic) auto-downloads model assets on first run, and a supertonic serve command exposes both native and OpenAI-compatible HTTP endpoints for local integration.
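As a rough illustration of what "OpenAI-compatible" could mean in practice, the sketch below builds a request in the shape of OpenAI's /v1/audio/speech API using only the Python standard library. The host, port, path, model identifier, and voice name are all assumptions made for illustration; none of them come from the README, so check the output of supertonic serve for the real values.

```python
import json
import urllib.request

# Assumptions (not from the README): host, port, path, model and voice names.
BASE_URL = "http://127.0.0.1:8000"  # hypothetical address of `supertonic serve`


def build_speech_request(text, voice="default", base_url=BASE_URL):
    """Build a POST request shaped like OpenAI's /v1/audio/speech API."""
    payload = {
        "model": "supertonic",      # hypothetical model identifier
        "input": text,              # text to synthesize
        "voice": voice,             # hypothetical voice name
        "response_format": "wav",
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path, timeout=60):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)  # number of bytes written
```

With a local server running, synthesize_to_file("Hello from the edge.", "hello.wav") would write the response body to disk. Building the request separately from sending it makes it easy to inspect the payload before any network call happens.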

The interesting bit

The project bets on ONNX as the universal delivery mechanism: one 99M-parameter checkpoint, runtime examples for about a dozen languages and platforms (Python, Rust, Swift, Go, Java, C++, C#, Node.js, Browser/WebGPU, Flutter, iOS), and a lang="na" mode that skips language detection entirely. That is unusual in a field where most open TTS models are 0.7B–2B parameters and cloud-dependent.

Key highlights

  • 31 languages with a single model, no separate language adapters
  • 10 inline expression tags (including <laugh>, <breath>, and <sigh>) for prosodic control without reference audio
  • Voice Builder for creating permanent custom voice profiles from your own audio
  • Competitive WER/CER against much larger models on the Minimax-MLS-test benchmark
  • Batch inference support and quality/speed tradeoffs via total_steps (5–12)
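For readers unfamiliar with the WER/CER metrics mentioned above, here is a minimal sketch of the standard definition: the Levenshtein edit distance between a reference transcript and a recognized transcript, divided by the reference length and expressed as a percentage. The function names are illustrative, and the text normalization and ASR model used by the benchmark are not described in this summary, so numbers from this sketch are not directly comparable to the README's.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance as a percentage of the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


def cer(reference, hypothesis):
    """Character error rate, in percent."""
    return error_rate(list(reference), list(hypothesis))


def wer(reference, hypothesis):
    """Word error rate, in percent, splitting on whitespace."""
    return error_rate(reference.split(), hypothesis.split())
```

In a TTS evaluation the hypothesis is typically the output of a speech recognizer run on the synthesized audio, so lower values mean the speech was more intelligible to that recognizer.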

Caveats

  • Model assets live on Hugging Face and require Git LFS; first-time setup involves cloning large files
  • Per-language accuracy varies; the README shows some languages where Supertonic 3 lags behind larger competitors (e.g., Finnish CER at 5.40 vs OmniVoice’s 3.94)
  • The README claims “lightning fast” synthesis but provides no concrete real-time factor (RTF) or latency numbers
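Since no RTF figures are published, one option is to measure it on your own hardware. The sketch below computes the real-time factor as synthesis wall-clock time divided by the duration of the generated audio, so values below 1.0 mean faster than real time. The synthesize argument is a hypothetical callable that writes a WAV file to a given path; it stands in for whichever Supertonic entry point you use and is not the SDK's actual API.

```python
import time
import wave


def wav_duration_seconds(path):
    """Duration of a WAV file, computed from its frame count and sample rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def real_time_factor(synthesize, text, out_path):
    """Return synthesis time divided by audio duration.

    `synthesize(text, out_path)` is a hypothetical callable that must
    write a WAV file to `out_path`.
    """
    start = time.perf_counter()
    synthesize(text, out_path)
    elapsed = time.perf_counter() - start
    return elapsed / wav_duration_seconds(out_path)
```

For a fair number, run one warm-up synthesis first so that model loading and any first-run asset download are excluded, then average over several sentences of different lengths.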

Verdict

Worth a look if you need offline TTS in a resource-constrained or privacy-sensitive environment. Skip it if you need the absolute best quality for a single language and don’t mind cloud APIs or larger models.
