
ekwek1/soprano

A lightweight text-to-speech model that generates 32 kHz audio with under 250 ms latency on CPU and up to 2000x real-time speed on GPU.
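The speed figures are real-time factors: "2000x real time" means audio is produced 2000 times faster than it plays back. The sketch below works through what the quoted numbers imply for a hypothetical 10-second clip. Only the 32 kHz rate and the 20x (CPU) and 2000x (GPU) speedups come from the description. The helper names are illustrative, and actual speed will vary with hardware and input text.

```python
# Worked example of the quoted speed figures. The 10-second clip length is a
# hypothetical input; the 20x/2000x speedups and 32 kHz rate come from the description.
SAMPLE_RATE_HZ = 32_000

def generation_time_s(audio_seconds: float, speedup: float) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of audio at `speedup`x real time."""
    return audio_seconds / speedup

def samples_per_second(speedup: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Output samples produced per wall-clock second at `speedup`x real time."""
    return speedup * sample_rate_hz

clip_s = 10.0
gpu_time = generation_time_s(clip_s, 2000)  # ~0.005 s, i.e. about 5 ms
cpu_time = generation_time_s(clip_s, 20)    # 0.5 s
gpu_rate = samples_per_second(2000)         # 64,000,000 samples per second

print(f"GPU: {gpu_time * 1000:.1f} ms, CPU: {cpu_time:.2f} s, GPU rate: {gpu_rate:,} samples/s")
```

These are throughput figures. The sub-250 ms CPU latency is presumably a separate measure (how long before audio becomes available), which matters most when streaming.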

1.2k stars · Python · Image · Video · Audio
Velocity (7d): +6.5 ★/day · Trend: steady

Soprano is an on-device text-to-speech model designed for expressive, high-fidelity speech synthesis. Its compact 80M-parameter architecture achieves up to 20x real-time generation on CPU and 2000x on GPU, with under 250 ms latency on CPU. The model supports lossless streaming, unlimited generation length via automatic text splitting, and several deployment options: OpenAI-compatible endpoints, ONNX, a WebUI, a CLI, and ComfyUI integration.
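Unlimited generation length comes from splitting long input into pieces the model can handle one at a time. The sketch below shows one common way to do this, not Soprano's own splitter. It assumes sentence-ending punctuation (`.`, `!`, `?`) marks good break points and uses a hypothetical `max_chars` budget. It falls back to word boundaries for sentences longer than the budget. `split_text` and `_pack` are illustrative names, not part of the repository's API.

```python
import re
from typing import List

def _pack(units: List[str], max_chars: int) -> List[str]:
    """Greedily join units with spaces into chunks of at most max_chars.

    A single unit longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for unit in units:
        if not current:
            current = unit
        elif len(current) + 1 + len(unit) <= max_chars:
            current = f"{current} {unit}"
        else:
            chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks

def split_text(text: str, max_chars: int = 200) -> List[str]:
    """Hypothetical splitter: prefer sentence boundaries, fall back to word boundaries."""
    # Split after ., ! or ? followed by whitespace, keeping the punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    units: List[str] = []
    for sentence in sentences:
        # Sentences that fit are kept intact; longer ones are broken into words.
        units.extend([sentence] if len(sentence) <= max_chars else sentence.split())
    return _pack(units, max_chars)

print(split_text("Hello there. This is a test! Short one?", max_chars=30))
```

Each chunk would then be synthesized in turn and the audio concatenated or streamed. Breaking at sentence boundaries where possible keeps each chunk a natural unit of speech.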
