ekwek1/soprano
A lightweight text-to-speech model that generates 32 kHz audio with under 250 ms latency on CPU and up to 2000x real-time speed on GPU.

Soprano is an on-device text-to-speech model designed for expressive, high-fidelity speech synthesis. Its compact 80M-parameter architecture reaches up to 20x real-time generation on CPU and 2000x on GPU, with under 250 ms latency on CPU. The model supports lossless streaming and generation of arbitrary length through automatic text splitting. Deployment options include OpenAI-compatible endpoints, ONNX, a WebUI, a CLI, and ComfyUI integration.
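To put the throughput figures in wall-clock terms, here is a short Python sketch that converts a real-time factor (RTF) into synthesis time and counts the samples in a clip at 32 kHz. It assumes RTF means seconds of audio produced per second of compute and that output is mono. The helper names are illustrative and are not part of Soprano.

```python
# Assumption: RTF = seconds of audio generated per second of wall-clock compute.
SAMPLE_RATE_HZ = 32_000  # output sample rate stated for Soprano


def generation_time_s(audio_seconds: float, rtf: float) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of speech at a given RTF."""
    return audio_seconds / rtf


def num_samples(audio_seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of PCM samples in a mono clip of the given duration (mono is an assumption)."""
    return round(audio_seconds * sample_rate_hz)


if __name__ == "__main__":
    clip = 60.0  # one minute of speech
    print(f"samples:     {num_samples(clip)}")
    print(f"CPU @ 20x:   {generation_time_s(clip, 20):.3f} s")
    print(f"GPU @ 2000x: {generation_time_s(clip, 2000):.3f} s")
```

At the stated upper bounds, one minute of speech works out to 1,920,000 samples, about 3 s of compute on CPU and about 30 ms on GPU. The 250 ms figure is a latency measurement and cannot be derived from the RTF alone.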
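Automatic text splitting is what lets a fixed-context model handle input of any length: long text is cut into chunks that are synthesized one after another. The Python sketch below shows one common way to do this, splitting on sentence punctuation and then packing sentences into chunks of at most `max_chars` characters. It is a hypothetical illustration; `split_text`, `_wrap_words`, and the 200-character default are assumptions, and Soprano's actual splitting rules may differ.

```python
import re


def _wrap_words(sentence: str, max_chars: int) -> list[str]:
    """Break one over-long sentence into word-aligned pieces of <= max_chars.

    A single word longer than max_chars is kept whole as its own piece.
    """
    pieces: list[str] = []
    current = ""
    for word in sentence.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Hypothetical splitter: pack whole sentences into chunks of <= max_chars."""
    # Split after ., ! or ? when followed by whitespace, dropping empty pieces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        for piece in _wrap_words(sentence, max_chars):
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars or not current:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("Hello there. How are you? I am fine!", max_chars=20):
        print(chunk)
```

Splitting on sentence boundaries first keeps chunk edges at natural pauses, which tends to hide the seams when the per-chunk audio is concatenated or streamed. Falling back to word boundaries handles sentences that are too long for a single chunk.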