kyutai-labs/unmute
A real-time voice interface that wraps text LLMs with speech recognition and synthesis capabilities.

Unmute lets text-only LLMs hold spoken conversations by wrapping them with speech-to-text (STT) and text-to-speech (TTS) models. The backend receives user audio over a WebSocket, transcribes it with Kyutai’s STT model, sends the transcript to an LLM to generate a response, then synthesizes that response with Kyutai’s TTS model and streams the audio back to the user. Both speech models are optimized for low latency to keep the conversation flowing naturally. The system can use any LLM, either via OpenRouter or a self-hosted vLLM server.
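
The flow described above (audio in, transcription, LLM response, synthesized audio out) can be sketched as a chain of async stages. This is only an illustrative sketch, not Unmute's actual code: `fake_stt`, `fake_llm`, `fake_tts`, `microphone`, and `handle_turn` are hypothetical stand-ins, and the "audio" here is just UTF-8 bytes so the example runs without any models or a WebSocket server.

```python
import asyncio
from typing import AsyncIterator, Iterable


# NOTE: every function below is a hypothetical stand-in for illustration;
# none of these names come from the Unmute codebase.

async def microphone(chunks: Iterable[bytes]) -> AsyncIterator[bytes]:
    """Simulate audio chunks arriving from the client over a WebSocket."""
    for chunk in chunks:
        yield chunk


async def fake_stt(audio_in: AsyncIterator[bytes]) -> str:
    """Pretend transcription: treat each audio chunk as one UTF-8 word."""
    words = []
    async for chunk in audio_in:
        words.append(chunk.decode("utf-8"))
    return " ".join(words)


async def fake_llm(prompt: str) -> AsyncIterator[str]:
    """Pretend text LLM: stream the reply one word at a time."""
    for word in f"You said: {prompt}".split():
        yield word


async def fake_tts(text_stream: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Pretend synthesis: emit one 'audio' chunk per word as soon as it arrives."""
    async for word in text_stream:
        yield word.encode("utf-8")


async def handle_turn(audio_in: AsyncIterator[bytes]) -> list[bytes]:
    """Run one conversational turn: STT -> LLM -> TTS."""
    transcript = await fake_stt(audio_in)
    # The TTS stage consumes the LLM output as a stream, so audio chunks can
    # be produced before the full reply has been generated.
    return [chunk async for chunk in fake_tts(fake_llm(transcript))]


def run_demo() -> list[bytes]:
    """Feed two fake audio chunks through the pipeline and collect the output."""
    return asyncio.run(handle_turn(microphone([b"hello", b"world"])))
```

Calling `run_demo()` pushes two fake audio chunks through all three stages. Passing the LLM output to the TTS stage as a stream, rather than waiting for the complete reply, is the design choice that keeps latency low in a pipeline of this shape.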