A voice chat app that actually lets you interrupt the AI
Full-duplex voice conversation with an LLM, wired together from RealtimeSTT, RealtimeTTS, and a FastAPI backend.

What it does
You talk, it listens. Your voice streams through the browser to a Python backend, gets transcribed by Whisper via RealtimeSTT, fed to an LLM (Ollama or OpenAI), then spoken back via RealtimeTTS (Coqui, Kokoro, or Orpheus). The whole loop runs over WebSockets with a vanilla JS frontend. Docker Compose is the recommended deployment path, especially on Linux with an NVIDIA GPU.
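To make the shape of that loop concrete, here is a minimal sketch of one conversational turn as three chained async stages. It is not the project's code: `microphone`, `transcribe`, `generate_reply`, `synthesize`, and `run_turn` are hypothetical stand-ins for the browser audio stream, RealtimeSTT, the LLM, and RealtimeTTS, and the WebSocket transport is left out entirely.

```python
import asyncio
from typing import AsyncIterator


async def microphone() -> AsyncIterator[bytes]:
    """Hypothetical audio source: three 320-byte chunks of silence."""
    for _ in range(3):
        yield b"\x00" * 320


async def transcribe(chunks: AsyncIterator[bytes]) -> str:
    """Stand-in for the STT stage: consumes audio chunks, returns text."""
    total = 0
    async for chunk in chunks:
        total += len(chunk)
    return f"<{total} bytes of speech>"


async def generate_reply(text: str) -> AsyncIterator[str]:
    """Stand-in for the LLM stage: yields the reply piece by piece."""
    for token in ("You", " said: ", text):
        await asyncio.sleep(0)  # give other tasks a chance to run
        yield token


async def synthesize(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Stand-in for the TTS stage: turns each token into 'audio' bytes."""
    async for token in tokens:
        yield token.encode("utf-8")


async def run_turn() -> bytes:
    """One full turn: audio in, text, streamed reply, audio out."""
    text = await transcribe(microphone())
    audio = bytearray()
    async for pcm in synthesize(generate_reply(text)):
        audio.extend(pcm)  # a real server would send each chunk right away
    return bytes(audio)


if __name__ == "__main__":
    print(asyncio.run(run_turn()))
```

The point of chaining generators is that the TTS stage can start on the first tokens before the LLM has finished, which is what keeps perceived latency low in a setup like this.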
The interesting bit
The project treats turn-taking as a first-class problem. A dedicated turndetect.py module uses dynamic silence detection to figure out when you’re done speaking, and the system handles interruptions gracefully — you can cut the AI off mid-sentence. That’s harder than it sounds when you’re streaming audio chunks in both directions.
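As a rough illustration of both ideas, the sketch below adapts the silence threshold to how the partial transcript ends, and stops playback as soon as the user starts speaking. This is an assumption about how such logic could look, not what turndetect.py actually does: `TurnDetector`, `play_until_interrupted`, and all of the timing values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class TurnDetector:
    # All thresholds are illustrative guesses, in seconds.
    base_pause: float = 0.7   # silence tolerated in the middle of a sentence
    short_pause: float = 0.3  # after terminal punctuation: likely finished
    long_pause: float = 1.5   # after a trailing ellipsis: likely still thinking

    def required_silence(self, partial_text: str) -> float:
        """Pick the silence threshold from how the partial transcript ends."""
        text = partial_text.rstrip()
        # Check the ellipsis first, because "..." also ends with ".".
        if text.endswith(("...", "\u2026")):
            return self.long_pause
        if text.endswith((".", "!", "?")):
            return self.short_pause
        return self.base_pause

    def turn_is_over(self, partial_text: str, silence_seconds: float) -> bool:
        """True once the speaker has been quiet for long enough."""
        return silence_seconds >= self.required_silence(partial_text)


def play_until_interrupted(
    chunks: Sequence[str],
    user_is_speaking: Callable[[], bool],
) -> List[str]:
    """Play chunks in order, checking for user speech before each one."""
    played = []
    for chunk in chunks:
        if user_is_speaking():
            break  # barge-in: drop the rest of the reply
        played.append(chunk)
    return played
```

With these values, `TurnDetector().turn_is_over("Done.", 0.4)` is true while the same 0.4 seconds of silence after "and then I" is not. Checking for speech between small chunks, rather than once per sentence, is what lets an interruption take effect quickly.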
Key highlights
- Pluggable LLM backends: Ollama by default, OpenAI via llm_module.py
- Swappable TTS engines: Coqui XTTSv2, Kokoro, or Orpheus
- Real-time feedback: partial transcriptions and AI responses visible as they arrive
- Docker Compose setup bundles the app, dependencies, and Ollama service
- MIT licensed core; external TTS/LLM providers have their own terms
Caveats
- The author has stepped back due to time constraints; community PRs are accepted, but there is no active feature development or user support
- “Powerful CUDA-enabled NVIDIA GPU” is essentially mandatory — CPU-only performance is “significantly slower”
- Windows manual install is fiddly (DeepSpeed compilation issues); Linux + Docker is the happy path
- Supported Python versions are capped below 3.13
Verdict
Good starting point if you want to self-host a voice AI and don’t mind getting your hands dirty with Docker and GPU drivers. Skip it if you need polished end-user support or are running on Apple Silicon or CPU-only hardware.