Is VoiceStreamAI open source?

Yes — alesaccoia/VoiceStreamAI is open source, released under the MIT license.

What language is VoiceStreamAI written in?

alesaccoia/VoiceStreamAI is primarily written in Python.

How popular is VoiceStreamAI?

alesaccoia/VoiceStreamAI has 959 stars on GitHub.

Where can I find VoiceStreamAI?

alesaccoia/VoiceStreamAI is on GitHub at https://github.com/alesaccoia/VoiceStreamAI.


alesaccoia/VoiceStreamAI

Self-hosted Whisper, streamed live over WebSocket

It wires a browser microphone to a self-hosted Whisper model over WebSocket, chunking audio so the GPU can keep up without hitting OpenAI’s API.

★959 stars · Python · Image · Video · Audio · Inference · Serving




What it does

VoiceStreamAI is a Python server with a JavaScript client that captures browser audio and feeds it to a self-hosted Whisper model for near-real-time transcription. It uses a pyannote voice activity detection (VAD) model, hosted on Hugging Face, to skip silence, then buffers speech into chunks so the speech recognition (ASR) pipeline is not swamped with tiny requests. Client and server communicate over WebSocket, so the browser keeps streaming audio while the server transcribes earlier chunks.
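As a rough illustration of the buffering step, here is a minimal per-connection audio buffer. It is a sketch, not VoiceStreamAI's code: the class name `ClientAudioBuffer` is hypothetical, and it assumes the browser sends 16 kHz, 16-bit mono PCM as binary WebSocket messages.

```python
SAMPLE_RATE = 16_000   # assumption: 16 kHz mono audio from the client
BYTES_PER_SAMPLE = 2   # assumption: 16-bit signed PCM


class ClientAudioBuffer:
    """Hypothetical per-connection buffer that accumulates raw PCM from WebSocket messages."""

    def __init__(self, sample_rate: int = SAMPLE_RATE) -> None:
        self.sample_rate = sample_rate
        self._data = bytearray()

    def append(self, frame: bytes) -> None:
        # Each binary message is assumed to carry whole 16-bit samples.
        if len(frame) % BYTES_PER_SAMPLE:
            raise ValueError("frame must contain whole 16-bit samples")
        self._data.extend(frame)

    @property
    def duration_seconds(self) -> float:
        # Buffered audio length derived from byte count, sample width and rate.
        return len(self._data) / (BYTES_PER_SAMPLE * self.sample_rate)

    def take(self, seconds: float) -> bytes:
        # Remove and return the first `seconds` of audio, rounded down to whole samples.
        n = int(seconds * self.sample_rate) * BYTES_PER_SAMPLE
        chunk = bytes(self._data[:n])
        del self._data[:n]
        return chunk
```

A server would keep one such buffer per WebSocket connection, append every incoming frame, and only hand audio to VAD and ASR once enough of it has accumulated.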

The interesting bit

The server uses a SilenceAtEndOfChunk strategy: it waits for a pause before finalizing a buffer so words don’t get sliced at chunk boundaries. The README candidly admits the strategy pattern is currently just an if/else block in server.py, which gives the project an endearing work-in-progress honesty.
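The SilenceAtEndOfChunk idea can be sketched as a small decision function. This is an illustrative reading of the strategy described above, not the project's implementation: the class name, the `"wait"`/`"discard"`/`"transcribe"` results, and the 3.0 s and 0.1 s values are hypothetical, and speech segments are assumed to arrive from a VAD as `(start, end)` pairs in seconds.

```python
from dataclasses import dataclass
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s) of a detected speech region


@dataclass
class SilenceAtEndOfChunkSketch:
    chunk_length_seconds: float = 3.0  # illustrative value, not the project's default
    chunk_offset_seconds: float = 0.1  # illustrative value, not the project's default

    def decide(self, buffered_seconds: float, speech: List[Segment]) -> str:
        """Return 'wait', 'discard', or 'transcribe' for the current buffer."""
        if buffered_seconds < self.chunk_length_seconds:
            return "wait"        # not enough audio buffered yet
        if not speech:
            return "discard"     # VAD found only silence
        last_end = speech[-1][1]
        if last_end < buffered_seconds - self.chunk_offset_seconds:
            return "transcribe"  # speech ended before a trailing pause: safe to cut
        return "wait"            # speech runs up to the edge: keep buffering
```

Cutting only when the last speech segment ends at least `chunk_offset_seconds` before the end of the buffer is what keeps a word from being split across two chunks. The trade-off is that a speaker who never pauses keeps the buffer growing.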

Key highlights

Self-hosted stack: defaults to faster-whisper for ASR and a pyannote model for VAD, with no cloud ASR required.
GPU-bound latency: the README notes a 7-second transcription time on a Tesla T4, so chunking is essential to keep up with live audio.
Per-client tuning: each browser session can send its own language, chunk length, and silence offset over the WebSocket wire.
Modular backends: the server uses factory arguments to swap VAD or ASR components without rewriting the core.
Optional TLS: supports secure WebSockets if you provide a certificate and key.
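The factory arrangement and per-client settings listed above could look roughly like this. Everything here is hypothetical: `DummyASR` stands in for a real backend such as faster-whisper so the sketch runs without a GPU, and the registry, `create_asr`, `parse_client_config`, and the config field names are illustrative rather than taken from server.py.

```python
import json
from typing import Callable, Dict, Protocol


class ASR(Protocol):
    def transcribe(self, audio: bytes, language: str) -> str: ...


class DummyASR:
    """Stand-in backend so the sketch runs without a model or GPU."""

    def __init__(self, model_size: str = "tiny") -> None:
        self.model_size = model_size

    def transcribe(self, audio: bytes, language: str) -> str:
        return f"<{len(audio)} bytes, {language}, {self.model_size}>"


# Hypothetical registry: backend type name -> constructor taking keyword args.
ASR_FACTORIES: Dict[str, Callable[..., ASR]] = {"dummy": DummyASR}


def create_asr(asr_type: str, **kwargs) -> ASR:
    """Build an ASR backend by name, so the core never imports a concrete class."""
    try:
        factory = ASR_FACTORIES[asr_type]
    except KeyError:
        raise ValueError(f"unknown ASR type: {asr_type!r}") from None
    return factory(**kwargs)


def parse_client_config(message: str) -> dict:
    """Merge one client's JSON settings over server defaults (field names are hypothetical)."""
    defaults = {"language": None, "chunk_length_seconds": 3.0, "chunk_offset_seconds": 0.1}
    incoming = json.loads(message)
    unknown = set(incoming) - set(defaults)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return {**defaults, **incoming}
```

Adding a new backend then means registering one more constructor in the dictionary, while each connection keeps its own merged settings.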

Caveats

Small audio chunks can cause Whisper to lose context and misinterpret speech.
The pipeline currently writes audio chunks to disk before inference rather than processing them in memory.
The default pyannote VAD requires a Hugging Face authentication token.
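One possible way around the disk round trip is to wrap each chunk in an in-memory WAV container with Python's standard `wave` and `io` modules. This is a sketch assuming 16 kHz, 16-bit mono PCM, and `pcm_to_wav_file` is a hypothetical helper. It relies on the ASR backend accepting a file-like object; faster-whisper's `WhisperModel.transcribe` accepts a binary file object or a NumPy array as well as a path, but check this against the version you run.

```python
import io
import wave


def pcm_to_wav_file(pcm: bytes, sample_rate: int = 16_000) -> io.BytesIO:
    """Wrap raw 16-bit mono PCM in an in-memory WAV container (no temp file on disk)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    buf.seek(0)                    # rewind so a reader starts at the RIFF header
    return buf
```

With such a helper, a call like `model.transcribe(pcm_to_wav_file(chunk))` could replace writing the chunk to a temporary file first.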

Verdict

Good for developers who want a private, self-hosted alternative to cloud transcription APIs and can tolerate near-real-time latency. Skip it if you need instantaneous word-by-word output or a fully memory-resident pipeline.

Frequently asked

What is alesaccoia/VoiceStreamAI?: It wires a browser microphone to a self-hosted Whisper model over WebSocket, chunking audio so the GPU can keep up without hitting OpenAI’s API.