Whisper, but make it a web app with friends
A full-stack starter kit for real-time speech-to-text that bundles diarization, 99 languages, and enough config files to keep you busy.

What it does
Whisper Playground wraps OpenAI’s Whisper model in a React frontend and Python backend so you can transcribe microphone audio in a browser. It adds speaker diarization via Pyannote and Diart, supports 99 languages, and lets you toggle between real-time streaming and batched sequential transcription.
The interesting bit
The project is essentially plumbing — but thoughtful plumbing. It wires faster-whisper (a speed-optimized reimplementation) to a WebSocket server, then surfaces knobs like beam size and transcription timeout in the UI. The diarization integration is the less common piece; most Whisper demos stop at raw text.
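To make the plumbing concrete, here is a minimal sketch of how UI knobs such as model size and beam size could be passed to faster-whisper. It is not the project's code: `TranscriptionSettings`, `transcribe_file`, and the `MODEL_SIZES` subset are hypothetical names made up for illustration, and the WebSocket layer is left out. Only `WhisperModel` and its `transcribe` method come from the faster-whisper package.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: an illustrative subset of Whisper model names in the tiny..large-v2 range.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large-v2")


@dataclass(frozen=True)
class TranscriptionSettings:
    """Hypothetical container for the knobs a UI might expose."""
    model_size: str = "tiny"
    beam_size: int = 5
    language: Optional[str] = None  # None lets Whisper detect the language

    def __post_init__(self):
        if self.model_size not in MODEL_SIZES:
            raise ValueError(f"unknown model size: {self.model_size!r}")
        if self.beam_size < 1:
            raise ValueError("beam_size must be at least 1")


def transcribe_file(path: str, settings: TranscriptionSettings) -> list:
    """Transcribe one audio file and return (start, end, text) tuples."""
    # Imported lazily so the settings class works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(settings.model_size)
    segments, _info = model.transcribe(
        path, beam_size=settings.beam_size, language=settings.language
    )
    # segments is a generator; iterating over it is what runs the decoding.
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]
```

Calling `transcribe_file("clip.wav", TranscriptionSettings(model_size="small", beam_size=3))` would download the model on first use. A larger beam size generally trades speed for accuracy, which is why it makes sense as a user-facing setting.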
Key highlights
- Supports 99 languages through Whisper’s multilingual training
- Two transcription modes: real-time, which transcribes and diarizes audio as it streams in, and sequential, which works in batches with more context
- Model sizes from tiny to large-v2 for trading accuracy against hardware budget
- Requires Hugging Face authentication and explicit model license acceptance for Pyannote components
- MIT licensed, including the Whisper weights
Caveats
- Speaker swapping bugs in sequential mode (issue #27)
- Short audio chunks under the timeout threshold get dropped in real-time mode (issue #28)
- Not every claimed language has been tested
- On macOS, a Rust toolchain may be needed to build some dependencies
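For intuition about the dropped-chunk caveat, the toy model below shows one way a timeout-based buffer can lose short audio. It is an assumption about the general mechanism, not a reproduction of the project's code or of issue #28; `TimeoutBuffer` and its methods are hypothetical names.

```python
class TimeoutBuffer:
    """Toy model of timeout-based batching; not the project's actual code."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.pending = []   # chunk durations (seconds) waiting to be transcribed
        self.batches = []   # batches handed to an imaginary transcriber

    def add_chunk(self, duration_s: float) -> None:
        """Buffer a chunk and emit a batch once the timeout is reached."""
        self.pending.append(duration_s)
        if sum(self.pending) >= self.timeout_s:
            self._emit()

    def stop(self, flush: bool = False) -> None:
        """End the stream; without flush, the pending tail is discarded."""
        if flush and self.pending:
            self._emit()
        self.pending = []

    def _emit(self) -> None:
        self.batches.append(list(self.pending))
        self.pending = []
```

In this sketch, audio that never accumulates up to `timeout_s` before `stop()` is called is silently lost unless `flush=True` is passed. A flush-on-stop step is one plausible mitigation for this kind of design.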
Verdict
Good for developers who want a working baseline to hack on — conference transcription tools, accessibility experiments, voice UIs. Skip it if you need production-grade reliability out of the box; the known bugs and manual config steps suggest this is a playground in the literal sense.