saharmor/whisper-playground

Whisper, but make it a web app with friends

A full-stack starter kit for real-time speech-to-text that bundles diarization, 99 languages, and enough config files to keep you busy.

Velocity (7d): +0.6 ★/day · Trend: steady

What it does

Whisper Playground wraps OpenAI’s Whisper model in a React frontend and Python backend so you can transcribe microphone audio in a browser. It adds speaker diarization via Pyannote and Diart, supports 99 languages, and lets you toggle between real-time streaming and batched sequential transcription.

The interesting bit

The project is essentially plumbing — but thoughtful plumbing. It wires faster-whisper (a speed-optimized reimplementation) to a WebSocket server, then surfaces knobs like beam size and transcription timeout in the UI. The diarization integration is the less common piece; most Whisper demos stop at raw text.
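To make the "knobs" concrete, here is a minimal sketch of calling faster-whisper directly with an explicit beam size. It is not taken from the project's backend: the helper names `format_segments` and `transcribe_file`, the `tiny` model choice, and the CPU/int8 settings are assumptions for illustration, and the library must be installed separately (`pip install faster-whisper`).

```python
from typing import Iterable, Tuple


def format_segments(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) triples as one timestamped line each."""
    return "\n".join(
        f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"
        for start, end, text in segments
    )


def transcribe_file(path: str, model_size: str = "tiny", beam_size: int = 5) -> str:
    """Transcribe an audio file with faster-whisper (hypothetical helper).

    The import is lazy so the formatting helper above works even when
    faster-whisper is not installed.
    """
    from faster_whisper import WhisperModel

    # Assumption: CPU inference with int8 quantization to keep memory low.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus metadata.
    segments, _info = model.transcribe(path, beam_size=beam_size)
    return format_segments((s.start, s.end, s.text) for s in segments)
```

A larger `beam_size` explores more candidate transcriptions per step, which generally costs latency. That is presumably why the project exposes it in the UI next to the model size.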

Key highlights

  • Supports 99 languages through Whisper’s multilingual training
  • Two transcription modes: real-time streaming with diarization vs. batched sequential transcription with more context
  • Model sizes from tiny to large-v2 for trading accuracy against hardware budget
  • Requires Hugging Face authentication and explicit model license acceptance for Pyannote components
  • MIT licensed; OpenAI's Whisper code and weights are also released under MIT

Caveats

  • Speaker-swapping bugs in sequential mode (issue #27)
  • Short audio chunks under the timeout threshold get dropped in real-time mode (issue #28)
  • Not every claimed language has been tested
  • A Rust toolchain may be needed on macOS for dependency builds
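The dropped-chunk caveat is easier to picture with a small example. The sketch below shows one generic way a streaming pipeline might buffer audio until a minimum window is reached and then flush the remainder when the stream ends, rather than discarding it. It is not the project's code or its fix for issue #28: `ChunkBuffer`, `min_seconds`, and the 16 kHz 16-bit audio format are all hypothetical.

```python
from typing import Optional


class ChunkBuffer:
    """Accumulate raw PCM bytes until a minimum duration is buffered."""

    def __init__(self, min_seconds: float, sample_rate: int = 16000,
                 bytes_per_sample: int = 2) -> None:
        # Assumption: mono PCM, so bytes = seconds * rate * sample width.
        self.min_bytes = int(min_seconds * sample_rate * bytes_per_sample)
        self._buf = bytearray()

    def push(self, chunk: bytes) -> Optional[bytes]:
        """Add a chunk; return the buffered audio once it is long enough."""
        self._buf.extend(chunk)
        if len(self._buf) < self.min_bytes:
            return None  # too short to transcribe yet, keep waiting
        return self._take()

    def flush(self) -> Optional[bytes]:
        """At end of stream, hand back any short tail instead of dropping it."""
        return self._take() if self._buf else None

    def _take(self) -> bytes:
        out = bytes(self._buf)
        self._buf.clear()
        return out
```

The point of the design is the explicit `flush()`: a buffer that only ever emits from `push()` silently loses whatever is left below the threshold when recording stops.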

Verdict

Good for developers who want a working baseline to hack on — conference transcription tools, accessibility experiments, voice UIs. Skip it if you need production-grade reliability out of the box; the known bugs and manual config steps suggest this is a playground in the literal sense.
