
m-bain/whisperX

WhisperX is an automatic speech recognition tool that adds word-level timestamps and speaker diarization to OpenAI's Whisper model.

22.3k stars · Python · Image · Video · Audio
Velocity (7d): +17 ★ / day · Trend: steady

WhisperX extends OpenAI’s Whisper model in two ways: forced phoneme alignment improves timestamp accuracy, and voice-activity-based batching speeds up inference. It produces word-level timestamps for transcribed speech and integrates speaker diarization to identify the speakers in a recording. Given an audio file, the output is a transcription with precise temporal alignment and speaker attribution.
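
To make the idea of speaker attribution concrete, here is a minimal sketch of how word-level timestamps and diarization turns could be combined: each word is given the speaker whose turn overlaps it the most. This is an illustration only, not WhisperX's own code or API; the names `Word`, `SpeakerTurn`, `overlap`, and `assign_speakers`, and the sample timings, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """A transcribed word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class SpeakerTurn:
    """A diarization segment: one speaker talking over a time span (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    words: List[Word], turns: List[SpeakerTurn]
) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose turn overlaps it most.

    Words that overlap no turn are labelled None.
    """
    labelled = []
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(word.start, word.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labelled.append((word.text, best_speaker))
    return labelled


# Made-up sample data for illustration.
words = [Word("hello", 0.10, 0.45), Word("there", 0.50, 0.90), Word("hi", 1.20, 1.40)]
turns = [SpeakerTurn("SPEAKER_00", 0.0, 1.0), SpeakerTurn("SPEAKER_01", 1.0, 2.0)]

print(assign_speakers(words, turns))
```

Picking the turn with the largest overlap is one simple policy. It handles words that straddle a speaker change, but a real pipeline may treat ties and gaps differently.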
