m-bain/whisperX
WhisperX is an automatic speech recognition tool that adds word-level timestamps and speaker diarization to OpenAI's Whisper model.

WhisperX extends OpenAI's Whisper model in three ways: forced phoneme alignment improves timestamp accuracy and yields word-level timestamps, voice-activity-based batching speeds up inference, and integrated speaker diarization identifies the different speakers in a recording. Given an audio file, it produces a transcription with precise temporal alignment and speaker attribution.
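
To make the idea of speaker attribution concrete, here is a minimal sketch of how word-level timestamps might be combined with diarization output: each word is given the speaker whose turn overlaps it the most in time. This is an illustration only, not WhisperX's actual implementation; the names `Word`, `SpeakerTurn`, `overlap`, and `assign_speakers`, as well as the sample timings, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """A transcribed word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float
    speaker: Optional[str] = None


@dataclass
class SpeakerTurn:
    """A diarization segment: one speaker talking over a time span (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[SpeakerTurn]) -> List[Word]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn keep speaker=None.
    """
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(word.start, word.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        word.speaker = best_speaker
    return words


# Made-up sample data: three aligned words and two speaker turns.
words = [Word("hello", 0.10, 0.45), Word("there", 0.50, 0.90), Word("hi", 1.20, 1.40)]
turns = [SpeakerTurn("SPEAKER_00", 0.0, 1.0), SpeakerTurn("SPEAKER_01", 1.0, 2.0)]

labelled = assign_speakers(words, turns)
for w in labelled:
    print(f"[{w.start:.2f}-{w.end:.2f}] {w.speaker}: {w.text}")
```

The sketch also shows why accurate word-level timestamps matter for diarization: the more precise the word boundaries, the less ambiguous the overlap with each speaker turn.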