
MahmoudAshraf97/whisper-diarization

Audio pipeline that transcribes speech using OpenAI Whisper and identifies individual speakers via diarization.

5.5k stars · Jupyter Notebook · Image · Video · Audio
Velocity (7d): +4.5 ★ / day · Trend: steady

This repository provides a speaker diarization pipeline that combines OpenAI Whisper for automatic speech recognition with NVIDIA NeMo for speaker identification. It processes audio into timestamped transcripts that attribute each speech segment to a specific speaker. Google Colab notebooks are provided for easy execution, and the pipeline integrates multiple ML models to cover the full workflow from transcription to speaker attribution.
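To make the idea of attributing timestamped speech to speakers concrete, here is a minimal Python sketch. It is not taken from the repository and calls neither Whisper nor NeMo. The `Word` and `Turn` types, the function names, and the sample data are all hypothetical. It assumes an ASR step has already produced word timestamps and a diarization step has produced a non-empty list of speaker turns, and it labels each word with the turn that overlaps it most.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical ASR output: one word with start/end times in seconds."""
    text: str
    start: float
    end: float


@dataclass
class Turn:
    """Hypothetical diarization output: one speaker turn in seconds."""
    speaker: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it most.

    Assumes `turns` is non-empty; ties go to the earliest turn in the list.
    """
    labeled = []
    for w in words:
        best = max(turns, key=lambda t: overlap(w.start, w.end, t.start, t.end))
        labeled.append((best.speaker, w))
    return labeled


def to_segments(labeled):
    """Merge consecutive same-speaker words into (speaker, start, end, text)."""
    segments = []
    for speaker, w in labeled:
        if segments and segments[-1][0] == speaker:
            prev = segments[-1]
            segments[-1] = (speaker, prev[1], w.end, prev[3] + " " + w.text)
        else:
            segments.append((speaker, w.start, w.end, w.text))
    return segments


# Made-up sample data, for illustration only.
words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.2, 1.5)]
turns = [Turn("Speaker 0", 0.0, 1.0), Turn("Speaker 1", 1.0, 2.0)]

segments = to_segments(assign_speakers(words, turns))
for speaker, start, end, text in segments:
    print(f"[{start:.1f}-{end:.1f}] {speaker}: {text}")
```

Maximum overlap is only one simple alignment rule. A real pipeline may use a different strategy, for example realigning speaker changes to sentence boundaries.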
