MahmoudAshraf97/whisper-diarization
Audio pipeline that transcribes speech using OpenAI Whisper and identifies individual speakers via diarization.

Velocity (7d): +4.5 ★/day · Trend: steady
This repository provides a speaker diarization pipeline that combines OpenAI Whisper for automatic speech recognition with NVIDIA NeMo for speaker identification. The system processes audio to produce timestamped transcripts that attribute each speech segment to a specific speaker. It offers Google Colab notebooks for easy execution and integrates multiple ML models to cover the full workflow from transcription to speaker attribution.
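To make the idea of a speaker-attributed transcript concrete, here is a minimal sketch of how word timestamps from a speech recognizer could be merged with speaker turns from a diarizer. It is not the repository's code and calls neither Whisper nor NeMo: the `Word` and `Turn` types and the `assign_speakers` and `group_segments` helpers are hypothetical names invented for illustration, and the matching rule (pick the turn with the largest time overlap) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class Turn:
    """A diarization turn: one speaker talking over a time span (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most."""
    labeled = []
    for w in words:
        best_speaker, best_overlap = "unknown", 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best_speaker, best_overlap = t.speaker, o
        labeled.append((best_speaker, w))
    return labeled


def group_segments(labeled):
    """Merge consecutive words from the same speaker into transcript segments."""
    segments = []
    for speaker, w in labeled:
        if segments and segments[-1]["speaker"] == speaker:
            segments[-1]["end"] = w.end
            segments[-1]["text"] += " " + w.text
        else:
            segments.append(
                {"speaker": speaker, "start": w.start, "end": w.end, "text": w.text}
            )
    return segments


if __name__ == "__main__":
    # Made-up sample data standing in for real model output.
    words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.2, 1.5)]
    turns = [Turn("Speaker 0", 0.0, 1.0), Turn("Speaker 1", 1.0, 2.0)]
    for seg in group_segments(assign_speakers(words, turns)):
        print(f'[{seg["start"]:.1f}s - {seg["end"]:.1f}s] {seg["speaker"]}: {seg["text"]}')
```

Words that overlap no turn are labeled "unknown" rather than guessed, which keeps gaps in the diarization visible in the output.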