Is whisper-diarization open source?

Yes — MahmoudAshraf97/whisper-diarization is open source, released under the BSD-2-Clause license.

What language is whisper-diarization written in?

GitHub lists Jupyter Notebook as the primary language of MahmoudAshraf97/whisper-diarization; the pipeline itself is Python, shipped as both a script and a notebook.

How popular is whisper-diarization?

MahmoudAshraf97/whisper-diarization has 5.6k stars on GitHub.

Where can I find whisper-diarization?

MahmoudAshraf97/whisper-diarization is on GitHub at https://github.com/MahmoudAshraf97/whisper-diarization.

MahmoudAshraf97/whisper-diarization

A transcription pipeline that remembers who was talking

It stitches together Whisper, NeMo, and Demucs so you can read a transcript and know exactly which speaker said every sentence.

★ 5.6k stars · Jupyter Notebook · Image · Video · Audio

What it does

Takes an audio file and produces a transcript in which each sentence is tagged with a speaker label. The pipeline extracts vocals, transcribes the speech with Whisper, then runs the audio through Nvidia NeMo’s MarbleNet and TitaNet models for voice-activity detection and speaker embedding, respectively. A CTC forced aligner and a punctuation model then correct the timestamps to keep words and speakers in sync.
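To make the output shape concrete, here is a minimal sketch of the final step: turning speaker-tagged words into a readable transcript. The `Word` class, the `group_into_turns` and `render` helpers, and the "Speaker N" labels are all hypothetical names chosen for illustration; they are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    """One transcribed word with timing and a speaker tag (hypothetical shape)."""
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: str


def group_into_turns(words: List[Word]) -> List[Dict]:
    """Merge consecutive words from the same speaker into one turn."""
    turns: List[Dict] = []
    for w in words:
        if turns and turns[-1]["speaker"] == w.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + w.text
            turns[-1]["end"] = w.end
        else:
            # Speaker changed (or first word): open a new turn.
            turns.append(
                {"speaker": w.speaker, "start": w.start, "end": w.end, "text": w.text}
            )
    return turns


def render(turns: List[Dict]) -> str:
    """Format turns as 'Speaker: text', one turn per line."""
    return "\n".join(f"{t['speaker']}: {t['text']}" for t in turns)


example = [
    Word("Hello", 0.0, 0.4, "Speaker 0"),
    Word("there.", 0.5, 0.9, "Speaker 0"),
    Word("Hi.", 1.0, 1.3, "Speaker 1"),
]
print(render(group_into_turns(example)))
```

Keeping start and end times on each turn means the same structure could also feed a subtitle-style export, though that is not shown here.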

The interesting bit

Instead of building a single monolithic model, the project acts as a careful choreographer between several specialized tools: it uses Demucs to strip background noise before speaker identification, then cross-references Whisper’s output with NeMo’s segments via timestamp alignment. It’s essentially very smart glue code that tries to compensate for each component’s blind spots.
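The cross-referencing step can be sketched roughly as follows, assuming the ASR side yields `(word, start, end)` tuples and the diarizer yields `(start, end, speaker)` segments. This version picks the segment with the largest time overlap, which is a stand-in heuristic and not necessarily the rule the project uses; the function name `assign_speakers` and both tuple layouts are assumptions.

```python
from typing import List, Tuple

WordTs = Tuple[str, float, float]   # (text, start, end), assumed layout
Segment = Tuple[float, float, str]  # (start, end, speaker), assumed layout


def assign_speakers(
    words: List[WordTs], segments: List[Segment]
) -> List[Tuple[str, str]]:
    """Label each word with the speaker whose segment overlaps it the most."""
    if not segments:
        raise ValueError("need at least one diarization segment")

    labelled = []
    for text, w_start, w_end in words:
        best_speaker, best_overlap = None, 0.0
        for s_start, s_end, speaker in segments:
            # Length of the intersection of the two intervals (negative if disjoint).
            overlap = min(w_end, s_end) - max(w_start, s_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap

        if best_speaker is None:
            # The word falls in a gap: use the segment whose edge is nearest
            # to the word's midpoint.
            mid = (w_start + w_end) / 2
            nearest = min(
                segments, key=lambda s: min(abs(mid - s[0]), abs(mid - s[1]))
            )
            best_speaker = nearest[2]

        labelled.append((text, best_speaker))
    return labelled


words = [("hello", 0.0, 0.4), ("there", 0.5, 0.9), ("hi", 1.0, 1.3)]
segments = [(0.0, 0.95, "A"), (0.95, 2.0, "B")]
print(assign_speakers(words, segments))
```

The nested loop is quadratic in the worst case, which is fine for a sketch; a long recording would want a sorted sweep over both lists instead.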

Key highlights

Combines OpenAI Whisper, Faster Whisper, Nvidia NeMo, and Demucs under one roof
Uses ctc-forced-aligner and punctuation models to correct timestamp drift between ASR and diarization
Offers a parallel inference path (diarize_parallel.py) for systems with 10 GB+ VRAM, though the README warns it is still experimental
Ships as both a Python script and a Jupyter notebook runnable on Colab
Defaults to Whisper’s English-only medium.en model but supports manual language selection
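One way punctuation can help with drift between ASR and diarization is to treat each sentence as a unit and give every word in it the speaker most of its words were assigned to. The sketch below shows that majority-vote idea under simple assumptions: sentences end at `.`, `?`, or `!`, and the input is a list of `(word, speaker)` pairs. The name `realign_by_sentence` is hypothetical, and the project's actual realignment logic may differ.

```python
from collections import Counter
from typing import List, Tuple

SENTENCE_END = (".", "?", "!")  # assumed sentence terminators


def realign_by_sentence(
    labelled_words: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Relabel every word in a sentence with that sentence's majority speaker."""
    out: List[Tuple[str, str]] = []
    sentence: List[Tuple[str, str]] = []

    def flush() -> None:
        if not sentence:
            return
        # Most frequent speaker among the words of this sentence.
        winner = Counter(spk for _, spk in sentence).most_common(1)[0][0]
        out.extend((text, winner) for text, _ in sentence)
        sentence.clear()

    for text, spk in labelled_words:
        sentence.append((text, spk))
        if text.endswith(SENTENCE_END):
            flush()
    flush()  # trailing words without closing punctuation
    return out


noisy = [("How", "A"), ("are", "A"), ("you?", "B"), ("Fine.", "B")]
print(realign_by_sentence(noisy))
```

Here the stray "you?" is pulled back to speaker A because the rest of its sentence belongs to A, which is the kind of boundary error that timestamp drift tends to cause.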

Caveats

Overlapping speakers are explicitly not handled; the README notes this would require isolating individual speakers and significantly more computation
The parallel pipeline is flagged as experimental and potentially error-prone
Some parameters remain hardcoded in diarize.py and helpers.py with limited CLI exposure
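Since overlapping speech is not handled, it can be useful to at least know where it might occur. The following is a small sketch, assuming you have diarization segments as `(start, end, speaker)` tuples that are allowed to overlap in time; it lists the regions where two different speakers are active at once so those parts of the transcript can be reviewed by hand. The function name `find_overlaps` and the tuple layout are assumptions, not part of the project.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start, end, speaker), assumed layout


def find_overlaps(segments: List[Segment]) -> List[Tuple[float, float, str, str]]:
    """Return (start, end, speaker_a, speaker_b) for each cross-speaker overlap."""
    ordered = sorted(segments)  # sort by start time
    overlaps = []
    for i, (s1, e1, spk1) in enumerate(ordered):
        for s2, e2, spk2 in ordered[i + 1:]:
            if s2 >= e1:
                # Later segments start even later, so none can overlap this one.
                break
            if spk1 != spk2:
                overlaps.append((s2, min(e1, e2), spk1, spk2))
    return overlaps


segments = [(0.0, 5.0, "A"), (4.0, 8.0, "B"), (9.0, 10.0, "A")]
print(find_overlaps(segments))
```

Words that fall inside a flagged region still receive a single speaker label, so this only marks where attribution is least trustworthy; it does not separate the voices.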

Verdict

Worth a look if you need speaker-attributed transcripts from meetings or interviews and don’t mind wrangling a multi-model pipeline. Skip it if you need real-time processing or robust handling of crosstalk and overlapping dialogue.
