Is deep-speaker open source?

Yes. philipperemy/deep-speaker is open source and released under the MIT license.

What language is deep-speaker written in?

philipperemy/deep-speaker is primarily written in Python.

How popular is deep-speaker?

philipperemy/deep-speaker has 941 stars on GitHub.

Where can I find deep-speaker?

philipperemy/deep-speaker is on GitHub at https://github.com/philipperemy/deep-speaker.

philipperemy/deep-speaker

Voice fingerprints on a budget: reproducing Baidu's speaker ID

An unofficial but thorough Keras/TensorFlow reimplementation of Baidu's Deep Speaker, complete with pretrained models and a six-day training recipe for consumer GPUs.

★941 stars · Python · Image · Video · Audio · ML Frameworks

What it does

Maps audio clips to 512-dimensional “voice fingerprints” (speaker embeddings) using a ResCNN that is first trained with a softmax classifier and then fine-tuned with triplet loss. You feed it MFCCs computed from a WAV or FLAC file; it returns an embedding, and the cosine similarity between two embeddings indicates whether the utterances come from the same speaker (the higher the score, the more likely). The repo includes pretrained checkpoints, inference code, and the full training pipeline.
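A minimal sketch of that comparison step, assuming you already have two embedding vectors. Random NumPy vectors stand in for real model output here, and the helper names and the 0.5 decision threshold are hypothetical placeholders; in practice the threshold would be tuned on held-out trials. If the embeddings are already L2-normalized, cosine similarity reduces to a plain dot product.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return float(np.dot(a, b) / denom)


def same_speaker(emb_a, emb_b, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: accept when similarity reaches a tuned threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speaker = rng.normal(size=512)               # stand-in for a 512-d voice fingerprint
    same = speaker + 0.1 * rng.normal(size=512)  # small perturbation: plays the "same speaker"
    other = rng.normal(size=512)                 # independent vector: plays a "different speaker"
    print(same_speaker(speaker, same), same_speaker(speaker, other))
```

A single fixed threshold keeps verification a one-line comparison; identification against many enrolled speakers would instead pick the enrolled embedding with the highest similarity.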

The interesting bit

The two-stage training mirrors the original paper’s approach but is candid about the hardware it takes: roughly 6 days on a GTX 1070/1080 Ti, 300 GB of SSD scratch space, and 32 GB of RAM plus swap. The author also hosts the pretrained model on a Chinese cloud mirror: practical, not performative.
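The second stage optimizes a triplet objective. Below is a NumPy sketch of a cosine-similarity triplet loss, max(s_an - s_ap + margin, 0), the form described in the Deep Speaker paper, plus a simple "hardest negative in the batch" selector as one common way to mine hard negatives. This is not the repo's Keras code: the margin of 0.1, the function names, and the batch-hard sampling strategy are all assumptions for illustration.

```python
import numpy as np


def l2_normalize(x) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cosine_triplet_loss(anchor, positive, negative, margin: float = 0.1) -> float:
    """Batch mean of max(s_an - s_ap + margin, 0), where s is cosine similarity."""
    a, p, n = l2_normalize(anchor), l2_normalize(positive), l2_normalize(negative)
    s_ap = np.sum(a * p, axis=1)  # anchor-positive similarity per triplet
    s_an = np.sum(a * n, axis=1)  # anchor-negative similarity per triplet
    return float(np.mean(np.maximum(s_an - s_ap + margin, 0.0)))


def hardest_negatives(embeddings, labels) -> np.ndarray:
    """For each row, return the index of the most similar embedding from a different speaker."""
    labels = np.asarray(labels)
    if np.unique(labels).size < 2:
        raise ValueError("need at least two speakers in the batch")
    e = l2_normalize(embeddings)
    sims = e @ e.T
    sims[labels[:, None] == labels[None, :]] = -np.inf  # exclude same-speaker pairs, including self
    return np.argmax(sims, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    centers = rng.normal(size=(4, 512))     # 4 synthetic "speakers"
    labels = np.repeat(np.arange(4), 2)     # 2 utterances per speaker
    emb = centers[labels] + 0.3 * rng.normal(size=(8, 512))
    pos_idx = np.arange(8) ^ 1              # partner utterance of the same speaker
    neg_idx = hardest_negatives(emb, labels)
    print(cosine_triplet_loss(emb, emb[pos_idx], emb[neg_idx]))
```

Picking the most similar impostor keeps the loss focused on the pairs the model still confuses, which is also why such a loss can plateau rather than fall to zero.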

Key highlights

Pretrained ResCNN Softmax+Triplet model: 99.7% accuracy, 2.5% EER on LibriSpeech “all” (2,484 speakers)
TensorFlow 2.3–2.6 compatible; inference works on newer versions, evaluation scripts pinned to 2.3
CLI handles the full drudgery: download LibriSpeech, build MFCCs, train softmax (~3 days), train triplets (~3 days)
Supports custom datasets if you match LibriSpeech’s directory layout and use FLAC (convert WAV files with ffmpeg if needed)
Triplet loss with hard negative mining; author notes the training loss plateaus because hard examples stay hard
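EER (equal error rate) is the error rate at the decision threshold where the false acceptance rate equals the false rejection rate. A rough sketch of computing it from a list of trial scores follows; it sweeps only the observed scores and takes the threshold where the two rates are closest, so it is an approximation, and the repo's own evaluation protocol may differ. The function name and inputs are hypothetical.

```python
import numpy as np


def equal_error_rate(scores, is_same) -> float:
    """Approximate EER by sweeping thresholds over the observed similarity scores."""
    scores = np.asarray(scores, dtype=np.float64)
    is_same = np.asarray(is_same, dtype=bool)
    n_pos = int(is_same.sum())
    n_neg = int((~is_same).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both same-speaker and different-speaker trials")
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        accept = scores >= t
        far = np.sum(accept & ~is_same) / n_neg  # impostor trials wrongly accepted
        frr = np.sum(~accept & is_same) / n_pos  # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)


if __name__ == "__main__":
    # Toy trials: cosine scores and whether each pair really shares a speaker.
    print(equal_error_rate([0.9, 0.8, 0.3, 0.2, 0.6], [True, True, False, False, False]))
```

Reporting the average of the two rates at the closest crossing is a common convention when no threshold makes them exactly equal.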
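For custom data, LibriSpeech stores audio as `<speaker>/<chapter>/<speaker>-<chapter>-<utterance>.flac`. A small sketch that groups such files by speaker ID can help check that a custom dataset follows the same convention before training; the helper name and the placeholder IDs are hypothetical, and this is not the repo's own loader.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_speaker(root) -> dict:
    """Map speaker ID -> sorted list of FLAC paths, assuming LibriSpeech-style file names."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*.flac")):
        # The speaker ID is the first dash-separated field of the file name.
        speaker_id = path.stem.split("-")[0]
        groups[speaker_id].append(path)
    return dict(groups)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ["19/198/19-198-0000.flac", "19/198/19-198-0001.flac", "26/495/26-495-0000.flac"]:
            p = Path(tmp) / rel
            p.parent.mkdir(parents=True, exist_ok=True)
            p.touch()  # empty placeholder files, enough to exercise the directory walk
        for speaker, files in group_by_speaker(tmp).items():
            print(speaker, len(files))
```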

Caveats

The test-model evaluation command breaks on TensorFlow versions newer than 2.3; the README explicitly warns about this
Performance drops on noisy data; the author recommends preprocessing with SoX to strip silence and background noise
Training demands are substantial; this is not a “pip install and go” solution for casual experimentation
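The author's recommendation is SoX. As a rough illustration of the silence-stripping half only, here is a NumPy sketch that trims leading and trailing frames whose RMS energy falls more than 40 dB below the loudest frame. It is not SoX, does no noise reduction, and leaves internal pauses in place; the frame length, threshold, and function name are assumptions.

```python
import numpy as np


def trim_silence(signal, sample_rate: int, frame_ms: float = 25.0, threshold_db: float = -40.0) -> np.ndarray:
    """Drop leading/trailing frames quieter than threshold_db relative to the loudest frame."""
    x = np.asarray(signal, dtype=np.float64)
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    n_frames = len(x) // frame_len  # a trailing partial frame is ignored
    if n_frames == 0:
        return x
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    peak = rms.max()
    if peak == 0.0:
        return x[:0]  # all-silent input: nothing to keep
    rel_db = 20.0 * np.log10(np.maximum(rms, 1e-12) / peak)
    voiced = np.flatnonzero(rel_db > threshold_db)  # never empty: the loudest frame is at 0 dB
    start = voiced[0] * frame_len
    end = (voiced[-1] + 1) * frame_len
    return x[start:end]


if __name__ == "__main__":
    sr = 16000
    tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s stand-in for speech
    padded = np.concatenate([np.zeros(sr), tone, np.zeros(sr)])
    print(len(padded), len(trim_silence(padded, sr)))
```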

Verdict

Worth a look if you need speaker verification/identification and want a reproducible, documented baseline without enterprise tooling. Skip it if you need real-time streaming inference or a plug-and-play API; this is research code with training wheels, not a product.

Frequently asked

What is philipperemy/deep-speaker?: An unofficial but thorough Keras/TensorFlow reimplementation of Baidu's Deep Speaker, complete with pretrained models and a six-day training recipe for consumer GPUs.