Is deepspeech.pytorch open source?

Yes — SeanNaren/deepspeech.pytorch is open source, released under the MIT license.

What language is deepspeech.pytorch written in?

SeanNaren/deepspeech.pytorch is primarily written in Python.

How popular is deepspeech.pytorch?

SeanNaren/deepspeech.pytorch has 2.1k stars on GitHub.

Where can I find deepspeech.pytorch?

SeanNaren/deepspeech.pytorch is on GitHub at https://github.com/SeanNaren/deepspeech.pytorch.

SeanNaren/deepspeech.pytorch

DeepSpeech2 training that scales from one GPU to a cluster

It wraps the classic DeepSpeech2 architecture in PyTorch Lightning to handle everything from single-GPU prototyping to multi-node cluster training.

★ 2.1k stars · Python · Categories: Image · Video · Audio, ML Frameworks

What it does

Implements the DeepSpeech2 speech recognition model in PyTorch Lightning, covering training, testing, and inference. It bundles data loaders for common benchmarks like LibriSpeech and Common Voice, and ships with a basic HTTP inference server. A KenLM language model can be layered on at decode time to improve transcription quality.
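To make the decode-time language model concrete, here is a minimal sketch of how CTC beam-search decoders commonly combine an acoustic score with an n-gram LM score, using an LM weight (alpha) and a word-count bonus (beta). This is an illustration under assumptions, not code from the repository: the function names, the candidate list, and all score values are hypothetical, and a real tuning run would minimise word error rate over a development set rather than match a single reference.

```python
from itertools import product


def fused_score(acoustic_logp, lm_logp, word_count, alpha, beta):
    """Combine acoustic and LM log-probabilities for one candidate transcript.

    alpha weights the language model; beta rewards each emitted word, which
    offsets the LM's tendency to prefer shorter outputs.
    """
    return acoustic_logp + alpha * lm_logp + beta * word_count


def best_hypothesis(candidates, alpha, beta):
    """Return the text of the highest-scoring candidate."""
    def score(candidate):
        words = len(candidate["text"].split())
        return fused_score(candidate["acoustic_logp"], candidate["lm_logp"],
                           words, alpha, beta)
    return max(candidates, key=score)["text"]


def grid_search(candidates, reference, alphas, betas):
    """List the (alpha, beta) pairs whose top hypothesis equals the reference."""
    return [(a, b) for a, b in product(alphas, betas)
            if best_hypothesis(candidates, a, b) == reference]


# Hypothetical beam output: the acoustically stronger candidate is the wrong one.
CANDIDATES = [
    {"text": "recognize speech", "acoustic_logp": -4.2, "lm_logp": -3.0},
    {"text": "wreck a nice beach", "acoustic_logp": -4.0, "lm_logp": -9.0},
]

if __name__ == "__main__":
    print(best_hypothesis(CANDIDATES, alpha=0.0, beta=0.0))
    print(best_hypothesis(CANDIDATES, alpha=1.0, beta=0.0))
    print(grid_search(CANDIDATES, "recognize speech", [0.0, 1.0], [0.0]))
```

With the LM switched off (alpha of 0) the acoustically favoured but implausible candidate wins; giving the LM some weight flips the choice, which is the effect the alpha/beta search is looking for.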

The interesting bit

Instead of hand-rolled training loops, it uses Hydra for configuration and PyTorch Lightning for scaling. The same script runs on a single GPU or across multiple nodes via TorchElastic. Multi-node runs need a shared mount such as NFS and a dedicated etcd host, kept off the GPU nodes so that the coordination service does not share a failure point with a worker.
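The idea behind configuration-driven scaling can be sketched with the standard library alone: a typed config tree whose fields are overridden by dotted `key=value` strings, which is the style of command-line override Hydra accepts. The sketch below does not use Hydra, and the field names (`trainer.gpus`, `trainer.num_nodes`, `learning_rate`) are hypothetical stand-ins rather than the repository's actual configuration schema.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class TrainerConfig:
    # Hypothetical fields, chosen only for illustration.
    gpus: int = 1
    num_nodes: int = 1


@dataclass(frozen=True)
class Config:
    trainer: TrainerConfig = field(default_factory=TrainerConfig)
    learning_rate: float = 1e-4


def _set(node, path, raw):
    """Return a copy of `node` with the field at `path` set from the string `raw`."""
    name = path[0]
    current = getattr(node, name)
    if len(path) == 1:
        # Cast using the type of the existing default (int and float only here;
        # booleans would need explicit parsing).
        return replace(node, **{name: type(current)(raw)})
    return replace(node, **{name: _set(current, path[1:], raw)})


def apply_overrides(cfg, overrides):
    """Apply dotted key=value overrides, returning a new immutable config."""
    for item in overrides:
        key, _, raw = item.partition("=")
        cfg = _set(cfg, key.split("."), raw)
    return cfg


if __name__ == "__main__":
    single_gpu = Config()
    cluster = apply_overrides(single_gpu, ["trainer.gpus=4", "trainer.num_nodes=2"])
    print(single_gpu)
    print(cluster)
```

The training code reads the same config object in both cases, so moving from one GPU to several nodes becomes a change of arguments rather than a change of script.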

Key highlights

Supports AN4, TEDLIUM, Voxforge, Common Voice, and LibriSpeech via built-in dataset scripts.
Multi-node training with automatic checkpoint resumption from a shared drive.
Audio augmentations include SpecAugment, noise injection, and tempo/gain perturbation.
Utilities to tune KenLM language-model weights (alpha/beta) through grid search.
Long audio files can be chunked automatically so they fit in GPU memory during transcription.
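Chunked transcription of long audio can be sketched as follows: split the sample array into fixed-size windows, transcribe each window, and join the results. This is a simplified illustration, not the repository's implementation. The names `chunk_bounds`, `transcribe_long`, and `transcribe_fn` are hypothetical, and the sketch does not try to merge words that straddle a chunk boundary.

```python
def chunk_bounds(num_samples, chunk_size, overlap=0):
    """Yield (start, end) sample indices that together cover [0, num_samples)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    start = 0
    while start < num_samples:
        end = min(start + chunk_size, num_samples)
        yield start, end
        if end == num_samples:
            break  # the final (possibly shorter) chunk has been emitted
        start += step


def transcribe_long(samples, transcribe_fn, chunk_size, overlap=0):
    """Transcribe `samples` chunk by chunk and join the partial transcripts.

    `transcribe_fn` is a placeholder for whatever model call turns one chunk
    of samples into text; only one chunk is passed to it at a time.
    """
    parts = [transcribe_fn(samples[start:end])
             for start, end in chunk_bounds(len(samples), chunk_size, overlap)]
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    # Stand-in "model" that just reports the chunk length.
    fake_model = lambda chunk: str(len(chunk))
    print(list(chunk_bounds(10, 4)))
    print(transcribe_long(list(range(10)), fake_model, chunk_size=4))
```

Because only one window is handed to the model at a time, peak memory depends on `chunk_size` rather than on the length of the recording.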

Caveats

Multi-node training requires etcd and a shared cluster mount; the README explicitly warns against co-locating etcd on a GPU node.
Full functionality depends on several external libraries (ctcdecode, kenlm) that sit outside the repo’s core Python dependencies.
The codebase is strictly a DeepSpeech2 implementation, so those seeking newer architectures will not find them here.

Verdict

Worth a look if you need a reproducible, scalable DeepSpeech2 baseline in modern PyTorch. Look elsewhere if you want a dependency-free drop-in recognizer or newer architectures.
