hirofumi0810/neural_sp

A research-grade ASR kitchen sink that still builds with Kaldi

For when you need to compare CTC, RNN-T, and six attention variants without rewriting training code.

594 stars · Python · Domain Apps · Language Models
Velocity · 7d: +0.2 ★ / day · Trend: steady

What it does

NeuralSP is a PyTorch toolkit for end-to-end speech recognition and language modeling. It bundles encoders (RNN, Transformer, Conformer, TDS convolution), decoders (CTC, RNN-Transducer, attention-based), and a menagerie of streaming variants under one training regime. You also get language models (RNNLM, Transformer-XL, gated CNN) and enough multi-task learning modes to make your head spin.

The interesting bit

The streaming support is unusually thorough: hard monotonic attention, MoChA (monotonic chunkwise attention), monotonic multihead attention, delay-constrained training, minimum latency training, and CTC-synchronous training. Most toolkits pick one or two. This one tracks the last several years of streaming ASR research like a bibliography come to life.
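
For readers who have not met hard monotonic attention, the sketch below shows the test-time idea as it is usually described in the literature. It is not code from neural_sp: the function name, the precomputed energies matrix, and the 0.5 threshold are all illustrative assumptions. At each output step the decoder scans encoder frames left to right, starting from the frame it attended to last, and stops at the first frame whose selection probability clears the threshold. Because it never looks back, it can run while audio is still arriving.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hard_monotonic_decode(energies, threshold=0.5):
    """Pick one encoder frame per output step, never moving backwards.

    energies[i][j] is a hypothetical attention energy for output step i
    and encoder frame j (in a real model it comes from the decoder state).
    Returns the attended frame index per step, or None if no frame is chosen.
    """
    attended = []
    start = 0  # scanning resumes from the previously attended frame
    for row in energies:
        chosen = None
        for j in range(start, len(row)):
            if sigmoid(row[j]) >= threshold:
                chosen = j
                break
        attended.append(chosen)
        if chosen is not None:
            start = chosen
    return attended


# Toy energies: 3 output steps over 4 encoder frames.
toy = [
    [-2.0, 3.0, -1.0, -1.0],
    [-5.0, -5.0, 2.0, -1.0],
    [-5.0, -5.0, -5.0, -5.0],
]
print(hard_monotonic_decode(toy))
```

Variants such as MoChA build on this by attending softly over a small window ending at the selected frame; that step is omitted here.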

Key highlights

  • Benchmarked results on 10+ corpora (AISHELL, LibriSpeech, Switchboard, CSJ, WSJ, etc.) with consistent model naming
  • Front-end includes SpecAugment and adaptive variants; encoders cover Conformer and TDS convolution
  • LM fusion in the decoder goes well beyond the basics: shallow, cold, and deep fusion, plus internal LM estimation and forward-backward attention decoding
  • Output units range from phonemes to a word-character mix; multi-task learning combines CTC, attention, and LM objectives hierarchically
  • Still depends on Kaldi to build its tooling, and pulls in warp-ctc and warp-transducer for efficient loss computation
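
As a reminder of what the simplest of those fusion options does, here is a minimal sketch of shallow fusion used to rescore finished hypotheses. It is not neural_sp's API: the function name, the tuple layout, the toy log-probabilities, and the 0.3 weight are assumptions made for illustration. Shallow fusion adds a weighted language-model log-probability to the ASR log-probability at decoding time and leaves both models unchanged.

```python
def shallow_fusion_best(hyps, lm_weight=0.3):
    """Return the text of the best hypothesis under shallow fusion.

    hyps is a list of (text, asr_logprob, lm_logprob) tuples.
    The fused score is asr_logprob + lm_weight * lm_logprob.
    """
    best = max(hyps, key=lambda h: h[1] + lm_weight * h[2])
    return best[0]


# Toy numbers, invented for illustration only.
hyps = [
    ("recognize speech", -4.0, -6.0),
    ("wreck a nice beach", -3.8, -12.0),
]
print(shallow_fusion_best(hyps, lm_weight=0.0))  # acoustic score alone
print(shallow_fusion_best(hyps, lm_weight=0.3))  # with the LM term added
```

In these toy numbers the acoustically preferred hypothesis loses once the LM term is included. Cold and deep fusion differ in that the LM is wired into the decoder network itself, so they cannot be expressed as a post-hoc score sum like this.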

Caveats

  • The build requires a Kaldi path and manual tool compilation; this is not a pip install experience
  • README lists many features but offers minimal usage guidance beyond installation
  • Travis CI badge suggests testing, but coverage and current maintenance status are unclear

Verdict

Grab this if you’re reproducing streaming ASR papers or need a fair comparison across CTC/RNN-T/attention baselines. Skip it if you want a batteries-included, actively maintained framework with modern packaging; ESPnet has likely superseded much of this.
