Is Speech-Transformer open source?

Yes — kaituoxu/Speech-Transformer is an open-source project tracked on heatdrop.

What language is Speech-Transformer written in?

kaituoxu/Speech-Transformer is primarily written in Python.

How popular is Speech-Transformer?

kaituoxu/Speech-Transformer has 810 stars on GitHub.

Where can I find Speech-Transformer?

kaituoxu/Speech-Transformer is on GitHub at https://github.com/kaituoxu/Speech-Transformer.

kaituoxu/Speech-Transformer

Transformer meets Mandarin speech: 12.8% CER, one neural net

A from-scratch PyTorch port of the Speech Transformer paper, wired for end-to-end Chinese ASR with Kaldi doing the feature grunt work.

★810 stars Python Image · Video · Audio

What it does

Takes acoustic features and spits out Mandarin characters through a single Transformer network, with no separate acoustic model and no language model bolted on the side. One shell script (run.sh) stages through data prep, feature extraction, training, and decoding, and it wires in Visdom loss plotting along the way.
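The staged flow can be sketched in a few lines of Python. This is an illustration only, assuming the common Kaldi-style convention of a numeric starting stage. The function `run_pipeline`, its `start_stage` parameter, and the placeholder stage bodies are hypothetical and are not taken from run.sh.

```python
def run_pipeline(stages, start_stage=0):
    """Run (name, func) stages in order, skipping any stage below start_stage.

    Returns the names of the stages that actually ran.
    """
    executed = []
    for index, (name, func) in enumerate(stages):
        if index < start_stage:
            continue  # finished in an earlier run, so skip it
        func()
        executed.append(name)
    return executed


# Placeholder stage bodies: each one only records that it was called.
log = []
stages = [
    ("data prep", lambda: log.append("prepare transcripts and audio lists")),
    ("feature extraction", lambda: log.append("extract acoustic features")),
    ("training", lambda: log.append("train the Transformer")),
    ("decoding", lambda: log.append("decode the test set")),
]

# Resume from stage 2, as one might after features are already on disk.
ran = run_pipeline(stages, start_stage=2)
print(ran)
```

Starting from a later stage is what makes a runner like this cheap to re-run: data prep and feature extraction are done once, and only training and decoding repeat.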

The interesting bit

The author didn’t just port the paper; they glued it to the Kaldi ecosystem for feature extraction while keeping the neural bits pure PyTorch. That hybrid approach (Kaldi for acoustic features, Transformer for everything else) was a pragmatic bridge during the 2018-2019 transition, when end-to-end ASR was still proving itself against pipeline systems.
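To make the hand-off concrete: Kaldi can write feature matrices in a plain-text archive format (what `copy-feats` produces with an `ark,t:` output specifier), which Python code can then read. The parser below is a minimal sketch under that assumption. `parse_kaldi_text_matrices` and the sample utterance IDs are hypothetical, and the repository may well load its features differently, for example from binary archives.

```python
def parse_kaldi_text_matrices(text):
    """Parse Kaldi text-format matrices into {utt_id: [[float, ...], ...]}.

    Expected layout per utterance:
        utt_id  [
          v v v
          v v v ]
    """
    feats = {}
    utt_id = None
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if utt_id is None:
            # Header line: the utterance ID followed by an opening bracket.
            utt_id = tokens[0]
            tokens = tokens[1:]
            if tokens and tokens[0] == "[":
                tokens = tokens[1:]
            rows = []
            if not tokens:
                continue
        closing = tokens[-1] == "]"
        if closing:
            tokens = tokens[:-1]
        if tokens:
            rows.append([float(t) for t in tokens])  # one frame per row
        if closing:
            feats[utt_id] = rows
            utt_id = None
    return feats


# Made-up sample: two utterances with 3-dimensional features.
sample = """utt1  [
  0.5 1.0 1.5
  2.0 2.5 3.0 ]
utt2  [
  -1.0 0.0 1.0 ]
"""

feats = parse_kaldi_text_matrices(sample)
print({utt: (len(m), len(m[0])) for utt, m in feats.items()})
```

Each row is one frame and each column one feature dimension, so the nested lists can be handed to `torch.tensor` before being fed to a model.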

Key highlights

Single-network end-to-end: acoustic features → characters, no intermediate phoneme representation
AIShell-1 recipe included: download the dataset, tweak one path, bash run.sh
Training resumption and Visdom visualization baked into the runner
Character error rate (CER) of 12.8% on AIShell-1 (lower is better): slightly ahead of Listen, Attend and Spell (LAS, 13.2%) but well behind the LSTMP baseline (9.85%)
PyTorch 0.4.1+ era code — expect some archaeology if you’re on modern torch
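For readers unfamiliar with the metric behind those numbers, CER is the character-level edit distance between hypothesis and reference, summed over the test set and divided by the total number of reference characters. The sketch below shows that definition only; the function names and the two sample sentences are made up for illustration, and the repository's own scoring scripts may differ in details such as text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Corpus-level character error rate: total edits / total reference characters."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_errors / total_chars


# Made-up example: one substituted character out of ten reference characters.
refs = ["今天天气很好", "我爱北京"]
hyps = ["今天天汽很好", "我爱北京"]
print(f"{cer(refs, hyps) * 100:.1f}%")  # prints 10.0%
```

Because Mandarin text is scored per character rather than per word, no word segmentation is needed before computing the metric.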

Caveats

Kaldi dependency is mandatory, not optional — feature extraction is outsourced entirely
The 12.8% CER lags behind the 9.85% LSTMP baseline reported alongside it, so the “attention is all you need” sales pitch doesn’t quite close the deal on this dataset
PyTorch 0.4.1+ requirement suggests significant bit-rot risk; no commits visible since ~2019

Verdict

Worth a look if you’re studying how Transformer ASR was adapted for Mandarin or need a reference implementation of the Zhao et al. ICASSP 2019 paper. Skip it if you want production-ready tooling — the field has moved to Conformer, wav2vec 2.0, and friends.

Frequently asked

What is kaituoxu/Speech-Transformer?: A from-scratch PyTorch port of the Speech Transformer paper, wired for end-to-end Chinese ASR with Kaldi doing the feature grunt work.