DeepSpeech2 on PaddlePaddle: Chinese ASR with Jetson support
A maintained Chinese speech recognition stack that runs from desktop to edge, though the author now prefers their newer dynamic-graph rewrite.

What it does: Implements Baidu’s DeepSpeech2 paper in PaddlePaddle for end-to-end Chinese (and English) speech recognition. Ships with training and inference pipelines, beam search decoding, data augmentation recipes, and deployment paths including a web server, GUI, and NVIDIA Jetson boards.
The interesting bit: The project straddles two eras. It is a static-graph PaddlePaddle implementation that the author recently refactored to drop legacy “fluid” APIs, yet they openly steer new users toward PPASR, their dynamic-graph successor with Conformer and Squeezeformer support. This one persists for those who need the classic DeepSpeech2 architecture or the Jetson deployment path.
Key highlights
- Pre-trained models available for AIShell (179h Mandarin), LibriSpeech (960h English), and WenetSpeech (10,000h Mandarin)
- Character error rate (CER) of 5.94% on AIShell with beam search decoding, versus 8.35% with greedy decoding
- Long-audio recognition via webrtcvad voice activity detection
- Model export for inference; supports CPU, GPU, and TensorRT
- Chinese numeral conversion to Arabic digits post-recognition
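For readers unfamiliar with the CER figures quoted above, here is a minimal sketch of how character error rate is conventionally computed: the Levenshtein edit distance between the reference and recognized character sequences, divided by the reference length. The function names `edit_distance` and `cer` are illustrative and not taken from the project, whose own evaluation code may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance / reference length.

    Spaces are stripped first (an assumption of this sketch), so that
    spacing differences are not counted as errors.
    """
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, a single wrong character in a six-character reference gives a CER of 1/6. Because insertions count as errors, CER can exceed 100% when the hypothesis is much longer than the reference.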
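The VAD-based long-audio highlight above can be illustrated with a rough sketch of the general technique: cut the audio into short fixed-size frames, classify each as speech or not, and close a segment after a run of silent frames so that each segment can be recognized separately. This is not the project's implementation. `split_on_silence` and its parameters are hypothetical names, and the speech detector is passed in as a callable so the logic runs without extra dependencies. With the webrtcvad package, one would pass `webrtcvad.Vad(2).is_speech`, which takes a frame of 16-bit mono PCM bytes plus a sample rate and expects 10, 20, or 30 ms frames.

```python
def split_on_silence(pcm, sample_rate, is_speech,
                     frame_ms=30, max_silence_frames=10):
    """Return (start_byte, end_byte) speech ranges in 16-bit mono PCM.

    is_speech(frame_bytes, sample_rate) -> bool is supplied by the caller.
    A segment is closed once max_silence_frames consecutive frames are
    classified as non-speech; trailing silence is trimmed from each segment.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    segments = []
    start = None          # byte offset where the open segment began
    last_voiced_end = 0   # byte offset just past the last voiced frame
    silence = 0           # consecutive silent frames inside a segment

    # Only whole frames are examined; a short tail is ignored.
    for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        frame = pcm[offset:offset + frame_bytes]
        if is_speech(frame, sample_rate):
            if start is None:
                start = offset
            last_voiced_end = offset + frame_bytes
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= max_silence_frames:
                segments.append((start, last_voiced_end))
                start = None
                silence = 0

    if start is not None:  # audio ended while a segment was still open
        segments.append((start, last_voiced_end))
    return segments
```

Each returned byte range can then be sliced out of the PCM buffer and sent to the recognizer on its own, with the transcripts concatenated afterwards. A production version would usually pad segments slightly and cap their maximum length.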
Caveats
- The WenetSpeech pre-trained model shows blank error rates in the README table; it is unclear whether training is incomplete or the metrics were never filled in
- Last meaningful feature update appears to be 2021; the 2024 “refactor” mainly removed deprecated APIs rather than adding capability
- The author explicitly recommends PPASR, which uses simpler dynamic graphs, for new projects
Verdict: Worth a look if you specifically need DeepSpeech2 on PaddlePaddle static graphs or are targeting Jetson deployment with an existing pipeline. Otherwise, follow the author’s own advice and check PPASR first.