yeyupiaoling/PaddlePaddle-DeepSpeech

DeepSpeech2 on PaddlePaddle: Chinese ASR with Jetson support

A maintained Chinese speech recognition stack that runs from desktop to edge, though the author now prefers their newer dynamic-graph rewrite.


What it does Implements Baidu’s DeepSpeech2 paper in PaddlePaddle for end-to-end Chinese (and English) speech recognition. Ships with training and inference pipelines, beam search decoding, data augmentation recipes, and deployment paths including a web server, GUI, and Nvidia Jetson boards.
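DeepSpeech2 models are trained with CTC, so the decoding step turns per-frame label probabilities into text. As a rough illustration of the simplest strategy (greedy decoding, the baseline that beam search improves on), here is a minimal sketch. The function name, the toy vocabulary, and the choice of index 0 for the blank label are assumptions made for the example, not details taken from this repository.

```python
from itertools import groupby


def ctc_greedy_decode(frame_probs, vocab, blank_id=0):
    """Greedy CTC decoding: best label per frame, collapse repeats, drop blanks.

    frame_probs: list of per-frame probability lists, one entry per vocab item.
    vocab: list of output symbols; vocab[blank_id] is the CTC blank
           (index 0 is an assumption for this sketch).
    """
    # Pick the most likely label index for every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_probs]
    # Collapse consecutive duplicates, then remove blank labels.
    collapsed = [label for label, _ in groupby(best)]
    return "".join(vocab[label] for label in collapsed if label != blank_id)


# Toy example with a hypothetical three-symbol vocabulary.
toy_vocab = ["_", "你", "好"]
toy_probs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
decoded = ctc_greedy_decode(toy_probs, toy_vocab)
```

Greedy decoding only ever keeps one candidate per frame. Beam search keeps several partial transcripts alive, and can rescore them with a language model, which is one plausible reason for the lower error rate it reports below.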

The interesting bit The project straddles two eras: it is a static-graph PaddlePaddle implementation that the author refactored in 2024 to drop legacy “fluid” APIs, yet they openly steer new users toward PPASR, their dynamic-graph successor with Conformer and Squeezeformer support. This one persists for those who need the classic DeepSpeech2 architecture or the Jetson deployment path.

Key highlights

  • Pre-trained models available for AIShell (179h Mandarin), Librispeech (960h English), and WenetSpeech (10,000h Mandarin)
  • Character error rate (CER) of 5.94% on AIShell with beam search, 8.35% with greedy decoding
  • Long-audio recognition via webrtcvad voice activity detection
  • Model export for inference; supports CPU, GPU, and TensorRT
  • Chinese numeral conversion to Arabic digits post-recognition
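
For readers unfamiliar with the CER figures quoted above: character error rate is conventionally the character-level edit distance between the reference transcript and the recognized text, divided by the reference length. The sketch below shows that standard definition. It is not the repository's evaluation script, and the function names are made up for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong character out of six gives a CER of 1/6.
example = cer("今天天气很好", "今天天汽很好")
```

By this definition, a CER of 5.94% means roughly six character-level errors for every hundred reference characters.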

Caveats

  • The WenetSpeech pre-trained model shows blank error rates in the README table; it is unclear whether training is incomplete or the metrics were never updated
  • Last meaningful feature update appears to be 2021; the 2024 “refactor” mainly removed deprecated APIs rather than adding capability
  • The author explicitly recommends PPASR, which uses simpler dynamic graphs, for new projects

Verdict Worth a look if you specifically need DeepSpeech2 on PaddlePaddle static graphs or are targeting Jetson deployment with an existing pipeline. Otherwise, follow the author’s own advice and check PPASR first.
