astorfi/speechpy

SpeechPy: the boring speech-processing work, bottled

A Python library that extracts MFCCs and filterbank energies so you don't have to reimplement the DSP textbook.


What it does

SpeechPy turns raw audio waveforms into the standard feature vectors that speech recognizers actually eat: MFCCs, mel-filterbank energies, and their log variants. It also handles the housekeeping: stacking frames, pre-emphasis, power-spectrum computation, and both global and sliding-window cepstral mean/variance normalization (CMVN). In short, it is the classic front-end pipeline that Kaldi provides, but written in plain Python on top of NumPy and SciPy.
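To make that pipeline concrete, here is a minimal NumPy sketch of the classic MFCC front end: pre-emphasis, framing, power spectrum, mel filterbank, log, then a DCT. It is written from scratch to show the steps, not taken from SpeechPy, so the function names, the 0.97 pre-emphasis coefficient, the Hamming window, and the HTK-style mel formula are assumptions; SpeechPy's own defaults and edge handling may differ.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(hz) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(num_filters, fft_length, sample_rate, low_hz=0.0, high_hz=None):
    """Triangular filters evenly spaced on the mel scale, shape (num_filters, fft_length // 2 + 1)."""
    high_hz = sample_rate / 2 if high_hz is None else high_hz
    mel_points = np.linspace(hz_to_mel(low_hz), hz_to_mel(high_hz), num_filters + 2)
    bins = np.floor((fft_length + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((num_filters, fft_length // 2 + 1))
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sample_rate, frame_length=0.020, frame_stride=0.010,
         num_filters=40, num_cepstral=13, fft_length=512, preemph=0.97):
    """Hypothetical helper: waveform -> (num_frames, num_cepstral) MFCC matrix."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis: y[n] = x[n] - a * x[n-1] boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # 2. Framing without padding: trailing samples that don't fill a frame are dropped
    flen = int(round(frame_length * sample_rate))
    fstep = int(round(frame_stride * sample_rate))
    num_frames = 1 + (len(emphasized) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(num_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    # 3. Power spectrum of each frame (zero-padded to fft_length)
    power = np.abs(np.fft.rfft(frames, n=fft_length)) ** 2 / fft_length
    # 4. Log mel-filterbank energies; clamp to eps so empty filters don't give log(0)
    energies = power @ mel_filterbank(num_filters, fft_length, sample_rate).T
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))
    # 5. Orthonormal DCT-II across the filter axis, keeping the first num_cepstral terms
    n = np.arange(num_filters)
    k = np.arange(num_cepstral)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * num_filters)) * np.sqrt(2.0 / num_filters)
    basis[0] /= np.sqrt(2.0)
    return log_energies @ basis.T
```

With these defaults, one second of 16 kHz audio yields a (99, 13) matrix, because a 320-sample frame advanced 160 samples at a time fits 1 + (16000 - 320) // 160 = 99 times.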

The interesting bit

The library is deliberately narrow. It doesn’t train models or run inference; it just solves the “waveform in, feature matrix out” problem with sensible defaults (20 ms frames, 10 ms stride, 40 mel filters). The CMVN implementation is a nice touch, offering both global and sliding-window versions, since channel normalization is where a lot of tutorial code quietly falls over.
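Why CMVN helps: a fixed channel (microphone, phone line) multiplies the spectrum, which turns into an additive offset once you take logs, so subtracting a per-coefficient mean cancels it. Below is a minimal NumPy sketch of both the global and the sliding-window variants. It is an illustration, not SpeechPy's implementation; the function names, the 301-frame default window (about 3 s at a 10 ms stride), and the choice to truncate the window at the edges are assumptions.

```python
import numpy as np


def cmvn(features, variance_normalization=False, eps=1e-10):
    """Global CMVN: normalize each column by its mean (and optionally std) over all frames."""
    normalized = features - features.mean(axis=0)
    if variance_normalization:
        normalized /= features.std(axis=0) + eps
    return normalized


def cmvn_sliding(features, win_size=301, variance_normalization=False, eps=1e-10):
    """Sliding-window CMVN: frame t uses stats from up to win_size frames centered on t.

    Near the start and end the window is simply truncated (an assumption; padding is another option).
    """
    num_frames = features.shape[0]
    half = win_size // 2
    out = np.empty_like(features, dtype=float)
    for t in range(num_frames):
        lo, hi = max(0, t - half), min(num_frames, t + half + 1)
        window = features[lo:hi]
        out[t] = features[t] - window.mean(axis=0)
        if variance_normalization:
            out[t] /= window.std(axis=0) + eps
    return out
```

`cmvn(feats + 5.0)` matches `cmvn(feats)` up to floating-point error, which is the channel-offset cancellation in miniature. The sliding version costs more compute but copes better when the channel drifts over a long recording.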

Key highlights

  • MFCC, filterbank-energy, and log-filterbank extraction with configurable frame length, stride, FFT size, filter count, and frequency range
  • Frame stacking with optional zero-padding and custom windowing
  • Global and sliding-window CMVN for channel compensation
  • Published in JOSS with a DOI, if you need something citable
  • pip-installable; depends on the standard NumPy/SciPy stack
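Frame stacking is mostly bookkeeping, but the zero-padding choice changes how many frames you get. A minimal sketch, assuming a 1-D NumPy signal and a window passed as a callable that takes the frame length; the `window` and `zero_padding` parameter names here are illustrative, not a claim about SpeechPy's signature.

```python
import math

import numpy as np


def stack_frames(signal, sample_rate, frame_length=0.020, frame_stride=0.010,
                 window=np.ones, zero_padding=True):
    """Split a 1-D signal into overlapping frames, shape (num_frames, frame_samples)."""
    signal = np.asarray(signal, dtype=float)
    flen = int(round(frame_length * sample_rate))
    fstep = int(round(frame_stride * sample_rate))
    if zero_padding:
        # Enough frames to cover every sample; the tail is padded with zeros
        num_frames = max(1, math.ceil((len(signal) - flen) / fstep) + 1)
        needed = (num_frames - 1) * fstep + flen
        signal = np.concatenate([signal, np.zeros(needed - len(signal))])
    else:
        # Only full frames; trailing samples that don't fill a frame are dropped
        num_frames = max(0, 1 + (len(signal) - flen) // fstep)
    idx = np.arange(flen)[None, :] + fstep * np.arange(num_frames)[:, None]
    # window(flen) returns one weight per sample; np.ones gives a rectangular window
    return signal[idx] * window(flen)
```

For a 16,050-sample signal at 16 kHz, padding gives 100 frames (the last one partly zeros) while dropping the remainder gives 99. Passing `window=np.hamming` tapers each frame instead of using the rectangular default.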

Caveats

  • Python 2.7, 3.4, and 3.5 are the documented/tested versions; the README hasn’t been updated for newer Pythons, so compatibility is unclear
  • The project appears largely unmaintained—last meaningful activity is several years old
  • No GPU acceleration; this is CPU numpy all the way

Verdict

Good for students, researchers, or prototype pipelines that need classic acoustic features without dragging in all of Kaldi. Skip it if you want end-to-end neural features or a maintained, modern dependency.
