Google Brain's spectrogram trick, copy-pasted into PyTorch and TF
A straightforward port of SpecAugment for developers who want to warp and mask mel spectrograms without reading the paper.

What it does
Takes a mel spectrogram and applies three augmentations in sequence: time warping, frequency masking, and time masking. The result is a distorted spectrogram you feed into a speech recognition model instead of the original. Supports both TensorFlow and PyTorch, though that amounts to two separate modules with their own import paths rather than a shared interface.
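The two masking steps are simple enough to sketch without any framework. This is a minimal pure-Python version, not the repo's code. It assumes the spectrogram is a list of rows (one row per mel channel), and the names freq_mask, time_mask, F, T and num_masks are illustrative. As in the paper, each mask width is drawn uniformly from 0 up to a cap, then placed at a uniformly random offset. Masked cells are set to zero, which equals the mean once the log-mel input has been normalized to zero mean.

```python
import random

def freq_mask(spec, F, num_masks=1, rng=random):
    """Zero out num_masks bands of consecutive mel channels.

    Assumed layout: spec is a list of rows, one per mel channel
    (n_mels x n_frames). Each band width f is drawn uniformly from [0, F].
    """
    out = [row[:] for row in spec]            # copy so the input is untouched
    n_mels = len(out)
    for _ in range(num_masks):
        f = rng.randint(0, min(F, n_mels))    # band width
        f0 = rng.randint(0, n_mels - f)       # first masked channel
        for ch in range(f0, f0 + f):
            out[ch] = [0.0] * len(out[ch])
    return out

def time_mask(spec, T, num_masks=1, rng=random):
    """Zero out num_masks spans of consecutive frames, each at most T wide."""
    out = [row[:] for row in spec]
    n_frames = len(out[0])
    for _ in range(num_masks):
        t = rng.randint(0, min(T, n_frames))  # span width
        t0 = rng.randint(0, n_frames - t)     # first masked frame
        for row in out:
            for i in range(t0, t0 + t):
                row[i] = 0.0
    return out
```

Chaining them, e.g. time_mask(freq_mask(spec, F=27), T=100), gives the masking half of the pipeline. A real training loop would do the same thing with tensor ops (from this repo, torchaudio, or similar) rather than Python lists.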
The interesting bit
The original SpecAugment paper showed you could get state-of-the-art ASR results by augmenting spectrograms directly, with no fancy audio-domain tricks and no speed perturbation of raw waveforms. This repo is a literal implementation of that idea: warp, mask, done. The test code runs against LibriSpeech, so you can verify it actually produces the expected visual artifacts.
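Of the three steps, the warp is the least obvious. The paper applies it with TensorFlow's sparse_image_warp. The sketch below swaps in a simpler technique: a piecewise-linear resampling of the time axis around one random anchor frame. It is an approximation, not what the repo ships. The name time_warp, the parameter W (maximum warp distance in frames) and the list-of-rows layout (one row per mel channel) are all assumptions.

```python
import random

def time_warp(spec, W, rng=random):
    """Piecewise-linear time warp of an n_mels x n_frames spectrogram.

    Simplified stand-in for sparse_image_warp: the anchor frame c moves to
    c + w, and each side of it is stretched or squeezed linearly so the
    first and last frames stay fixed.
    """
    n = len(spec[0])
    if W <= 0 or n < 2 * W + 1:
        return [row[:] for row in spec]        # too short to warp; return a copy
    c = rng.randint(W, n - 1 - W)              # anchor frame, away from the edges
    w = rng.randint(-W, W)                     # signed warp distance
    c_new = min(max(c + w, 1), n - 2)          # keep the moved anchor strictly inside

    out = []
    for row in spec:
        new_row = []
        for j in range(n):
            # Map output frame j back to a fractional source frame.
            if j <= c_new:
                src = j * c / c_new
            else:
                src = c + (j - c_new) * (n - 1 - c) / (n - 1 - c_new)
            lo = int(src)
            hi = min(lo + 1, n - 1)
            frac = src - lo
            # Linear interpolation between the two neighbouring source frames.
            new_row.append(row[lo] * (1 - frac) + row[hi] * frac)
        out.append(new_row)
    return out
```

Keeping the endpoints pinned means the warped spectrogram has the same length as the input, so it can go straight into the masking steps.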
Key highlights
- Dual backend support: spec_augment_tensorflow or spec_augment_pytorch
- One-line install: pip3 install SpecAugment
- Includes before/after spectrogram images in the README
- Apache 2.0 licensed
- Test script provided with LibriSpeech example
Caveats
- The README images are hotlinked from a different fork (shelling203/SpecAugment), not this repo, so the links may rot
- No version pinning or dependency list shown; “some audio libraries work properly” is the full guidance
- 654 stars but sparse recent activity; this is a reference implementation, not a maintained package
Verdict
Grab this if you need a quick, working SpecAugment for a Kaggle competition or research baseline. Skip it if you want production-hardened augmentation — look at torchaudio transforms or nlpaug instead, which have actual maintainers.