DemisEom/SpecAugment

Google Brain's spectrogram trick, copy-pasted into PyTorch and TF

A straightforward port of SpecAugment for developers who want to warp and mask mel spectrograms without reading the paper.

654 stars · Python · Data Tooling · SpecAugment
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

Takes a mel spectrogram and applies three augmentations in sequence: time warping, frequency masking, and time masking. The result is a distorted spectrogram you feed into a speech recognition model instead of the original. It supports both TensorFlow and PyTorch, but there is no shared interface: each backend is a separate import path.
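To make the two masking steps concrete, here is a minimal NumPy sketch. It is not the repo's code: the function names `freq_mask` and `time_mask`, the `(n_mels, n_frames)` layout, the zero fill value, and the example mask sizes are all assumptions for illustration.

```python
import numpy as np


def freq_mask(spec, max_width, rng):
    """Zero out one random band of consecutive mel channels (rows)."""
    out = spec.copy()
    n_mels = out.shape[0]
    # Band width is drawn uniformly from [0, max_width], capped at n_mels.
    width = min(int(rng.integers(0, max_width + 1)), n_mels)
    start = int(rng.integers(0, n_mels - width + 1))
    out[start:start + width, :] = 0.0
    return out


def time_mask(spec, max_width, rng):
    """Zero out one random run of consecutive time frames (columns)."""
    out = spec.copy()
    n_frames = out.shape[1]
    # Run length is drawn uniformly from [0, max_width], capped at n_frames.
    width = min(int(rng.integers(0, max_width + 1)), n_frames)
    start = int(rng.integers(0, n_frames - width + 1))
    out[:, start:start + width] = 0.0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel = rng.normal(size=(80, 200))  # stand-in for a real log-mel spectrogram
    augmented = time_mask(freq_mask(mel, 27, rng), 40, rng)
    print(augmented.shape)
```

Both functions return a copy, so the original spectrogram stays available for comparison. Filling with zero only makes sense if the spectrogram has been normalized to roughly zero mean; otherwise the mean value is the more common fill.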

The interesting bit

The original SpecAugment paper showed you could get state-of-the-art ASR results by augmenting spectrograms directly — no fancy audio domain tricks, no speed perturbation on raw waveforms. This repo is a literal implementation of that idea: warp, mask, done. The test code runs against LibriSpeech, so you can verify it actually produces the expected visual artifacts.
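The warp step is the least obvious of the three, so a rough sketch may help. The version below is a deliberately simplified stand-in, not the paper's or the repo's method: it moves one anchor frame left or right and linearly resamples the time axis on either side, using only NumPy. The name `time_warp` and the `(n_mels, n_frames)` layout are assumptions.

```python
import numpy as np


def time_warp(spec, max_shift, rng):
    """Shift one random anchor frame by up to max_shift frames and
    stretch/squeeze the time axis on both sides to compensate."""
    n_mels, n_frames = spec.shape
    # Too short to warp safely: return an unchanged copy.
    if n_frames <= 2 * max_shift + 2:
        return spec.copy()

    # Anchor in the source, kept far enough from both edges that the
    # shifted position stays strictly inside the clip.
    src = int(rng.integers(max_shift + 1, n_frames - max_shift - 1))
    dst = src + int(rng.integers(-max_shift, max_shift + 1))

    frames = np.arange(n_frames)
    # Piecewise-linear map from each output frame to a source position:
    # output [0, dst] reads from source [0, src], the rest from [src, end].
    src_pos = np.interp(frames, [0, dst, n_frames - 1], [0, src, n_frames - 1])

    # Resample every mel channel at the fractional source positions.
    return np.stack([np.interp(src_pos, frames, row) for row in spec])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel = rng.normal(size=(80, 200))  # stand-in for a real log-mel spectrogram
    print(time_warp(mel, 40, rng).shape)
```

The first and last frames are left where they are, so the clip length does not change; only the content in between is stretched on one side of the anchor and compressed on the other.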

Key highlights

  • Dual backend support: spec_augment_tensorflow or spec_augment_pytorch
  • pip3 install SpecAugment — one-liner install
  • Includes before/after spectrogram images in the README
  • Apache 2.0 licensed
  • Test script provided with LibriSpeech example

Caveats

  • The README images are hotlinked from a different fork (shelling203/SpecAugment), not this repo — links may rot
  • No version pinning or dependency list shown; “some audio libraries work properly” is the full guidance
  • 654 stars but sparse recent activity; this is a reference implementation, not a maintained package

Verdict

Grab this if you need a quick, working SpecAugment for a Kaggle competition or research baseline. Skip it if you want production-hardened augmentation — look at torchaudio transforms or nlpaug instead, which have actual maintainers.
