iver56/torch-audiomentations

GPU-native audio augmentation that stays in PyTorch

Because copying tensors to CPU for data augmentation is a bottleneck nobody asked for.

1.2k stars · Python · Data Tooling · ML Frameworks
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

torch-audiomentations is a collection of audio transforms—gain, filtering, pitch shift, noise injection, impulse response convolution—that operate directly on PyTorch tensors. They subclass nn.Module, so you can drop them into a model or training pipeline without leaving the GPU. The library supports batched multichannel audio and offers three randomization modes: per-batch, per-example, or per-channel.
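Noise injection at a target signal-to-noise ratio comes down to one formula: scale the noise so that rms(signal) / rms(noise) = 10^(SNR/20). The sketch below applies that formula to a batch laid out as (batch, channels, samples). It is a plain-Python illustration, not torch-audiomentations code. Nested lists stand in for a torch.Tensor, `add_noise_at_snr` is a hypothetical helper, and the noise is white at a fixed SNR. The library's own noise transform may instead use colored noise and a randomized SNR.

```python
import math
import random

# Hypothetical stand-in for a (batch, channels, samples) tensor.
Batch = list[list[list[float]]]


def rms(x: list[float]) -> float:
    """Root-mean-square level of a 1-D signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def add_noise_at_snr(batch: Batch, snr_db: float, rng: random.Random) -> Batch:
    """Add white Gaussian noise to every channel so its SNR equals snr_db.

    The noise is rescaled so that rms(signal) / rms(noise) == 10 ** (snr_db / 20).
    """
    out: Batch = []
    for example in batch:
        new_example = []
        for channel in example:
            noise = [rng.gauss(0.0, 1.0) for _ in channel]
            target_noise_rms = rms(channel) / (10 ** (snr_db / 20))
            scale = target_noise_rms / rms(noise)
            new_example.append([s + scale * n for s, n in zip(channel, noise)])
        out.append(new_example)
    return out


rng = random.Random(0)
sr = 16000
# Hypothetical batch: 4 examples, 2 channels, 0.1 s of a 440 Hz tone each.
clean = [
    [[math.sin(2 * math.pi * 440 * t / sr) for t in range(1600)] for _ in range(2)]
    for _ in range(4)
]
noisy = add_noise_at_snr(clean, snr_db=10.0, rng=rng)
```

Because each channel gets its own noise scale, loud and quiet channels end up at the same SNR instead of the same absolute noise level.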

The interesting bit

The “mode” parameter is the quietly clever part. It lets you control whether an augmentation applies identically across a batch, varies per sample, or even per stereo channel—useful for fighting positional bias or simulating realistic recording conditions without hand-rolling tensor indexing.
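One way to picture the three modes is to count how many random draws a batch gets: one for the whole batch, one per example, or one per (example, channel) pair. The plain-Python sketch below applies a random gain under each mode, converting dB to amplitude with 10^(dB/20). The mode strings mirror the three modes described above. However, `sample_gains_db` and `apply_gain` are hypothetical helpers, and nested lists stand in for a (batch, channels, samples) tensor. The library's internals may differ.

```python
import random

Batch = list[list[list[float]]]  # hypothetical (batch, channels, samples) layout
Grid = list[list[float]]         # one gain in dB per (example, channel)


def sample_gains_db(batch_size: int, num_channels: int, mode: str,
                    min_db: float, max_db: float, rng: random.Random) -> Grid:
    """Return a [batch_size][num_channels] grid of gains drawn according to mode."""
    if mode == "per_batch":
        # A single draw shared by every example and channel.
        g = rng.uniform(min_db, max_db)
        return [[g] * num_channels for _ in range(batch_size)]
    if mode == "per_example":
        # One draw per example, shared by that example's channels.
        return [[g] * num_channels
                for g in (rng.uniform(min_db, max_db) for _ in range(batch_size))]
    if mode == "per_channel":
        # An independent draw for every channel of every example.
        return [[rng.uniform(min_db, max_db) for _ in range(num_channels)]
                for _ in range(batch_size)]
    raise ValueError(f"unknown mode: {mode!r}")


def apply_gain(batch: Batch, gains_db: Grid) -> Batch:
    """Multiply each channel by its gain, converted from dB to amplitude."""
    return [
        [[s * 10 ** (g / 20) for s in channel] for channel, g in zip(example, gains)]
        for example, gains in zip(batch, gains_db)
    ]


rng = random.Random(42)
for mode in ("per_batch", "per_example", "per_channel"):
    grid = sample_gains_db(3, 2, mode, -6.0, 6.0, rng)
    print(mode, [[round(g, 1) for g in row] for row in grid])
```

In "per_channel" mode, the left and right channels of a stereo clip receive different gains. That is the kind of inter-channel variation that is tedious to hand-roll with tensor indexing.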

Key highlights

  • Most transforms are differentiable, so they can live inside the forward pass if needed
  • GPU speedups exist but vary by transform; the README is honest that not everything beats CPU
  • 15+ waveform transforms, including TimeInversion (reverses audio along the time axis, the audio analogue of a random horizontal image flip) and ShuffleChannels
  • Compose API with OneOf/SomeOf for stochastic transform selection
  • Recently dropped the librosa dependency in favor of torchaudio
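To make the Compose/OneOf/SomeOf idea concrete, here is a plain-Python sketch of the three combinators alongside toy versions of TimeInversion and ShuffleChannels. Every function name is hypothetical. Each function operates on one (channels, samples) example stored as nested lists. The sketch illustrates the selection logic only, not torch-audiomentations' actual classes or signatures. For example, keeping SomeOf's picks in declaration order is an assumption made here.

```python
import random
from typing import Callable

Example = list[list[float]]  # one example: (channels, samples)
Transform = Callable[[Example, random.Random], Example]


def time_inversion(example: Example, rng: random.Random) -> Example:
    """Reverse every channel along the time axis."""
    return [channel[::-1] for channel in example]


def shuffle_channels(example: Example, rng: random.Random) -> Example:
    """Randomly permute the channel order (e.g. swap left and right)."""
    shuffled = list(example)  # copy so the input is not mutated
    rng.shuffle(shuffled)
    return shuffled


def compose(*transforms: Transform) -> Transform:
    """Apply every transform in sequence."""
    def run(example: Example, rng: random.Random) -> Example:
        for t in transforms:
            example = t(example, rng)
        return example
    return run


def one_of(*transforms: Transform) -> Transform:
    """Apply exactly one transform, chosen uniformly at random."""
    def run(example: Example, rng: random.Random) -> Example:
        return rng.choice(transforms)(example, rng)
    return run


def some_of(k: int, *transforms: Transform) -> Transform:
    """Apply k distinct transforms chosen at random, in declaration order."""
    def run(example: Example, rng: random.Random) -> Example:
        for i in sorted(rng.sample(range(len(transforms)), k)):
            example = transforms[i](example, rng)
        return example
    return run


rng = random.Random(7)
stereo = [[0.1, 0.2, 0.3], [-0.1, -0.2, -0.3]]
augment = compose(
    one_of(time_inversion, shuffle_channels),
    some_of(1, time_inversion, shuffle_channels),
)
print(augment(stereo, rng))
```

Building the pipeline from small callables keeps the stochastic choices in the combinators, so each individual transform stays simple.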

Caveats

  • Target data processing (e.g., augmenting labels alongside inputs) is experimental, with a workaround that involves freeze_parameters
  • Multiprocessing can leak memory; CPU transforms are the suggested workaround
  • Multi-GPU / DDP is not officially supported—the author is literally asking for hardware donations to test it
  • PitchShift struggles with small shifts at low sample rates

Verdict

Worth a look if you’re training audio models in PyTorch and tired of CPU-GPU ping-pong during data loading. Skip it if you need mature multi-GPU support or heavy spectral-domain augmentation; this is waveform-only and still early-stage.
