santi-pdp/segan

When GANs learned to shush background noise

A 2017 TensorFlow implementation that applies adversarial training directly to raw audio waveforms for speech denoising.

Star velocity (7 days): +0.3 ★/day, trend steady

What it does

SEGAN removes noise from corrupted speech by training a fully convolutional generator against a discriminator, end-to-end on raw waveforms. It handles multiple noise types and speakers without needing to know who’s speaking. The project includes data prep scripts, training orchestration, and a pre-trained v1.1 model you can download.
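A rough illustration of what "fully convolutional on raw waveforms" means for shapes. Strided 1-D convolutions shrink a waveform window step by step, and transposed convolutions grow it back, so the generator emits one output sample per input sample. The window size (16384 samples), the 11 stride-2 layers, and the channel widths below come from our reading of the SEGAN paper, not from this repo's code, and "same" padding is assumed; the function names are hypothetical.

```python
import math

# Assumed hyperparameters (our reading of the SEGAN paper, not checked
# against this repo): 16384-sample input windows, stride-2 1-D convolutions,
# and one channel width per encoder layer (index 0 is the raw waveform).
WINDOW = 16384
STRIDE = 2
CHANNELS = [1, 16, 32, 32, 64, 64, 128, 128, 256, 256, 512, 1024]


def encoder_shapes(window=WINDOW, stride=STRIDE, channels=CHANNELS):
    """(time_steps, channels) after each encoder layer, assuming 'same' padding."""
    shapes = [(window, channels[0])]
    steps = window
    for ch in channels[1:]:
        steps = math.ceil(steps / stride)  # a strided conv shrinks the time axis
        shapes.append((steps, ch))
    return shapes


def decoder_lengths(bottleneck_steps, num_layers, stride=STRIDE):
    """Time-axis length after each transposed conv that upsamples by `stride`."""
    lengths = [bottleneck_steps]
    for _ in range(num_layers):
        lengths.append(lengths[-1] * stride)
    return lengths


if __name__ == "__main__":
    enc = encoder_shapes()
    for steps, ch in enc:
        print(f"{steps:>6} x {ch}")
    print("bottleneck:", enc[-1])  # (8, 1024)
    dec = decoder_lengths(enc[-1][0], len(enc) - 1)
    print("decoder output length:", dec[-1])  # matches the input window
```

Because no layer is fully connected, the same weights apply to any window length that divides evenly by 2^11; the fixed window is a training convenience rather than an architectural limit.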

The interesting bit

Instead of spectrograms or hand-engineered features, it works straight on the 1-D waveform, so the network has to learn its own representation of what “clean” sounds like. The generator's adversarial loss is combined with an L1 term, weighted by 100, that pulls each enhanced waveform toward its clean reference, so the generator is not steered by the discriminator's verdict alone.
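A minimal, framework-free sketch of how the two generator terms combine. The least-squares form of the adversarial terms is an assumption based on our reading of the SEGAN paper, the L1 term here is a mean absolute error (the repo's exact normalization may differ), and the function names are hypothetical. Real training computes these on tensor batches, not Python lists.

```python
def mean(values):
    return sum(values) / len(values)


def discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss.

    d_real: scores for (clean, noisy) pairs, pushed toward 1.
    d_fake: scores for (enhanced, noisy) pairs, pushed toward 0.
    """
    return 0.5 * mean([(s - 1.0) ** 2 for s in d_real]) + 0.5 * mean([s ** 2 for s in d_fake])


def generator_loss(d_fake, enhanced, clean, l1_weight=100.0):
    """Adversarial term (make D score enhanced pairs as 1) plus weighted L1.

    The L1 term is the mean absolute sample difference between the enhanced
    and clean waveforms.
    """
    if len(enhanced) != len(clean):
        raise ValueError("enhanced and clean waveforms must have the same length")
    adversarial = 0.5 * mean([(s - 1.0) ** 2 for s in d_fake])
    l1 = mean([abs(e - c) for e, c in zip(enhanced, clean)])
    return adversarial + l1_weight * l1


if __name__ == "__main__":
    # D is not fooled (score 0) and the output is off by 0.5 per sample.
    print(generator_loss([0.0], [0.5], [0.0]))  # 0.5 + 100 * 0.5 = 50.5
```

The numbers show why the weight matters: a discriminator score of 0 costs the generator only 0.5, while an average sample error of 0.5 costs 50, so fidelity to the clean waveform dominates until the output is already close.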

Key highlights

  • Trained on 40 noise conditions at various SNRs; tested on 20 unseen ones
  • Multi-speaker, speaker-agnostic: no identity labels needed
  • Includes prepare_data.sh to fetch and format the Edinburgh dataset automatically
  • Multi-GPU training by default; falls back to CPU if no GPU is available
  • Pre-trained weights and test audio samples available from the authors’ site
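Each noise condition above pairs a noise type with an SNR. As far as we know, the Edinburgh dataset that prepare_data.sh fetches already ships as noisy/clean pairs, so the sketch below only illustrates what "mixing at a given SNR" means: the noise is rescaled so that 10·log10(P_clean / P_noise) hits the target. The helper names, the 16 kHz sample rate, and the SNR values in the demo are illustrative assumptions, not the dataset's recipe.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a waveform."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so the mixture has the requested SNR, then add it to `clean`.

    Returns (noisy_mixture, scaled_noise). Assumes `noise` is at least as long
    as `clean` and is not silent.
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    # Solve 10*log10(P_clean / (g**2 * P_noise)) = target for the gain g.
    gain = math.sqrt(power(clean) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    return [c + n for c, n in zip(clean, scaled)], scaled


if __name__ == "__main__":
    rng = random.Random(0)
    sr = 16000  # assumed sample rate
    clean = [0.5 * math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sr)]
    for target in (15, 10, 5, 0):  # example SNR values, in dB
        noisy, scaled = mix_at_snr(clean, noise, target)
        print(target, round(snr_db(clean, scaled), 6))
```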

Caveats

  • Locked to Python 2.7 and TensorFlow 0.12: archaeology-grade dependencies at this point
  • Authors explicitly state: “no support or assistance” and “no responsibility” for the code
  • Inference requires careful flag-matching to the trained config; the clean_wav.sh wrapper helps but the CLI is still finicky

Verdict

Worth studying if you’re building modern audio GANs and want to see how the waveform-first approach was pioneered. Skip it if you need something production-ready today; the dependency stack alone is a time machine you may not want to enter.
