descriptinc/melgan-neurips

GANs that speak in complete sentences, not static

MelGAN proves you can generate raw audio waveforms with adversarial training if you stop designing networks like it's 2014.

1k stars · Python · Image · Video · Audio
Velocity (7d): +0.4 ★/day · Trend: steady

What it does

MelGAN inverts mel-spectrograms back into raw audio waveforms using a fully convolutional, non-autoregressive GAN. It’s the vocoder behind Descript’s Overdub speech-correction tool, and it runs more than 100× faster than real time on a GTX 1080Ti and more than 2× faster than real time on a plain CPU.
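The speed figures translate into simple arithmetic: a vocoder running N× faster than real time produces one second of audio in 1/N seconds of wall-clock time. The sketch below does that arithmetic; the 22,050 Hz output sample rate is an assumption (a common rate for LJ Speech-style data), not a figure from this page, and the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic for "N× faster than real time" claims.
# Assumption: 22,050 Hz output audio; adjust SAMPLE_RATE_HZ for your data.
SAMPLE_RATE_HZ = 22_050


def synthesis_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time to vocode `audio_seconds` of audio at `speedup`× real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def samples_per_second(speedup: float, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Output samples generated per wall-clock second at `speedup`× real time."""
    return speedup * sample_rate


if __name__ == "__main__":
    for label, speedup in [("GTX 1080Ti", 100), ("CPU", 2)]:
        print(f"{label}: 10 s of audio in {synthesis_seconds(10, speedup):.2f} s, "
              f"{samples_per_second(speedup):,.0f} samples/s")
```

At 100×, ten seconds of speech takes a tenth of a second to vocode; at 2× on a CPU it takes five seconds, which is still faster than playback.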

The interesting bit

The authors didn’t just throw a GAN at audio and hope. They identified why previous attempts produced incoherent waveforms, then proposed architectural changes and training techniques that actually stick. The same architecture generalizes to unseen speakers and carries over to music domain translation and unconditional music synthesis without rearchitecting.
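Two of those training techniques, per the paper, are a hinge-loss GAN objective and a feature-matching loss: an L1 penalty between the discriminator’s intermediate activations on real and generated audio, weighted (λ = 10 in the paper, to the best of our reading) and added to the generator’s loss. The sketch below writes both out over plain Python lists for a single discriminator so the arithmetic is visible. Every function name here is hypothetical and not part of the melgan-neurips codebase; a real implementation would operate on tensors and sum these terms over the multi-scale discriminators.

```python
# Minimal sketch: hinge GAN losses plus an L1 feature-matching loss over plain lists.
# Names and the default lambda_fm are illustrative assumptions, not the repo's API.
from typing import List, Sequence


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def relu(x: float) -> float:
    return max(0.0, x)


def discriminator_hinge_loss(real_scores: Sequence[float],
                             fake_scores: Sequence[float]) -> float:
    """Penalize D(real) below +1 and D(fake) above -1."""
    return (mean([relu(1.0 - s) for s in real_scores])
            + mean([relu(1.0 + s) for s in fake_scores]))


def generator_hinge_loss(fake_scores: Sequence[float]) -> float:
    """The generator tries to raise the discriminator's score on generated audio."""
    return -mean(fake_scores)


def feature_matching_loss(real_feats: List[List[float]],
                          fake_feats: List[List[float]]) -> float:
    """Sum over layers of the mean absolute difference between real and fake activations."""
    return sum(
        mean([abs(r - f) for r, f in zip(real_layer, fake_layer)])
        for real_layer, fake_layer in zip(real_feats, fake_feats)
    )


def generator_total_loss(fake_scores: Sequence[float],
                         real_feats: List[List[float]],
                         fake_feats: List[List[float]],
                         lambda_fm: float = 10.0) -> float:
    """Adversarial term plus weighted feature matching for one discriminator."""
    return generator_hinge_loss(fake_scores) + lambda_fm * feature_matching_loss(real_feats, fake_feats)
```

Feature matching gives the generator a dense, per-layer signal about what real audio looks like to the discriminator, rather than relying only on a single real/fake score.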

Key highlights

  • Non-autoregressive and fully convolutional — no sample-by-sample autoregression bottleneck
  • “Significantly fewer parameters than competing models” (the README’s phrasing, not a hard number)
  • Ships with PyTorch Hub integration: load a vocoder with torch.hub.load(...), then call vocoder.inverse()
  • Ablation studies and design guidelines included for building discriminators/generators on sequences
  • NeurIPS 2019 official implementation; slides and audio samples available
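What “non-autoregressive and fully convolutional” buys in practice: the generator maps every mel frame to a fixed block of samples in one parallel pass through a stack of transposed convolutions, instead of one sequential step per output sample. A minimal sketch of the shape arithmetic follows. The ratios (8, 8, 2, 2), kernel size 2 × stride, and padding rule are assumptions based on our reading of the paper’s 256× generator, so check them against the repo’s own model code; the length formula is the one documented for PyTorch’s `ConvTranspose1d`.

```python
# Shape arithmetic for a stack of 1-D transposed convolutions, assuming
# MelGAN-style upsampling ratios with kernel_size = 2 * stride,
# padding = stride // 2 + stride % 2, and output_padding = stride % 2.
from math import prod
from typing import Sequence

RATIOS = (8, 8, 2, 2)  # assumption: total upsampling 8*8*2*2 = 256 samples per mel frame


def conv_transpose1d_len(length: int, stride: int, kernel_size: int,
                         padding: int = 0, output_padding: int = 0,
                         dilation: int = 1) -> int:
    """Output length formula documented for torch.nn.ConvTranspose1d."""
    return ((length - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


def generator_output_len(n_frames: int, ratios: Sequence[int] = RATIOS) -> int:
    """Audio samples produced from `n_frames` mel frames by the upsampling stack."""
    length = n_frames
    for r in ratios:
        length = conv_transpose1d_len(length, stride=r, kernel_size=2 * r,
                                      padding=r // 2 + r % 2, output_padding=r % 2)
    return length


if __name__ == "__main__":
    frames = 86  # hypothetical clip length in mel frames
    print(f"total upsampling: {prod(RATIOS)}x, {frames} frames -> "
          f"{generator_output_len(frames)} samples")
```

With these settings each stage multiplies the length exactly by its stride, so 100 mel frames become 25,600 samples in a single forward pass; an autoregressive vocoder would need 25,600 sequential steps for the same output.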

Caveats

  • README promises a blog post with “samples and accompanying code coming soon”; this is a 2019 repo, so check whether that post ever appeared
  • Dataset preparation relies on manual shell commands (ls, tail, head) rather than a configured data pipeline
  • No download links for pretrained weights in the README; beyond whatever the PyTorch Hub entry point loads, you’ll likely need to train from scratch or dig through the paper site
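If you’d rather not maintain shell one-liners, the same kind of train/test split can be scripted. A minimal sketch, assuming a flat directory of `.wav` files and that the first N files in sorted order are held out for testing; the function names, the hold-out size of 10, and any directory or output file names are hypothetical, so match them to whatever the README’s commands actually produce.

```python
# Hypothetical Python stand-in for an `ls | head` / `ls | tail` train/test split.
from pathlib import Path
from typing import Iterable, List, Tuple


def list_wavs(wav_dir: str) -> List[str]:
    """All .wav paths directly inside `wav_dir`, sorted (roughly what `ls` prints)."""
    return sorted(str(p) for p in Path(wav_dir).glob("*.wav"))


def split_paths(paths: Iterable[str], n_test: int = 10) -> Tuple[List[str], List[str]]:
    """Return (train, test): the first `n_test` sorted paths go to test, the rest to train."""
    files = sorted(paths)
    return files[n_test:], files[:n_test]


def write_list(paths: List[str], out_file: str) -> None:
    """Write one path per line, like redirecting `ls` output to a file."""
    Path(out_file).write_text("".join(p + "\n" for p in paths))
```

For example, `train, test = split_paths(list_wavs("wavs"))` followed by `write_list(train, "train_files.txt")`. The hold-out size is a single parameter here, rather than two numbers that must agree across separate head and tail invocations.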

Verdict

Worth studying if you’re building vocoders or conditional sequence generators and want a fast, non-autoregressive baseline. Skip if you need a batteries-included, ready-to-finetune TTS pipeline with pretrained checkpoints.
