basveeling/wavenet

A WaveNet you can actually run, slowly, on a K80

A readable Keras re-implementation of DeepMind's raw-audio generative model, back when Theano was still a reasonable life choice.


What it does

Implements the original WaveNet architecture from DeepMind’s 2016 paper in Keras, with Sacred handling experiment management. It trains on WAV datasets (including VCTK) and generates new audio samples autoregressively, one time-step at a time, by predicting a distribution over 256 μ-law quantized amplitude values.
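To make the "256 μ-law quantized amplitude values" concrete, here is a minimal NumPy sketch of standard μ-law companding. It is the textbook formula, not code lifted from the repository; the function names `mu_law_encode` and `mu_law_decode` are hypothetical, and input audio is assumed to be floats in [-1, 1].

```python
import numpy as np

def mu_law_encode(audio, mu=255):
    """Map float audio in [-1, 1] to integer classes 0..mu (256 classes for mu=255)."""
    audio = np.clip(audio, -1.0, 1.0)
    # Logarithmic compression: more resolution near zero, less near full scale.
    compressed = np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)
    # Shift [-1, 1] to [0, mu] and round to the nearest integer class.
    return ((compressed + 1) / 2 * mu + 0.5).astype(np.int64)

def mu_law_decode(classes, mu=255):
    """Invert mu_law_encode, returning approximate float audio in [-1, 1]."""
    compressed = 2 * (classes.astype(np.float64) / mu) - 1
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu
```

The network then only has to pick one of 256 classes per time-step, which turns audio generation into a classification problem.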

The interesting bit

The author openly flags where the paper is ambiguous—whether to train on every output position or only those with full receptive-field context—and documents picking one interpretation. There’s also a neat training trick: convolving a Gaussian kernel over the one-hot targets as “soft targets” to speed convergence, a practical hack not in the original paper.
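The soft-target idea can be sketched in a few lines of NumPy. This is an illustration of the general technique, not the repository's code: the function name `soft_targets`, the `sigma` value, and the kernel radius are all assumptions chosen for the example.

```python
import numpy as np

def soft_targets(classes, n_classes=256, sigma=1.0):
    """Blur one-hot class targets along the class axis with a Gaussian kernel."""
    one_hot = np.eye(n_classes)[classes]           # shape (T, n_classes)
    radius = int(3 * sigma) + 1                    # assumed kernel half-width
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    # Convolve each row so neighbouring amplitude classes receive some mass.
    smoothed = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, one_hot
    )
    # Renormalise so every row is a valid probability distribution again.
    return smoothed / smoothed.sum(axis=1, keepdims=True)
```

The intuition is that predicting class 101 when the truth is 100 is a much smaller error than predicting class 7, and a blurred target lets the loss reflect that.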

Key highlights

  • Configurable via Sacred CLI: dilation depth, stacks, filters, sampling temperature, seed audio for conditioning
  • Supports custom datasets by dropping WAV files into train/test folders
  • Sampling streams incrementally to disk, so you can listen while generation continues
  • Includes downsized “small” preset for those without DeepMind’s compute budget
  • One actual generated sample exists on SoundCloud as proof of life
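For readers unfamiliar with the sampling temperature option, the sketch below shows the common way temperature reshapes a predicted distribution before one class is drawn. The exact formula the repository uses is not stated here, so treat this as an assumption; `apply_temperature` and `sample_class` are hypothetical names.

```python
import numpy as np

def apply_temperature(probs, temperature=1.0):
    """Sharpen (temperature < 1) or flatten (temperature > 1) a distribution."""
    logits = np.log(probs + 1e-12) / temperature   # epsilon avoids log(0)
    logits -= logits.max()                         # numerical stability
    scaled = np.exp(logits)
    return scaled / scaled.sum()

def sample_class(probs, temperature=1.0, rng=None):
    """Draw one class index from the temperature-adjusted distribution."""
    if rng is None:
        rng = np.random.default_rng()
    return int(rng.choice(len(probs), p=apply_temperature(probs, temperature)))
```

In an autoregressive loop, each sampled class would be appended to the input window and fed back in to predict the next time-step.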

Caveats

  • Requires Python 2 and Theano backend; TensorFlow is explicitly “not recommended” due to open issues
  • No pretrained model available (removed after code changes broke compatibility)
  • Local and global conditioning—key to WaveNet’s voice-switching tricks—remain on the todo list
  • Generation costs 4 minutes of compute per second of audio on a Tesla K80, even for a downsized model, so this is a “start it and go to lunch” kind of workflow
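To put the generation speed in perspective, here is a back-of-the-envelope estimate using only the 4-minutes-per-second figure quoted above. The helper name `generation_minutes` is hypothetical, and real throughput will vary with model size and hardware.

```python
MINUTES_PER_AUDIO_SECOND = 4.0  # figure quoted for the downsized model on a Tesla K80

def generation_minutes(audio_seconds, minutes_per_second=MINUTES_PER_AUDIO_SECOND):
    """Estimate wall-clock minutes needed to generate a clip of the given length."""
    return audio_seconds * minutes_per_second
```

At that rate a 10-second clip takes about 40 minutes and a 30-second clip about 2 hours.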

Verdict

Worth a look if you want to understand WaveNet’s mechanics without wading through a production framework, or need a hackable research baseline. Skip it if you need modern PyTorch, conditioning, or real-time anything.
