
lmnt-com/diffwave

A diffusion-based neural vocoder that converts Gaussian noise into high-quality speech waveforms conditioned on Mel spectrograms.

888 stars Python Image · Video · Audio
diffwave
Velocity (7d): +0.4 ★/day · Trend: steady

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer built with PyTorch. It uses a diffusion probabilistic model that iteratively refines Gaussian noise into speech waveforms. The model can be conditioned on log-scaled Mel spectrograms for text-to-speech synthesis or run unconditionally for raw waveform generation. It supports fast sampling, mixed-precision training, and multi-GPU training, with pretrained models available for immediate use.
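The iterative refinement described above can be illustrated with a minimal sketch of a standard denoising-diffusion reverse loop. This is not the repository's code: `make_schedule`, `toy_denoiser`, and `sample` are hypothetical names, the linear noise schedule values are illustrative assumptions, and the denoiser is a placeholder for the trained network, which would take the Mel spectrogram as its conditioner. The sketch uses only the Python standard library so it runs without PyTorch.

```python
import math
import random


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule. The values are illustrative assumptions; needs num_steps >= 2."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a  # cumulative product of alphas
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def toy_denoiser(x, t, conditioner=None):
    """Hypothetical stand-in for the trained network that predicts the noise in x.

    A real model would use the step index t and the Mel-spectrogram conditioner.
    """
    return [0.1 * v for v in x]


def sample(num_samples, denoiser=toy_denoiser, conditioner=None,
           num_steps=50, seed=0):
    """Start from Gaussian noise and refine it step by step (reverse process)."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps = denoiser(x, t, conditioner)  # predicted noise at step t
        c1 = 1.0 / math.sqrt(alphas[t])
        c2 = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        x = [c1 * (xi - c2 * ei) for xi, ei in zip(x, eps)]  # posterior mean
        if t > 0:
            # Add fresh noise on every step except the last one.
            sigma = math.sqrt(
                (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            )
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x


waveform = sample(16)  # 16 samples of a toy "waveform"
```

Passing `conditioner=None` corresponds to the unconditional case; in a conditional vocoder the same loop would run with the spectrogram handed to the network at every step. Fast sampling generally amounts to running a loop like this with far fewer steps than were used in training.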
