lmnt-com/diffwave
A diffusion-based neural vocoder that converts Gaussian noise into high-quality speech waveforms conditioned on Mel spectrograms.

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer built with PyTorch. It uses a diffusion probabilistic model that iteratively refines Gaussian noise into speech waveforms. The model can be conditioned on log-scaled Mel spectrograms for text-to-speech synthesis or run unconditionally for raw waveform generation. It supports fast sampling, mixed-precision training, and multi-GPU training, with pretrained models available for immediate use.
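
The iterative refinement described above can be illustrated with a minimal sketch of a standard denoising-diffusion sampling loop in plain Python. This is not the repository's code. The names `make_schedule`, `toy_denoiser`, and `sample`, the linear noise schedule, and its values are all assumptions chosen for illustration. In particular, `toy_denoiser` is a hypothetical stand-in for the trained network: it computes the exact noise relative to a known target waveform, whereas a real model would predict the noise from the noisy audio, the step index, and the Mel spectrogram.

```python
import math
import random


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule. The step count and range are illustrative assumptions."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a  # cumulative product of alphas up to this step
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def toy_denoiser(x, t, target, alpha_bars):
    """Hypothetical stand-in for the trained network.

    Returns the exact noise that separates x from a known target at step t.
    A real vocoder would predict this from x, t, and a Mel spectrogram.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [(xi - signal_scale * yi) / noise_scale for xi, yi in zip(x, target)]


def sample(target, num_steps=50, seed=0):
    """Run the reverse diffusion process, from pure noise down to step 0."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)

    # Start from pure Gaussian noise with the same length as the target.
    x = [rng.gauss(0.0, 1.0) for _ in target]

    for t in reversed(range(num_steps)):
        eps = toy_denoiser(x, t, target, alpha_bars)

        # Remove the predicted noise component to get the posterior mean.
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        x = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]

        # Re-inject a smaller amount of fresh noise on every step but the last.
        if t > 0:
            sigma = math.sqrt(
                (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            )
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x


# Toy "speech" target: a short sine wave of 100 samples.
target = [math.sin(2.0 * math.pi * 5.0 * n / 100.0) for n in range(100)]
waveform = sample(target)
max_error = max(abs(a - b) for a, b in zip(waveform, target))
```

Because the stand-in denoiser knows the target exactly, the loop recovers it and `max_error` ends up close to zero. With a learned network the output would instead be a plausible waveform consistent with the conditioning input. Running fewer steps with an adjusted schedule is one common way to trade quality for sampling speed.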