kan-bayashi/ParallelWaveGAN
PyTorch implementations of neural vocoder models (Parallel WaveGAN, MelGAN, Multi-band MelGAN, HiFi-GAN, StyleMelGAN) for real-time text-to-speech synthesis.

This repository provides unofficial PyTorch implementations of several state-of-the-art non-autoregressive neural vocoders, which convert mel-spectrograms into audio waveforms: Parallel WaveGAN, MelGAN, Multi-band MelGAN, HiFi-GAN, and StyleMelGAN. The vocoders are designed to work with TTS systems such as ESPnet-TTS. Paired with a mel-spectrogram predictor, they can generate high-quality speech in real time.
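
To make "real time" concrete, the sketch below computes the waveform length expected from a mel-spectrogram and the real-time factor (RTF) of a synthesis run. It is an illustration only and does not use this repository's API: the function names, the hop size of 256, the 22050 Hz sampling rate, and the 0.05 s timing are all hypothetical values chosen for the example.

```python
def expected_num_samples(num_frames: int, hop_size: int) -> int:
    """Waveform length when each mel frame corresponds to hop_size samples."""
    return num_frames * hop_size


def real_time_factor(synthesis_seconds: float, num_samples: int, sampling_rate: int) -> float:
    """Time spent synthesizing divided by the duration of the generated audio.

    A value below 1.0 means the audio is generated faster than it plays back.
    """
    audio_seconds = num_samples / sampling_rate
    return synthesis_seconds / audio_seconds


# Hypothetical values, not taken from any configuration in the repository.
num_frames = 400        # mel-spectrogram frames produced by the TTS front end
hop_size = 256          # waveform samples per mel frame (assumed)
sampling_rate = 22050   # samples per second (assumed)

num_samples = expected_num_samples(num_frames, hop_size)
rtf = real_time_factor(0.05, num_samples, sampling_rate)  # 0.05 s is a made-up timing

print(f"{num_samples} samples, RTF = {rtf:.4f}")
```

In practice the synthesis time would be measured around the vocoder's forward pass, and the hop size and sampling rate would come from the feature-extraction settings used to train the model.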