jik876/hifi-gan
A GAN-based neural vocoder for efficient, high-fidelity speech synthesis.

Velocity (7d): +1.1 ★/day · Trend: steady
HiFi-GAN is a generative adversarial network (GAN) for speech synthesis that achieves high-fidelity audio by modeling the periodic patterns in speech signals. It generates 22.05 kHz audio 167.9 times faster than real time on a single V100 GPU, and a small-footprint variant runs 13.4 times faster than real time on CPU. It generalizes to mel-spectrogram inversion for unseen speakers and can serve as the vocoder stage of an end-to-end text-to-speech pipeline.
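To put the real-time figures in concrete terms, the sketch below converts a speedup factor into raw throughput and into wall-clock synthesis time. It uses only the numbers quoted above (22.05 kHz, 167.9x, 13.4x); the function names are hypothetical helpers for illustration and are not part of the repository. At 167.9x the GPU figure works out to roughly 3.7 million samples per second.

```python
# Illustrative arithmetic only; these helpers are not part of jik876/hifi-gan.
SAMPLE_RATE_HZ = 22_050  # 22.05 kHz, as quoted in the description


def samples_per_second(speedup: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Audio samples generated per wall-clock second at a given real-time speedup."""
    return speedup * sample_rate_hz


def synthesis_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time needed to synthesize `audio_seconds` of audio."""
    return audio_seconds / speedup


if __name__ == "__main__":
    # Speedup factors taken from the description above.
    for label, speedup in [("V100 GPU", 167.9), ("CPU, small-footprint variant", 13.4)]:
        rate = samples_per_second(speedup)
        wall = synthesis_seconds(60.0, speedup)
        print(f"{label}: {rate:,.0f} samples/s, {wall:.2f} s per minute of audio")
```

These figures describe vocoder throughput only; in a full text-to-speech pipeline the acoustic model that produces the mel-spectrogram adds its own latency.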