gemelo-ai/vocos
A neural vocoder that synthesizes high-quality audio waveforms from mel-spectrograms or EnCodec tokens using a GAN-based approach.

Velocity (7d): +1.0 ★/day · Trend: steady
Vocos is a fast neural vocoder that generates audio waveforms from acoustic features in a single forward pass. Unlike typical GAN vocoders, which operate in the time domain, it predicts spectral coefficients that are then converted to audio with a fast inverse Fourier transform. It supports inference from mel-spectrograms and EnCodec quantization tokens, and pretrained models are available at 24 kHz.
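The spectral-to-waveform step can be illustrated with a toy, standard-library-only sketch. It is not the Vocos implementation. It assumes the spectral coefficients of one frame are given as a per-bin magnitude and phase, combines them into complex coefficients, and inverts them with a plain inverse DFT. The helper names (`dft`, `inverse_dft`, `frame_from_mag_phase`) are hypothetical, and the forward DFT here only stands in for what a trained network would predict. A real vocoder would do this for many overlapping frames with an inverse STFT and overlap-add.

```python
import cmath
import math


def dft(samples):
    """Forward DFT; used here only to produce example coefficients."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse DFT of a full complex spectrum; returns the real part of each sample."""
    n = len(coeffs)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(coeffs)) / n).real
        for t in range(n)
    ]


def frame_from_mag_phase(magnitudes, phases):
    """Build complex coefficients from magnitude and phase, then invert to samples."""
    coeffs = [m * cmath.exp(1j * p) for m, p in zip(magnitudes, phases)]
    return inverse_dft(coeffs)


# Example frame: one period of a cosine over 8 samples.
frame = [math.cos(2 * math.pi * t / 8) for t in range(8)]

# Stand-in for model output (assumption): magnitude and phase per frequency bin.
spectrum = dft(frame)
magnitudes = [abs(c) for c in spectrum]
phases = [cmath.phase(c) for c in spectrum]

# Convert the spectral representation back to time-domain samples.
reconstructed = frame_from_mag_phase(magnitudes, phases)
max_error = max(abs(a - b) for a, b in zip(frame, reconstructed))
print(f"max reconstruction error: {max_error:.2e}")
```

Because the inverse transform is a fixed, cheap operation, all of the learned computation can happen at frame rate rather than at sample rate, which is the intuition behind the single-forward-pass claim above.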