lucidrains/audiolm-pytorch
A PyTorch implementation of Google's AudioLM for neural audio generation and synthesis.

This repository provides a PyTorch implementation of AudioLM, Google Research’s language modeling approach to audio generation. Rather than modeling raw waveforms directly, AudioLM uses attention-based transformers to predict sequences of discrete audio tokens, which a neural codec then decodes back into audio. The implementation extends AudioLM with text conditioning through T5 embeddings and classifier-free guidance, enabling text-to-audio and text-to-speech in the manner of VALL-E. It also includes an MIT-licensed version of SoundStream and is compatible with Facebook’s EnCodec for neural audio compression.
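
To make the classifier-free guidance mentioned above more concrete, here is a minimal sketch of the general technique applied to next-token logits. It is not taken from this repository: the function names `cfg_logits` and `softmax`, the example values, and the use of plain Python lists instead of tensors are all assumptions made for illustration, and the repository's own implementation may differ.

```python
import math


def cfg_logits(cond_logits, uncond_logits, guidance_scale):
    """Blend conditional and unconditional logits (hypothetical helper).

    guidance_scale = 0 returns the unconditional logits, 1 returns the
    conditional logits, and values above 1 push the prediction further
    in the direction the text conditioning suggests.
    """
    return [
        u + guidance_scale * (c - u)
        for c, u in zip(cond_logits, uncond_logits)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits over a three-token vocabulary, for illustration only.
cond = [2.0, 0.5, -1.0]    # model output with the text embedding supplied
uncond = [1.0, 1.0, 0.0]   # model output with the text embedding dropped

guided = cfg_logits(cond, uncond, guidance_scale=3.0)
probs = softmax(guided)    # distribution the next token would be sampled from
```

In a real model the two sets of logits would typically come from two forward passes of the same transformer, one with the text conditioning and one with it masked out, and the guidance scale would be tuned as a sampling hyperparameter.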