ga642381/speech-trident
A curated survey repository covering speech representation models, neural codecs, and speech large language models.

This repository surveys three key areas in speech/audio large language models: (1) speech representation learning for semantic token extraction, (2) neural codec models that compress audio into discrete acoustic tokens at low bitrates while preserving reconstruction quality, and (3) speech large language models trained in a language-modeling paradigm on speech and acoustic tokens for tasks spanning speech understanding and generation. It serves as a reference list of relevant research works and models.
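
To make "low bitrates" in area (2) concrete, here is a minimal sketch of how the bitrate of a discrete-token codec is usually computed: tokens per second multiplied by bits per token. The function name `codec_bitrate_bps` and the configuration values (75 frames per second, 8 codebooks, 1024 entries per codebook) are assumptions chosen for illustration and are not taken from this repository or tied to any specific model it lists.

```python
import math


def codec_bitrate_bps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Return the bitrate in bits per second of a discrete-token codec.

    Each frame emits one token per codebook, and each token carries
    log2(codebook_size) bits.
    """
    bits_per_token = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_token


# Hypothetical configuration, for illustration only.
bitrate = codec_bitrate_bps(frame_rate_hz=75, num_codebooks=8, codebook_size=1024)
print(f"{bitrate / 1000:.1f} kbps")

# Uncompressed 16-bit mono PCM at an assumed 24 kHz sample rate, for comparison.
pcm_bitrate = 24_000 * 16
print(f"compression factor: {pcm_bitrate / bitrate:.0f}x")
```

Under these assumed values the codec needs far fewer bits per second than raw PCM, which is what makes acoustic tokens practical as input to a language model.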