
ga642381/speech-trident

A curated survey repository covering speech representation models, neural codecs, and speech large language models.

This repository surveys three areas of speech and audio large language modeling: (1) speech representation models, which learn the representations from which semantic tokens are extracted; (2) neural codec models, which compress audio into discrete acoustic tokens at low bitrates while preserving reconstruction quality; and (3) speech large language models, which are trained on speech and acoustic tokens in a language-modeling paradigm for tasks spanning speech understanding and generation. It serves as a reference list of research works and models in these areas.
