
YuanGongND/ltu

An audio and speech large language model that bridges audio/speech perception with natural language understanding capabilities.

Velocity (7d): +0.4 ★/day · Trend: steady

LTU and LTU-AS are audio and speech large language models that answer open-ended questions about audio input while also performing strongly on closed-ended audio tasks. The repository provides PyTorch implementations with pretrained checkpoints, the OpenAQA and OpenASQA datasets, code for reproducing training, and support for fine-tuning. Hosted HuggingFace Space demos let users try the models without local GPU resources.
