NVIDIA/audio-flamingo
NVIDIA's Audio Flamingo is a series of open-source multimodal LLMs that understand speech and music through natural language interactions.

Velocity (7d): +1.5 ★/day · Trend: steady
Audio Flamingo provides PyTorch implementations of large language models that answer text queries about audio input. The models support audio captioning, question answering, reasoning, and long-audio understanding across the speech and music domains. Several versions have been described in papers at ICML and NeurIPS, and the latest version is fully open-sourced for research use.