xid32/SoundMind
SoundMind-RL is a rule-based reinforcement learning framework for endowing audio-language models with logical reasoning capabilities.

Velocity (7d): +3.1 ★/day · Trend: steady
The repository introduces the SoundMind dataset, an Audio Logical Reasoning benchmark of 6,446 text-audio samples annotated with chain-of-thought reasoning. It also implements a rule-based RL training framework that lets audio-language models reason over both audio and text inputs. The work was published at the EMNLP 2025 Main Conference as an Oral paper.
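As a rough illustration of what "rule-based" can mean in this kind of RL setup, the sketch below scores a model completion with fixed rules instead of a learned reward model. Everything in it is an assumption made for illustration and is not taken from the SoundMind code: the `<think>`/`<answer>` tag format, the function name `rule_based_reward`, and the 0.5 / 1.0 reward weights are all hypothetical.

```python
import re

# Hypothetical output format (an assumption, not the repository's):
# chain-of-thought inside <think>...</think>, final label inside <answer>...</answer>.
_PATTERN = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$",
    re.DOTALL,
)


def rule_based_reward(completion: str, gold_label: str) -> float:
    """Score a completion with fixed rules: a format check plus exact-match accuracy."""
    match = _PATTERN.match(completion)
    if match is None:
        # Malformed output earns nothing, regardless of its content.
        return 0.0
    predicted = match.group(2).strip().lower()
    format_reward = 0.5  # illustrative weight for following the format
    accuracy_reward = 1.0 if predicted == gold_label.strip().lower() else 0.0
    return format_reward + accuracy_reward


if __name__ == "__main__":
    sample = "<think>Premise implies the claim.</think><answer>entailed</answer>"
    print(rule_based_reward(sample, "entailed"))
```

Because the score depends only on deterministic checks, it can be computed for every sampled completion during training without a separate reward model; the actual rules used by SoundMind-RL may differ.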