Is SoundMind open source?

Yes — xid32/SoundMind is open source, released under the MIT license.

What language is SoundMind written in?

xid32/SoundMind is primarily written in Python.

How popular is SoundMind?

xid32/SoundMind has 1.1k stars on GitHub.

Where can I find SoundMind?

xid32/SoundMind is on GitHub at https://github.com/xid32/SoundMind.

xid32/SoundMind

Teaching audio-language models to think in rules, not vibes

It builds a 6,446-sample dataset and a rule-based RL framework to teach audio-language models structured logical reasoning across sound and text.

★ 1.1k stars · Python · Language Models · Data Tooling · ML Frameworks


What it does

SoundMind is the official implementation of an EMNLP 2025 oral paper. It provides the Audio Logical Reasoning (ALR) benchmark (6,446 samples annotated with chain-of-thought reasoning across audio and text) and a rule-based RL training framework called SoundMind-RL. The goal is to make audio-language models handle complex reasoning that requires drawing conclusions from both modalities. The codebase extends the verl reinforcement-learning library and targets Qwen2.5-Omni.

The interesting bit

Instead of the usual open-ended reward modeling, SoundMind-RL uses explicit rule-based rewards to incentivize step-by-step logical structure. The ALR dataset includes chain-of-thought annotations in both audio and text, and the training code supports audio-only, text-only, or bimodal inputs, which is useful for isolating which modality the reasoning actually depends on.
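To make "rule-based reward" concrete, here is a minimal sketch of the general idea. It is not SoundMind-RL's actual reward: the `<think>`/`<answer>` tags, the two-step threshold, and the weights are all assumptions chosen for illustration.

```python
import re

# Hypothetical output format: reasoning inside <think>...</think>, followed by
# a final label inside <answer>...</answer>. Tags and weights are illustrative
# assumptions, not taken from the SoundMind repository.
_PATTERN = re.compile(
    r"^\s*<think>(?P<think>.+?)</think>\s*<answer>(?P<answer>.+?)</answer>\s*$",
    re.DOTALL,
)


def rule_based_reward(response: str, gold_answer: str) -> float:
    """Score a response with fixed rules instead of a learned reward model."""
    match = _PATTERN.match(response)
    if match is None:
        return 0.0  # malformed output earns nothing

    reward = 0.2  # rule 1: the required structure is present

    # Rule 2: the reasoning spans at least two non-empty lines ("steps").
    steps = [line for line in match.group("think").splitlines() if line.strip()]
    if len(steps) >= 2:
        reward += 0.2

    # Rule 3: the final answer matches the reference label (case-insensitive).
    if match.group("answer").strip().lower() == gold_answer.strip().lower():
        reward += 0.6

    return reward
```

Because every term is a deterministic check, the score is easy to audit: a response can be traced back to exactly which rules it satisfied.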

Key highlights

6,446-sample ALR dataset with paired audio-text chain-of-thought annotations
Rule-based RL framework (SoundMind-RL) built on verl
Supports unimodal ablations: audio-only, text-only, or both
Pre-trained checkpoint and full training code provided
Targets Qwen2.5-Omni; accepted as an EMNLP 2025 Main Conference oral paper
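The unimodal ablations in the list above amount to controlling which fields of a sample the model is allowed to see. A rough sketch of that selection step follows; the `Sample` layout, the field names, and `select_inputs` are hypothetical and do not reflect the ALR dataset schema or the repository's data loader.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    AUDIO = "audio"
    TEXT = "text"
    BOTH = "both"


@dataclass
class Sample:
    # Hypothetical record layout, not the ALR dataset schema.
    audio_path: str
    transcript: str


def select_inputs(sample: Sample, modality: Modality) -> dict[str, str]:
    """Keep only the fields the chosen ablation is allowed to see."""
    inputs: dict[str, str] = {}
    if modality in (Modality.AUDIO, Modality.BOTH):
        inputs["audio"] = sample.audio_path
    if modality in (Modality.TEXT, Modality.BOTH):
        inputs["text"] = sample.transcript
    return inputs
```

Running the same evaluation three times, once per `Modality` value, is what lets an ablation attribute a score difference to the missing input rather than to anything else in the pipeline.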

Caveats

The authors recommend eight H800 or H100 80 GB GPUs, so this is not a laptop experiment
Dependency stack is brittle: the README warns that CUDA, cuDNN, and packages like vLLM or SGLang are easily overridden during installation
The dataset is narrowly focused on logical reasoning; it will not teach general audio understanding
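Given the dependency caveat above, it can help to record which versions are actually installed before and after each install step. The sketch below uses only the Python standard library; the package list is an assumption based on the names mentioned here, not the repository's requirements file.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed package names; adjust to match the project's own requirements.
    for name, ver in installed_versions(["torch", "vllm", "sglang"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Comparing the output of two runs shows whether an installation step silently replaced a package. CUDA and cuDNN are not Python distributions, so they need a separate check.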

Verdict

Researchers working on multimodal reasoning or audio-language alignment should dig in. If you lack a server-grade GPU cluster or need a general-purpose audio model, look elsewhere.
