
FunAudioLLM/ThinkSound

ThinkSound is a unified framework for generating audio from any modality (text, video, images), guided by Chain-of-Thought reasoning and built on PyTorch and flow matching.

1.4k stars · Python · Image · Video · Audio
Velocity (7d): +3.9 ★ / day · Trend: steady

ThinkSound presents a unified approach to any-modality-to-audio generation, leveraging Chain-of-Thought reasoning to guide the generation process. The framework uses flow matching as its core synthesis mechanism and supports diverse inputs including text descriptions, video, and images. This NeurIPS 2025 publication from FunAudioLLM targets tasks like foley sound synthesis, sound effect generation, and multimodal audio creation.
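
Flow matching, named above as the core synthesis mechanism, can be illustrated with a small NumPy sketch. This is not ThinkSound's code: the function names, the straight-line path between noise and data, and the fixed-step Euler sampler are all assumptions chosen for illustration. In a real system, `predict_velocity` would be a trained network conditioned on text, video, or image features.

```python
import numpy as np

# Illustrative sketch only; names and path choice are assumptions,
# not taken from the ThinkSound repository.

def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Velocity of the straight-line path; the model's regression target."""
    return x1 - x0

def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    xt = interpolate(x0, x1, t)
    return float(np.mean((predict_velocity(xt, t) - target_velocity(x0, x1)) ** 2))

def euler_sample(predict_velocity, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * predict_velocity(x, i * dt)
    return x
```

Training would minimize `flow_matching_loss` over random noise samples, data samples, and times in [0, 1); generation then starts from noise and calls `euler_sample` with the learned velocity function.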
