facebookresearch/sam-audio

A foundation model from Meta for isolating arbitrary sounds in audio mixtures using natural language, visual, or temporal prompts.

3.5k stars Python Image · Video · Audio
Velocity (7d): +13 ★ / day · Trend: steady

SAM-Audio is a multimodal audio processing model that separates specific sounds from complex audio mixtures based on prompt inputs. It leverages a Perception-Encoder Audio-Visual (PE-AV) backbone to enable cross-modal understanding. Users can query audio by describing desired sounds in text, providing visual cues from video, or specifying time spans for extraction.
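The three prompt types described above can be pictured as a small data structure. The sketch below is purely illustrative and is not the sam-audio API: `SeparationPrompt`, its fields, and `spans_to_frame_mask` are hypothetical names, and the frame rate is an assumed parameter. It only shows how a text description, a video reference, and a list of time spans might be bundled together, and how time spans could be turned into a per-frame mask.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SeparationPrompt:
    """Hypothetical container for the three prompt types (not the sam-audio API)."""
    text: Optional[str] = None            # natural-language description of the target sound
    video_path: Optional[str] = None      # video supplying visual cues
    spans: List[Tuple[float, float]] = field(default_factory=list)  # (start_s, end_s) pairs

    def modalities(self) -> List[str]:
        """Return which prompt types are actually set."""
        used = []
        if self.text:
            used.append("text")
        if self.video_path:
            used.append("visual")
        if self.spans:
            used.append("temporal")
        return used


def spans_to_frame_mask(spans: List[Tuple[float, float]],
                        duration_s: float,
                        frames_per_second: float) -> List[bool]:
    """Mark every frame that overlaps at least one (start_s, end_s) span."""
    n_frames = int(round(duration_s * frames_per_second))
    mask = [False] * n_frames
    for start, end in spans:
        if end <= start:
            raise ValueError(f"span end must exceed start: {(start, end)}")
        first = max(0, int(start * frames_per_second))
        last = min(n_frames, math.ceil(end * frames_per_second))
        for i in range(first, last):
            mask[i] = True
    return mask


prompt = SeparationPrompt(text="dog barking", spans=[(1.0, 2.0)])
mask = spans_to_frame_mask(prompt.spans, duration_s=4.0, frames_per_second=2)
```

Here a 4-second clip at an assumed 2 frames per second yields 8 frames, of which the two covering seconds 1.0 to 2.0 are marked. A real model would consume such prompts in its own format; consult the repository for the actual interface.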
