
hkchengrex/MMAudio

A deep learning model that generates synchronized audio from video and/or text inputs.

2.2k stars · Python · Image · Video · Audio
Velocity (7 days): +4.0 ★/day · Trend: steady

MMAudio is a generative model for video-to-audio synthesis: given video frames and/or a text prompt, it produces matching audio. It is trained jointly on multimodal data, combining audio-visual and audio-text datasets, to enable high-quality audio generation. A synchronization module aligns the generated audio with the video frames for temporal coherence. The project provides pretrained models and interactive demos via Hugging Face, Colab, and Replicate.
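The listing does not say how the synchronization module works internally. As a generic illustration of what frame-level alignment involves (a hypothetical sketch, not MMAudio's actual code), the snippet below maps each audio frame to the video frame covering the same timestamp, so per-frame visual features can be paired with audio frames produced at a different rate. The function names and the frame rates in the example are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def align_indices(num_audio_frames: int, audio_fps: float,
                  num_video_frames: int, video_fps: float) -> List[int]:
    """Hypothetical helper: for each audio frame, return the index of the
    video frame whose time interval contains that audio frame's start time."""
    indices = []
    for i in range(num_audio_frames):
        # Audio frame i starts at i / audio_fps seconds; the video frame
        # covering that instant is floor(t * video_fps).
        j = math.floor(i * video_fps / audio_fps)
        # Clamp so trailing audio frames reuse the last available video frame.
        indices.append(min(j, num_video_frames - 1))
    return indices


def align_features(video_feats: Sequence[T], num_audio_frames: int,
                   audio_fps: float, video_fps: float) -> List[T]:
    """Hypothetical helper: repeat video features (nearest-previous frame)
    so there is exactly one visual feature per audio frame."""
    idx = align_indices(num_audio_frames, audio_fps, len(video_feats), video_fps)
    return [video_feats[j] for j in idx]


if __name__ == "__main__":
    # Assumed rates: video features at 2 fps, audio frames at 4 fps.
    print(align_features(["v0", "v1"], num_audio_frames=4,
                         audio_fps=4, video_fps=2))
```

With the assumed rates above, each video feature is reused for two consecutive audio frames, giving `['v0', 'v0', 'v1', 'v1']`. Nearest-previous-frame lookup is only one simple choice; a real model may instead interpolate features or learn the alignment.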
