hkchengrex/MMAudio
A deep learning model that generates synchronized audio from video and/or text inputs.

Velocity (7d): +4.0 ★/day · Trend: steady
MMAudio is a generative model for video-to-audio synthesis: given video frames, a text prompt, or both, it produces matching audio. Joint training on audio-visual and audio-text datasets is intended to improve the quality of the generated audio. A synchronization module aligns the generated audio with the video frames for temporal coherence. The project provides pretrained models and interactive demos on Hugging Face, Colab, and Replicate.
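
To make the idea of temporal alignment concrete, the sketch below shows only the basic arithmetic that maps a video frame to the span of audio samples it covers. It is not MMAudio's synchronization module or API: the function name `frame_to_sample_span`, the 25 fps frame rate, and the 16 kHz sample rate are all hypothetical values chosen for illustration.

```python
def frame_to_sample_span(frame_index: int, fps: float, sample_rate: int) -> tuple[int, int]:
    """Return the half-open [start, end) audio sample range covered by one video frame.

    Hypothetical helper for illustration; not part of MMAudio.
    """
    # Each frame lasts 1/fps seconds, i.e. sample_rate / fps audio samples.
    start = round(frame_index * sample_rate / fps)
    end = round((frame_index + 1) * sample_rate / fps)
    return start, end


# Assumed example values: 25 fps video, 16 kHz audio.
spans = [frame_to_sample_span(i, fps=25.0, sample_rate=16000) for i in range(3)]
for i, (start, end) in enumerate(spans):
    print(f"frame {i}: samples {start}..{end - 1}")
```

Computing both boundaries from the frame index, rather than accumulating a rounded per-frame length, keeps consecutive spans contiguous even when the sample rate is not an exact multiple of the frame rate.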