researchmm/MM-Diffusion

A PyTorch implementation of a diffusion model that generates aligned audio-video pairs using a sequential multi-modal U-Net with separate audio and video subnets.

454 stars · Python · Image, Video, Audio
Velocity (7d): +0.4 ★ / day · Trend: steady

This repository implements MM-Diffusion, a framework for joint audio and video generation that was accepted at CVPR 2023. It uses a sequential multi-modal U-Net in which two subnets, one for audio and one for video, learn to turn Gaussian noise into aligned audio-video pairs. The model supports conditional generation and was trained on datasets including Landscape, AIST++, and AudioSet.
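To make "generating from Gaussian noise" concrete, the following is a minimal sketch of the standard DDPM forward (noising) step applied to an audio-video pair. It is not taken from the repository: the function names (`linear_alpha_bar`, `add_noise`, `noise_pair`), the linear beta schedule, and the choice to noise both modalities at the same timestep are illustrative assumptions, and plain Python lists stand in for tensors. Training a model of this kind amounts to learning to reverse this step.

```python
import math
import random

def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def add_noise(x0, alpha_bar_t, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

def noise_pair(video, audio, t, alpha_bar, rng):
    """Noise both modalities at the same timestep t (an assumption of this sketch)."""
    return add_noise(video, alpha_bar[t], rng), add_noise(audio, alpha_bar[t], rng)

rng = random.Random(0)
alpha_bar = linear_alpha_bar(1000)
video = [0.5] * 8    # hypothetical stand-in for flattened video frames
audio = [-0.2] * 4   # hypothetical stand-in for a flattened waveform
noisy_video, noisy_audio = noise_pair(video, audio, 999, alpha_bar, rng)
```

At the last timestep the signal weight is close to zero, so both outputs are nearly pure Gaussian noise. That is the state from which a trained denoiser would start when sampling.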
