menyifang/MIMO
A diffusion-based video synthesis model that generates controllable character videos from images and motion inputs.

Velocity (7d): +2.5 ★/day · Trend: steady
MIMO is a generalizable model for controllable video synthesis. It generates realistic character videos in which the user can control the character identity, the motion, and the scene. Using spatial decomposed modeling within a diffusion framework, it scales to arbitrary characters, generalizes to novel 3D motions, and applies to interactive real-world scenes.