Phantom-video/HuMo
HuMo is a human-centric video generation model that conditions on multiple modalities, including reference images.

HuMo generates videos of humans by combining conditioning signals from several modalities. Its collaborative multi-modal conditioning synthesizes video from reference images together with other inputs, such as text and audio. The project is a research collaboration between Tsinghua University and ByteDance, and it includes model weights on Hugging Face and a dataset of 670K video samples.