← all repositories
sergeytulyakov/mocogan

A GAN that learned to separate actor from action

MoCoGAN disentangles motion and content in video generation, letting you swap faces while keeping the expression—or vice versa.

mocogan
Velocity · 7d
+0.2
★ / day
Trend
steady
star history

What it does

MoCoGAN is a generative adversarial network for video. It takes random noise and produces video clips, but with a twist: the noise is split into two parts. One part controls what appears (content: identity, shape, color), the other controls what happens (motion: expression, action, trajectory). The README shows examples trained on facial expressions, human actions, and a 4,500-video TaiChi dataset.

The interesting bit

The disentanglement is the hook. Fix the content vector and sweep the motion vector, and the same person smiles, then frowns, then looks surprised. Fix motion and swap content, and different people perform the identical action. It’s a latent-space video editing tool that predates the current diffusion hype by several years.

Key highlights

  • Published at CVPR 2018; code is the original authors’ implementation
  • Trained on three distinct domains: faces (MUG dataset), human actions (Weizmann), and TaiChi
  • Wiki page exists for training instructions, though the README doesn’t detail architecture or dependencies
  • Two alternative implementations linked: PyTorch and Chainer
  • Poster and paper PDF available in-repo

Caveats

  • README is sparse on setup: no requirements.txt, no explicit dependency list, no hardware notes
  • The shapes.gif example is commented out in the source, not currently rendered
  • Training details are offloaded to a wiki page rather than version-controlled docs

Verdict

Worth a look if you’re studying video disentanglement or building latent-space video editors. Skip if you need a batteries-included, ready-to-train codebase with modern PyTorch conventions.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.