Stanford-TML/EDGE

Dance diffusion: when your GPU learns to boogie

A research implementation that generates editable 3D dance choreography from raw music using transformer diffusion and Jukebox features.

Velocity (7d): +0.5 ★/day · Trend: steady

What it does

EDGE takes a music file (WAV) and generates plausible 3D human dance motion. It uses a transformer-based diffusion model conditioned on Jukebox music features, and it supports targeted edits such as joint-wise conditioning and in-betweening, which fills the gaps between existing poses. The output can be converted to FBX for rendering in Blender.

The interesting bit

The authors paired a diffusion model with Jukebox (not a lightweight music encoder, but the full 5-billion-parameter generative model), then added a custom metric called Physical Foot Contact (PFC) to quantify physically implausible foot sliding. The results were also evaluated in a large-scale user study, which is rarer in generative motion work than you’d think.
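To make the foot-sliding idea concrete, here is a minimal sketch of a much simpler proxy: the mean horizontal speed of a foot over frames where it stays near the floor. This is not the authors' PFC formula. The function name, the z-up convention, the floor at z = 0, and the 5 cm contact threshold are all assumptions chosen for illustration.

```python
from math import hypot


def foot_slide_score(foot_xyz, fps=30.0, contact_height=0.05):
    """Mean horizontal foot speed (units per second) while the foot is near the floor.

    foot_xyz: list of (x, y, z) positions, one per frame.
    Assumes z is up and the floor is at z = 0 (illustrative convention only).
    """
    speeds = []
    for (x0, y0, z0), (x1, y1, z1) in zip(foot_xyz, foot_xyz[1:]):
        # Only count a frame pair if the foot is near the floor in both frames.
        if z0 <= contact_height and z1 <= contact_height:
            speeds.append(hypot(x1 - x0, y1 - y0) * fps)
    return sum(speeds) / len(speeds) if speeds else 0.0


# Three toy trajectories: a planted foot, a foot dragged along the floor,
# and a foot moving the same distance while lifted.
planted = [(0.0, 0.0, 0.0)] * 4
sliding = [(0.1 * i, 0.0, 0.0) for i in range(4)]
stepping = [(0.1 * i, 0.0, 0.3) for i in range(4)]
```

A planted foot and a lifted foot both score zero here, while the dragged foot scores its horizontal speed, which is the behaviour a plausibility metric of this kind is meant to flag.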

Key highlights

  • Editable generation: joint-wise conditioning and in-betweening for fine control
  • Outputs SMPL-format motion, convertible to FBX for Blender/Mixamo pipelines
  • Includes PFC evaluation metric for physical plausibility
  • Pre-trained checkpoint available; training on AIST++ takes ~6–24 hours with 1–8 high-end GPUs
  • Optional feature caching to avoid re-extracting Jukebox representations on every run
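The feature-caching idea can be sketched as a small on-disk cache wrapped around whatever extractor you use. This is a generic illustration, not the repository's implementation: `cached_features`, `fake_extract`, and the `feature_cache` directory are hypothetical names, and the stand-in extractor returns a fixed list instead of real Jukebox features.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cached_features(audio_path, extract_fn, cache_dir="feature_cache"):
    """Return extract_fn(audio_path), reusing a pickled result when one exists."""
    audio_path = Path(audio_path)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key on path, size, and modification time so an edited file is re-extracted.
    stat = audio_path.stat()
    key = f"{audio_path.resolve()}|{stat.st_size}|{stat.st_mtime_ns}"
    cache_file = cache_dir / (hashlib.sha256(key.encode()).hexdigest() + ".pkl")
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)
    features = extract_fn(audio_path)
    with cache_file.open("wb") as f:
        pickle.dump(features, f)
    return features


# Usage with a stand-in extractor that records how often it actually runs.
calls = []


def fake_extract(path):
    calls.append(path.name)
    return [0.0, 1.0, 2.0]


with tempfile.TemporaryDirectory() as tmp:
    wav = Path(tmp) / "clip.wav"
    wav.write_bytes(b"not real audio")
    cache = Path(tmp) / "cache"
    first = cached_features(wav, fake_extract, cache_dir=cache)
    second = cached_features(wav, fake_extract, cache_dir=cache)
```

The second call returns the stored result without invoking the extractor again, which is the whole point when a single extraction pass is slow and memory-hungry.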

Caveats

  • Windows is “not officially supported”; validated only on Debian 10 with NVIDIA T4
  • Jukebox feature extraction is memory-hungry and slow; full dataset preprocessing takes ~24 hours and ~50 GB
  • The authors explicitly state this is a research implementation that “will not be regularly updated or maintained long after release”
  • File names with spaces or parentheses in --music_dir cause “unpredictable behavior”
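One way to sidestep the file-name caveat is to rename tracks before pointing --music_dir at the folder. The sketch below is an illustrative helper, not part of the repository: `safe_name` and `sanitize_dir` are hypothetical names, and the choice of underscores as the replacement character is an assumption.

```python
import re
from pathlib import Path


def safe_name(name):
    """Replace runs of whitespace and parentheses in the stem with underscores."""
    p = Path(name)
    cleaned = re.sub(r"[\s()]+", "_", p.stem).strip("_") or "track"
    return cleaned + p.suffix


def sanitize_dir(music_dir, dry_run=True):
    """Rename files in music_dir whose names contain spaces or parentheses.

    Returns a list of (old_name, new_name) pairs. With dry_run=True nothing
    is renamed, so the plan can be reviewed first.
    """
    renames = []
    planned = set()
    for path in sorted(Path(music_dir).iterdir()):
        if not path.is_file():
            continue
        target = path.with_name(safe_name(path.name))
        if target == path:
            continue
        # Refuse to overwrite an existing file or collide with a planned rename.
        if target.exists() or target in planned:
            raise FileExistsError(f"{target.name} is taken; rename {path.name} by hand")
        planned.add(target)
        renames.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return renames
```

Calling `sanitize_dir("my_music")` first with the default `dry_run=True` lists the planned renames; passing `dry_run=False` applies them.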

Verdict

Worth a look if you’re doing generative motion research or need a baseline for music-conditioned dance generation. Skip it if you want production-ready tooling, or if you lack either the GPU memory (16 GB minimum) or the patience for Jukebox preprocessing.
