nv-tlabs/kimodo

NVIDIA's motion diffusion model actually ships with a timeline editor

Kimodo generates 3D human and robot motion from text prompts plus precise kinematic constraints—keyframes, end-effector positions, 2D paths—rather than hoping the model guesses your intent.

Velocity (7d): +30 ★ / day · Trend: steady

What it does

Kimodo is a diffusion model trained on 700 hours of commercially friendly motion capture. It generates 3D motion for human and robot skeletons (SOMA, Unitree G1, SMPL-X), steered by text prompts and an unusually broad set of kinematic constraints: full-body pose keyframes, end-effector positions and rotations, 2D root paths, and waypoints. The repo includes inference code, a CLI, a web-based interactive demo with timeline editing, and a benchmark suite built on the BONES-SEED dataset.

The interesting bit

Most motion generation tools treat text as the only steering wheel. Kimodo adds a full constraint stack (pose keyframes, hand/foot targets, ground-plane paths) and exposes it through a Gradio-like web demo where you author motions on a multi-track timeline. Model weights also download automatically from Hugging Face, so you don't have to fetch and place checkpoints by hand.

Key highlights

  • Ships with six model variants across three skeletons (SOMA 77-joint, G1, SMPL-X); the RP models, trained on 700 hours of mocap, are recommended over the 288-hour SEED variants
  • Interactive demo runs locally at 127.0.0.1:7860 with real-time 3D preview, constraint editing, and export to NPZ/MuJoCo CSV/AMASS formats
  • CLI supports classifier-free guidance with separate weights for text vs. constraints, plus optional foot-skate cleanup post-processing
  • VRAM requirement drops from ~17 GB to <3 GB when text encoding is offloaded to the CPU by setting TEXT_ENCODER_DEVICE=cpu
  • Includes a Motion Generation Benchmark with test cases and evaluation code for comparing constraint-following accuracy across models
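
The highlight about separate guidance weights for text and constraints can be made concrete with a small sketch. This is one common way to combine classifier-free guidance over two conditions, not necessarily the formulation Kimodo uses. The function name `combine_guidance`, its parameters, and the toy arrays are all hypothetical and do not come from the Kimodo codebase.

```python
import numpy as np


def combine_guidance(eps_uncond, eps_text, eps_constraint, w_text, w_constraint):
    """Blend three denoiser outputs using one guidance weight per condition.

    Assumed formulation (illustrative only): start from the unconditional
    prediction and add each condition's offset scaled by its own weight.
    """
    return (
        eps_uncond
        + w_text * (eps_text - eps_uncond)
        + w_constraint * (eps_constraint - eps_uncond)
    )


# Toy stand-ins for the model's predictions at a single denoising step.
eps_uncond = np.zeros(3)
eps_text = np.array([1.0, 0.0, 0.0])        # prediction given only the text prompt
eps_constraint = np.array([0.0, 1.0, 0.0])  # prediction given only the constraints

guided = combine_guidance(
    eps_uncond, eps_text, eps_constraint, w_text=2.0, w_constraint=3.0
)
print(guided)
```

Under this formulation, raising `w_constraint` relative to `w_text` pushes the sample harder toward the keyframes and paths than toward the prompt. Setting both weights to zero recovers the unconditional prediction.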

Caveats

  • Developed on Linux; Windows support exists but is less tested (Docker recommended)
  • SMPL-X variant carries a stricter NVIDIA R&D Model license, unlike the Open Model license for SOMA and G1 variants
  • A March 2026 breaking change switched SOMA models to a 77-joint skeleton (somaskel77), so older integrations may need updating

Verdict

Worth a look if you're building animation pipelines, robotics simulators, or game tools where artists need precise control over generated motion, not just a lucky text prompt. Skip it if you need real-time generation or lightweight CPU-only inference: this is still research-grade diffusion with a GPU appetite.
