Is kimodo open source?

Yes. The nv-tlabs/kimodo code is open source, released under the Apache-2.0 license; the model checkpoints are covered by separate NVIDIA model licenses.

What language is kimodo written in?

nv-tlabs/kimodo is primarily written in Python.

How popular is kimodo?

nv-tlabs/kimodo has 3k stars on GitHub and is currently holding steady.

Where can I find kimodo?

nv-tlabs/kimodo is on GitHub at https://github.com/nv-tlabs/kimodo.

nv-tlabs/kimodo

A diffusion model that keyframes humanoids like video edits

Kimodo generates 3D human and robot motion from text prompts and precise kinematic constraints like keyframes, end-effector positions, and 2D paths.

★ 3k stars · Python · Image · Video · Audio · Domain Apps

Velocity (7d): +8.3 ★ / day · Trend: steady

What it does

Kimodo is a diffusion model trained on 700 hours of motion-capture data that synthesizes 3D animations for human bodies and robots. Users steer it with text prompts and kinematic constraints (full-body poses, end-effector positions, and 2D root paths), either through a local web demo with a timeline editor or through a command-line interface. The repository also includes a benchmark suite and evaluation code for measuring how well models follow combined text and constraint instructions.
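To make "constraint-following" concrete, here is a minimal sketch of one plausible way to score it: the mean distance between generated joint positions and their targets at the constrained frames. This is an illustration only and not the repository's evaluation code; the function name `constraint_error` and the `{(frame, joint): (x, y, z)}` layout are assumptions made for this example.

```python
import math

def constraint_error(motion, constraints):
    """Mean Euclidean distance between generated joint positions and their
    targets, averaged over the constrained (frame, joint) pairs.

    motion:      motion[frame][joint] -> (x, y, z)
    constraints: {(frame, joint): (x, y, z)}  # hypothetical layout
    """
    if not constraints:
        return 0.0
    total = 0.0
    for (frame, joint), target in constraints.items():
        total += math.dist(motion[frame][joint], target)
    return total / len(constraints)

# Toy clip: 3 frames, 2 joints.
motion = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.5, 0.0, 0.0), (0.5, 1.0, 0.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
]
constraints = {
    (0, 0): (0.0, 0.0, 0.0),  # met exactly
    (2, 1): (1.0, 1.0, 0.3),  # missed by 0.3 along z
}
error = constraint_error(motion, constraints)
print(f"{error:.3f}")  # 0.150
```

A lower value means the generated motion passes closer to the requested poses. Text adherence would need a separate measure.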

The interesting bit

Rather than treating motion as a black-box text prompt problem, Kimodo behaves more like animation software: you drop keyframes and waypoints onto tracks and let the diffusion model interpolate the in-betweens. It is also skeleton-agnostic enough to drive both the SOMA human rig and the Unitree G1 robot with the same architecture.
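The keyframe-and-in-between idea can be sketched in a few lines. The example below turns sparse 2D root keyframes into a per-frame track with a mask of constrained frames, then fills the gaps by plain linear interpolation. The linear fill is only a stand-in for the diffusion model, and the names `build_track` and `linear_inbetween` are hypothetical, not part of Kimodo.

```python
def build_track(num_frames, keyframes):
    """Turn sparse keyframes {frame: (x, z)} into per-frame values plus a
    mask marking which frames are actually constrained."""
    values = [None] * num_frames
    mask = [False] * num_frames
    for frame, pos in keyframes.items():
        if not 0 <= frame < num_frames:
            raise ValueError(f"keyframe {frame} outside clip of {num_frames} frames")
        values[frame] = pos
        mask[frame] = True
    return values, mask

def linear_inbetween(values, mask):
    """Stand-in for the generative model: fill unconstrained frames by linear
    interpolation between neighbouring keyframes, holding at the clip ends."""
    keyed = [i for i, m in enumerate(mask) if m]
    if not keyed:
        raise ValueError("need at least one keyframe")
    out = list(values)
    for i in range(len(values)):
        if mask[i]:
            continue  # constrained frames are kept exactly
        prev = max((k for k in keyed if k < i), default=None)
        nxt = min((k for k in keyed if k > i), default=None)
        if prev is None:
            out[i] = values[nxt]
        elif nxt is None:
            out[i] = values[prev]
        else:
            t = (i - prev) / (nxt - prev)
            out[i] = tuple(a + t * (b - a)
                           for a, b in zip(values[prev], values[nxt]))
    return out

values, mask = build_track(5, {0: (0.0, 0.0), 4: (2.0, 1.0)})
path = linear_inbetween(values, mask)
print(mask)     # [True, False, False, False, True]
print(path[2])  # (1.0, 0.5)
```

A learned model replaces the straight-line fill with plausible motion, but the interface is the same: fixed values where the mask is set, generated values everywhere else.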

Key highlights

Accepts simultaneous text prompts and spatial constraints, including full-body keyframes, hand/foot end-effector positions and rotations, and 2D ground-plane waypoints.
Ships with an interactive timeline demo featuring real-time 3D preview, playback controls, and export to NPZ, MuJoCo CSV, or AMASS formats.
Provides a public benchmark built on the BONES-SEED dataset specifically to test text adherence and constraint-following accuracy.
Offers multiple checkpoints trained on either a 700-hour commercially licensed mocap corpus or the smaller 288-hour public BONES-SEED set.
Full GPU generation demands roughly 17 GB of VRAM, though the text encoder can be offloaded to CPU to drop usage below 3 GB at a speed cost.
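The memory trade-off in the last highlight can be written as a small decision helper. This is a sketch under assumptions: the thresholds are the approximate figures quoted above, and `plan_placement`, the `denoiser` label, and the returned dictionary are hypothetical, not Kimodo options.

```python
FULL_GPU_GB = 17.0   # approximate VRAM for everything on the GPU (figure quoted above)
OFFLOADED_GB = 3.0   # quoted upper bound once the text encoder runs on the CPU

def plan_placement(free_vram_gb):
    """Suggest where each part could run given the free VRAM in GB.

    Returns None when even the offloaded setup is unlikely to fit.
    """
    if free_vram_gb >= FULL_GPU_GB:
        return {"text_encoder": "cuda", "denoiser": "cuda"}
    if free_vram_gb >= OFFLOADED_GB:
        # Slower, because text encoding happens on the CPU.
        return {"text_encoder": "cpu", "denoiser": "cuda"}
    return None

for vram in (24, 8, 2):
    print(vram, plan_placement(vram))
```

With these thresholds, a 24 GB card fits everything on the GPU, an 8 GB card needs the CPU offload, and a 2 GB card falls below both quoted figures.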

Caveats

The codebase is developed primarily for Linux; Windows is explicitly noted as untested outside of Docker.
A breaking change in March 2026 moved all model inputs and outputs to the SOMA 77-joint skeleton, which may break pipelines built against earlier versions.
The SMPL-X variant is released under the more restrictive NVIDIA R&D Model license, unlike the other checkpoints, which use the NVIDIA Open Model license.

Verdict

Researchers and technical animators who need fine-grained control over humanoid motion will find Kimodo unusually direct; casual users without high-end NVIDIA hardware or interest in kinematic rigging should probably admire from a distance.
