NVIDIA/Isaac-GR00T

NVIDIA open-sources a 3B-parameter robot brain

A vision-language-action model that watches human videos, then controls real humanoid arms.

Star velocity (7d): +16 ★/day · Trend: steady

What it does

GR00T N1.7 is a 3-billion-parameter vision-language-action (VLA) model: feed it a camera image and a text command like “pick up the red block,” and it outputs continuous robot motor commands. It is designed to work across different robot bodies—humanoids, bimanual arms, semi-humanoid platforms—after fine-tuning on your own demonstration data. NVIDIA ships pre-trained weights, inference scripts, a fine-tuning pipeline, and TensorRT deployment paths under an Apache 2.0 license.
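
To make that input/output contract concrete, here is a minimal sketch of a control loop around a VLA-style policy. Everything named here is a hypothetical stand-in, not the repository's actual API: the StubPolicy class, its get_action method, the Observation fields, the 7-value action vector, and the idea that each call returns a short chunk of actions are all assumptions made for illustration. A real model would replace the stub.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Observation:
    """Hypothetical observation: one RGB frame plus a text command."""
    image: Sequence[Sequence[Sequence[int]]]  # H x W x 3 pixel values
    instruction: str


class StubPolicy:
    """Hypothetical stand-in for a VLA policy; it always returns zeros."""

    def __init__(self, action_dim: int = 7, horizon: int = 4) -> None:
        self.action_dim = action_dim  # assumed size of one motor command
        self.horizon = horizon        # assumed number of actions per call

    def get_action(self, obs: Observation) -> List[List[float]]:
        if not obs.instruction:
            raise ValueError("instruction must be non-empty")
        # A real model would run vision + language + action decoding here.
        return [[0.0] * self.action_dim for _ in range(self.horizon)]


def run_episode(
    policy: StubPolicy,
    get_obs: Callable[[], Observation],
    send_command: Callable[[List[float]], None],
    steps: int,
) -> int:
    """Query the policy `steps` times and forward every action to the robot.

    Returns the total number of commands sent.
    """
    sent = 0
    for _ in range(steps):
        chunk = policy.get_action(get_obs())
        for action in chunk:
            send_command(action)
            sent += 1
    return sent


if __name__ == "__main__":
    log: List[List[float]] = []
    obs = Observation(image=[[[0, 0, 0]]], instruction="pick up the red block")
    total = run_episode(StubPolicy(), lambda: obs, log.append, steps=3)
    print(f"sent {total} commands of dimension {len(log[0])}")
```

The loop keeps the policy, the camera source, and the robot driver as separate callables, so the stub can be swapped for a real model without touching the controller side.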

The interesting bit

The model learns partly from 20,000 hours of egocentric human video, not just robot data. N1.7 uses a relative end-effector action space—deltas from the current pose rather than absolute coordinates—which is shared between human and robot embodiments. That shared representation is what lets it transfer manipulation skills learned from people to machines.
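
The idea of a relative action can be sketched in a few lines. This is a simplified planar illustration (x, y, heading) rather than a full 3D end-effector pose, and it assumes the delta is expressed in the frame of the current pose; the summary above does not specify which convention GR00T actually uses, and the function names are invented for this example.

```python
import math
from typing import Tuple

Pose = Tuple[float, float, float]  # x, y, heading theta in radians


def wrap_angle(a: float) -> float:
    """Map an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def to_relative(current: Pose, target: Pose) -> Pose:
    """Express `target` as a delta in the frame of `current`."""
    cx, cy, ct = current
    tx, ty, tt = target
    dx, dy = tx - cx, ty - cy
    c, s = math.cos(ct), math.sin(ct)
    # Rotate the world-frame displacement by -ct into the current frame.
    return (c * dx + s * dy, -s * dx + c * dy, wrap_angle(tt - ct))


def apply_relative(current: Pose, delta: Pose) -> Pose:
    """Recover an absolute pose by applying `delta` to `current`."""
    cx, cy, ct = current
    dx, dy, dt = delta
    c, s = math.cos(ct), math.sin(ct)
    # Rotate the local displacement by +ct back into the world frame.
    return (cx + c * dx - s * dy, cy + s * dx + c * dy, wrap_angle(ct + dt))


if __name__ == "__main__":
    # The same "move 10 cm forward" delta, starting from two different poses
    # (think of a human hand and a robot gripper in different places).
    forward = (0.10, 0.0, 0.0)
    print(apply_relative((0.0, 0.0, 0.0), forward))
    print(apply_relative((1.0, 2.0, math.pi / 2), forward))
```

Because the delta carries no absolute coordinates, the same value means the same motion wherever the hand or gripper happens to be, which is the property that makes such a representation shareable across embodiments.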

Key highlights

  • New VLM backbone: Cosmos-Reason2-2B (Qwen3-VL architecture), replacing the previous Eagle backbone; handles native image aspect ratios without padding.
  • Hardware targets are wide: inference needs a GPU with at least 16 GB of memory (RTX 4090 up to H100), fine-tuning wants 40 GB+, and Jetson Thor/Orin plus DGX Spark are explicitly supported.
  • Deployment stack includes ONNX and TensorRT export, plus a Gr00tPolicy API to wire into real robot controllers.
  • Setup uses uv for dependency management; pip is a documented fallback.
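
As a quick sanity check against the memory figures in the hardware bullet, the thresholds can be encoded in a small helper. This is only a sketch: the function and constant names are made up for illustration, and the cut-offs are taken directly from the numbers quoted above rather than measured.

```python
from typing import List

# Thresholds quoted in the hardware bullet above (GPU memory in GB).
INFERENCE_MIN_GB = 16
FINETUNE_MIN_GB = 40


def supported_workloads(vram_gb: float) -> List[str]:
    """Return which workloads a GPU with `vram_gb` of memory should fit."""
    workloads = []
    if vram_gb >= INFERENCE_MIN_GB:
        workloads.append("inference")
    if vram_gb >= FINETUNE_MIN_GB:
        workloads.append("fine-tuning")
    return workloads


if __name__ == "__main__":
    for name, gb in [("8 GB card", 8), ("24 GB card", 24), ("80 GB card", 80)]:
        print(name, supported_workloads(gb))
```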

Caveats

  • This is an Early Access release: no production support, no PR contributions accepted yet, and benchmarks are incomplete until GA.
  • Platform matrix is finicky: CUDA 13 needs a Triton patch, GB300 GPUs are unsupported by torch.compile, and aarch64 video decoding requires a specific torchcodec wheel.
  • flash-attn re-validates on every uv run invocation (2–3 second delay) due to URL-pinned wheels; removing the pin breaks future locking.

Verdict

Worth exploring if you have a humanoid or manipulation robot and the GPU budget to fine-tune. Skip it if you need production-grade stability today or are hoping for a drop-in replacement for your existing ROS stack without heavy lifting.
