NVIDIA/cosmos

One transformer family that generates worlds and reasons about them

NVIDIA open-sourced a family of omnimodal world models that simultaneously synthesize training data and reason about physical environments for robots and autonomous systems.

★ 11.2k stars · Jupyter Notebook

Tags: Image · Video · Audio; Inference · Serving; Domain Apps

Velocity (7d): +20 ★ / day · Trend: accelerating

What it does

Cosmos 3 is a family of open world models, ranging from 16B to 64B parameters, that ingest and produce language, images, video, audio, and action sequences within a single architecture. It exposes two modes: a Generator that synthesizes images, video, audio, and robot action rollouts from multimodal prompts, and a Reasoner that analyzes visual inputs and outputs captions, physical reasoning, and task plans. NVIDIA targets it at Physical AI workloads such as robotics, autonomous vehicles, and smart infrastructure.

The interesting bit

The architecture is a Mixture-of-Transformers that shares weights between an autoregressive reasoning path and a diffusion generation path, using the same 3D rotary position embeddings to preserve spatial and temporal structure across modalities. That means one backbone can caption a video, predict a robot’s next move, or generate a synthetic training rollout without swapping networks.
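To make the 3D rotary position embedding idea concrete, here is a minimal pure-Python sketch. It is an illustration of the general technique, not Cosmos code. The even split of the head dimension across the (time, height, width) axes, the base of 10000, and every function name are assumptions made for the example.

```python
import math


def rope_angles(pos, dim, base=10000.0):
    """Rotation angles for one axis: one angle per pair of channels."""
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]


def rotate_pairs(vec, angles):
    """Rotate consecutive channel pairs (x0, x1), (x2, x3), ... by the given angles."""
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def rope_3d(vec, t, h, w, base=10000.0):
    """Apply rotary embedding with the channels split evenly across (t, h, w).

    Assumption for this sketch: each axis gets one third of the head dimension.
    """
    if len(vec) % 6 != 0:
        raise ValueError("head dimension must be divisible by 6 in this sketch")
    chunk = len(vec) // 3
    out = []
    for axis, pos in enumerate((t, h, w)):
        part = vec[axis * chunk:(axis + 1) * chunk]
        out.extend(rotate_pairs(part, rope_angles(pos, chunk, base)))
    return out


def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query/key vectors with a head dimension of 12 (4 channels per axis).
q = [0.1 * (i + 1) for i in range(12)]
k = [0.05 * (12 - i) for i in range(12)]

# Both pairs of positions are separated by the same (t, h, w) offset of (1, 2, 3).
score_near_origin = dot(rope_3d(q, 1, 2, 3), rope_3d(k, 0, 0, 0))
score_shifted = dot(rope_3d(q, 5, 7, 9), rope_3d(k, 4, 5, 6))
print(score_near_origin, score_shifted)
```

The two printed scores agree up to floating-point error, because the rotated dot product depends only on the offset between positions along each axis. That relative-position property is what lets one embedding scheme describe where a token sits in a frame as well as when it occurs.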

Key highlights

Dual runtime surfaces: Generator produces vision, sound, and action outputs; Reasoner produces text for grounding, planning, and forecasting.
Action-aware out of the box: supports camera motion (9D), autonomous vehicle (9D), egocentric motion (57D), single-arm robots like DROID/UR/Fractal/Bridge/UMI (10D), dual-arm (20D), and humanoid AgiBot (29D).
Generation limits are explicit: up to 720p resolution, 30 FPS, and 300 frames (10 seconds at 30 FPS), in BF16 precision, on Linux with NVIDIA Ampere, Hopper, or Blackwell GPUs.
Research and production paths included: Diffusers and Transformers for experimentation, vLLM-Omni and vLLM for OpenAI-compatible serving.
Specialized variants exist for high-fidelity text-to-image, image-to-video, and DROID manipulation policy.
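The action dimensions and generation limits listed above lend themselves to a small preflight check. The sketch below is hypothetical: `GenerationRequest`, `validate`, and the embodiment key names are invented for illustration and are not part of any Cosmos API. It also assumes that "720p" means a frame height of at most 720 pixels.

```python
from dataclasses import dataclass
from typing import List, Optional

# Action vector sizes copied from the highlights; the key names are hypothetical.
ACTION_DIMS = {
    "camera_motion": 9,
    "autonomous_vehicle": 9,
    "egocentric_motion": 57,
    "single_arm": 10,
    "dual_arm": 20,
    "humanoid_agibot": 29,
}

# Stated generation ceilings. Reading "720p" as a 720-pixel height is an assumption.
MAX_HEIGHT, MAX_FPS, MAX_FRAMES = 720, 30, 300


@dataclass
class GenerationRequest:
    """Hypothetical request shape used only for this example."""
    height: int
    fps: int
    num_frames: int
    embodiment: Optional[str] = None
    actions: Optional[List[List[float]]] = None


def validate(req: GenerationRequest) -> List[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if req.height > MAX_HEIGHT:
        problems.append(f"height {req.height} exceeds {MAX_HEIGHT}")
    if req.fps > MAX_FPS:
        problems.append(f"fps {req.fps} exceeds {MAX_FPS}")
    if req.num_frames > MAX_FRAMES:
        problems.append(f"num_frames {req.num_frames} exceeds {MAX_FRAMES}")
    if req.embodiment is not None:
        expected = ACTION_DIMS.get(req.embodiment)
        if expected is None:
            problems.append(f"unknown embodiment {req.embodiment!r}")
        else:
            # Every per-step action vector must match the embodiment's size.
            for step, vec in enumerate(req.actions or []):
                if len(vec) != expected:
                    problems.append(
                        f"action step {step} has {len(vec)} values, expected {expected}"
                    )
    return problems


def clip_seconds(num_frames: int, fps: int) -> float:
    """Clip duration in seconds for a given frame count and frame rate."""
    return num_frames / fps


ok = GenerationRequest(height=720, fps=30, num_frames=300,
                       embodiment="single_arm", actions=[[0.0] * 10])
bad = GenerationRequest(height=1080, fps=30, num_frames=301,
                        embodiment="dual_arm", actions=[[0.0] * 10])
print(validate(ok))
print(validate(bad))
print(clip_seconds(MAX_FRAMES, MAX_FPS))
```

At the stated ceilings, the longest clip works out to 300 / 30 = 10 seconds. The dual-arm action vector (20D) is exactly twice the single-arm size (10D).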

Caveats

Post-training adaptation recipes and the Cosmos Framework are marked “Coming Soon,” so custom fine-tuning workflows are not yet available.
The README dedicates substantial space to troubleshooting CUDA versions, container mismatches, and missing system libraries like libxcb, which suggests deployment is not frictionless.
Linux and an NVIDIA Ampere, Hopper, or Blackwell GPU are hard requirements; other platforms are not mentioned.
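The platform caveat can be expressed as a quick preflight check. This is a sketch under one stated assumption: the listed GPU generations are taken to mean CUDA compute capability 8.0 or higher, since Ampere parts report 8.x. That threshold also admits Ada-generation GPUs (8.9), which the listed requirements do not name, so treat a pass as necessary rather than sufficient. The function name is hypothetical.

```python
import platform

# Assumption: Ampere GPUs report compute capability 8.x, so "Ampere or newer"
# is approximated here as a major version of 8 or above.
MIN_COMPUTE_CAPABILITY_MAJOR = 8


def meets_stated_requirements(system: str, cc_major: int) -> bool:
    """True when the OS is Linux and the GPU's major compute capability is >= 8."""
    return system == "Linux" and cc_major >= MIN_COMPUTE_CAPABILITY_MAJOR


# platform.system() returns "Linux", "Windows", "Darwin", and so on.
# The GPU value is hard-coded for the example (9 would correspond to Hopper).
print(meets_stated_requirements(platform.system(), 9))
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple that could supply the second argument in place of the hard-coded value.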

Verdict

Grab it if you’re building data synthesis or perception-planning pipelines for robotics and autonomous hardware. Look elsewhere if you need lightweight, cross-platform models or immediate training recipes.

Frequently asked

What is NVIDIA/cosmos?: NVIDIA open-sourced a family of omnimodal world models that simultaneously synthesize training data and reason about physical environments for robots and autonomous systems.
Is cosmos open source?: Yes. NVIDIA/cosmos is an open-source project tracked on heatdrop.
What language is cosmos written in?: GitHub classifies the primary language of NVIDIA/cosmos as Jupyter Notebook, meaning most of the repository's content is in notebooks.
How popular is cosmos?: NVIDIA/cosmos has 11.2k stars on GitHub, and its star growth is accelerating at roughly +20 stars per day over the past 7 days.
Where can I find cosmos?: NVIDIA/cosmos is on GitHub at https://github.com/NVIDIA/cosmos.