NVlabs/Sana

NVIDIA's diffusion transformer that runs 4K on 8GB VRAM

A research codebase squeezing high-resolution image, video, and world-model generation into consumer GPUs through linear attention and aggressive quantization.

★ 8.5k stars · Python · Image · Video · Audio

Velocity (7d): +5.7 ★/day · Trend: steady

What it does

SANA is NVIDIA’s efficiency-first generative media stack: text-to-image, text-to-video, controllable world models, and reinforcement-learning post-training, all built around a linear Diffusion Transformer. The project ships complete training and inference pipelines, with variants spanning from one-step “Sprint” generation to 720p minute-long video and 6-DoF camera-controlled world models.

The interesting bit

The core trick is replacing standard softmax attention, whose cost grows quadratically with token count, with linear attention (specifically Block Causal Linear Attention and Causal Mix-FFN), which lets context length grow without the usual memory explosion. Combined with DC-AE autoencoders, 4-bit/8-bit quantization, and tiling, the system generates 4096×4096 images on 8GB VRAM, or 1024px images in 0.1s on an H100.
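The swap from quadratic to linear attention can be illustrated with a small sketch. It is not the project's implementation: it assumes a ReLU feature map, works token by token rather than in blocks (so it is plain causal linear attention, not Block Causal Linear Attention), and uses pure-Python lists for readability. The function names are hypothetical. The point is that the recurrent form carries only a fixed-size running state, yet produces the same output as the form that scores every earlier token.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def relu(x: Vector) -> Vector:
    """Feature map phi(x) = max(x, 0), applied elementwise (an assumption here)."""
    return [max(v, 0.0) for v in x]


def quadratic_causal_attention(q: Matrix, k: Matrix, v: Matrix, eps: float = 1e-6) -> Matrix:
    """Reference form: token i scores every token j <= i, so work grows as N^2."""
    out = []
    for i in range(len(q)):
        fq = relu(q[i])
        weights = [sum(a * b for a, b in zip(fq, relu(k[j]))) for j in range(i + 1)]
        norm = sum(weights) + eps
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights)) / norm
            for c in range(len(v[0]))
        ])
    return out


def linear_causal_attention(q: Matrix, k: Matrix, v: Matrix, eps: float = 1e-6) -> Matrix:
    """Recurrent form: one pass over the tokens with a fixed-size running state."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k_j)^T v_j
    ksum = [0.0] * d_k                         # running sum of phi(k_j)
    out = []
    for qi, ki, vi in zip(q, k, v):
        fk = relu(ki)
        for a in range(d_k):
            ksum[a] += fk[a]
            for c in range(d_v):
                state[a][c] += fk[a] * vi[c]
        fq = relu(qi)
        norm = sum(fq[a] * ksum[a] for a in range(d_k)) + eps
        out.append([
            sum(fq[a] * state[a][c] for a in range(d_k)) / norm
            for c in range(d_v)
        ])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n, d_k, d_v = 16, 4, 4
    q = [[rng.gauss(0, 1) for _ in range(d_k)] for _ in range(n)]
    k = [[rng.gauss(0, 1) for _ in range(d_k)] for _ in range(n)]
    v = [[rng.gauss(0, 1) for _ in range(d_v)] for _ in range(n)]
    ref = quadratic_causal_attention(q, k, v)
    lin = linear_causal_attention(q, k, v)
    diff = max(abs(x - y) for ra, rb in zip(ref, lin) for x, y in zip(ra, rb))
    print("max abs difference:", diff)
```

The state in `linear_causal_attention` holds `d_k * d_v + d_k` numbers regardless of how many tokens have been processed, which is why sequence length stops being the memory bottleneck in this formulation.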

Key highlights

SANA-Sprint: one/few-step diffusion, 0.3s per 1024px image on RTX 4090
SANA-Video: 720p generation, extensible to 2K via LTX2 refiner; real-time minute-length via LongSANA at 27FPS
SANA-WM: 2.6B-parameter controllable world model with 6-DoF camera control for embodied AI
Sol-RL: post-training RL recipes (Diffusion-NFT, Flow-GRPO) for SANA, FLUX.1, and SD3.5-L, bundled with datasets
Deployment flexibility: runs via diffusers, ComfyUI, SGLang (OpenAI-compatible API), and 4-bit demos on single 3090s
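To get a rough feel for why 4-bit and 8-bit weights matter on consumer cards, the arithmetic below estimates the memory taken by model weights alone at different precisions. It is a back-of-the-envelope sketch: the helper name is hypothetical, the 2.6B figure is borrowed from the list above purely as an example input (the source does not say which variants are quantized), and activations, quantization scales, and caches are ignored.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in GiB (ignores activations, scales, caches)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    example_params = 2.6e9  # illustrative input only, not a claim about any variant
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gib(example_params, bits):.2f} GiB")
```

Halving the bit width halves the weight footprint, so going from 16-bit to 4-bit cuts it to a quarter; real-world savings are somewhat smaller once per-group scales and unquantized layers are counted.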

Caveats

The README is a firehose of release announcements; architectural details and training costs are scattered across linked papers and docs
Multiple overlapping variants (SANA, SANA-1.5, Sprint, Video, WM) with different VAE backends—figuring out which config fits your hardware requires digging through YAMLs
World model and video variants are fresh research releases; their production stability is unclear from the sources

Verdict

Researchers and product builders who need high-res generative media on constrained hardware should dig in: this is one of the few stacks seriously optimizing for inference efficiency rather than just scale. If you’re looking for a simple, stable single-model API, the complexity surface may sting.

Frequently asked

What is NVlabs/Sana? A research codebase squeezing high-resolution image, video, and world-model generation into consumer GPUs through linear attention and aggressive quantization.
Is Sana open source? Yes. NVlabs/Sana is open source, released under the Apache-2.0 license.
What language is Sana written in? NVlabs/Sana is primarily written in Python.
How popular is Sana? NVlabs/Sana has 8.5k stars on GitHub and is currently holding steady.
Where can I find Sana? NVlabs/Sana is on GitHub at https://github.com/NVlabs/Sana.