Is GEN3C open source?

Yes — nv-tlabs/GEN3C is open source, released under the Apache-2.0 license.

What language is GEN3C written in?

nv-tlabs/GEN3C is primarily written in Jupyter Notebook.

How popular is GEN3C?

nv-tlabs/GEN3C has 1.4k stars on GitHub.

Where can I find GEN3C?

nv-tlabs/GEN3C is on GitHub at https://github.com/nv-tlabs/GEN3C.


nv-tlabs/GEN3C

Teaching video models to read a 3D map instead of memorizing frames

GEN3C generates video by rendering a live point-cloud cache, so the model never has to remember what it just drew or guess where the camera went.

★1.4k stars · Jupyter Notebook · Image · Video · Audio




What it does

GEN3C is a 7B-parameter video diffusion model built on NVIDIA Cosmos that turns a single image, a video clip, or sparse multiview photos into extended video sequences along a user-defined camera path. Instead of asking the neural network to infer geometry from raw camera parameters, it maintains an explicit 3D cache—point clouds derived from predicted depth maps—and conditions each new frame on 2D renderings of that cache from the desired viewpoint. This frees the diffusion model to focus on filling previously unseen regions and advancing motion rather than playing 3D chess in latent space.
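To make the "3D cache" idea concrete, here is a minimal sketch of the generic geometry behind it: lift a depth map into a world-space point cloud using camera intrinsics and a pose, then splat that cloud into a new camera and record which pixels it covers. This is not GEN3C's code. The function names are hypothetical, and the sketch assumes a pinhole camera, 4x4 pose matrices, and nearest-point splatting; GEN3C's actual cache construction and renderer may differ.

```python
import numpy as np


def unproject_depth(depth, K, cam_to_world):
    """Lift an H x W depth map into world-space 3D points, one per pixel (hypothetical helper)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u = column, v = row
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    # Back-project through the pinhole intrinsics, then scale each ray by its depth.
    points_cam = (pixels @ np.linalg.inv(K).T) * depth.reshape(-1, 1)
    # Move camera-space points into world space with a 4x4 camera-to-world pose.
    ones = np.ones((points_cam.shape[0], 1))
    return (np.hstack([points_cam, ones]) @ cam_to_world.T)[:, :3]


def render_cache(points, colors, K, world_to_cam, h, w):
    """Splat cached points into a target camera, keeping the nearest point per pixel.

    Returns (image, mask); mask is False wherever no cached point lands, i.e. the
    regions a generator would still have to fill in. (Hypothetical helper.)
    """
    ones = np.ones((points.shape[0], 1))
    points_cam = (np.hstack([points, ones]) @ world_to_cam.T)[:, :3]
    z = points_cam[:, 2]
    keep = z > 1e-6  # drop points behind the camera
    proj = points_cam[keep] @ K.T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    z, cols = z[keep], colors[keep]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, cols = u[inside], v[inside], z[inside], cols[inside]

    # Sort near-to-far; np.unique then keeps the first (nearest) point per pixel.
    order = np.argsort(z, kind="stable")
    _, first = np.unique(v[order] * w + u[order], return_index=True)
    sel = order[first]

    image = np.zeros((h, w, colors.shape[1]))
    mask = np.zeros((h, w), dtype=bool)
    image[v[sel], u[sel]] = cols[sel]
    mask[v[sel], u[sel]] = True
    return image, mask


if __name__ == "__main__":
    h, w = 4, 6
    K = np.array([[2.0, 0, 2.5], [0, 2.0, 1.5], [0, 0, 1]])
    depth = np.full((h, w), 2.0)
    rgb = np.random.default_rng(0).random((h, w, 3))
    pts = unproject_depth(depth, K, np.eye(4))
    shifted = np.eye(4)
    shifted[0, 3] = -1.0  # move the target camera sideways
    _, mask = render_cache(pts, rgb.reshape(-1, 3), K, shifted, h, w)
    print(mask.astype(int))
```

Rendering the cache from the original camera reproduces the input image, while shifting the camera sideways leaves an uncovered strip at the image edge. In this framing, that strip is the part the diffusion model has to imagine, while covered pixels come straight from the cache.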

The interesting bit

The clever part is the decoupling of memory and imagination: because the model renders the scene from the point cloud, it does not need to remember earlier generated frames or reverse-engineer structure from camera poses. The README claims this yields more precise camera control than prior work and state-of-the-art sparse-view novel view synthesis, even in monocular dynamic video and driving scenes.

Key highlights

Built on NVIDIA Cosmos and Stable Video Diffusion; the paper was a CVPR 2025 Highlight
Supports single-image, video-to-video, and multiview-image inputs via an interactive GUI or scripted inference
Generates video autoregressively in 121-frame chunks, extendable to sequences of 361 frames or more
Requires high-end NVIDIA hardware; the authors have only tested on H100 and A100 GPUs, with ~43 GB peak memory when fully offloading models
For video input, it needs externally supplied depth, intrinsics, and extrinsics (the authors recommend their companion tool ViPE)
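The chunk figures above are consistent with a simple reading: if each new 121-frame chunk reuses the last frame of the previous chunk as its starting point, three chunks give 121 + 2 × 120 = 361 frames. That one-frame overlap is an assumption made here for illustration, not something the source states, and the helper names below are hypothetical.

```python
def total_frames(num_chunks, chunk_len=121, overlap=1):
    """Frames produced by num_chunks chunks, assuming each chunk reuses `overlap` frames."""
    if num_chunks < 1:
        raise ValueError("need at least one chunk")
    return chunk_len + (num_chunks - 1) * (chunk_len - overlap)


def chunks_needed(target_frames, chunk_len=121, overlap=1):
    """Smallest number of chunks whose combined output reaches target_frames."""
    extra = max(0, target_frames - chunk_len)
    step = chunk_len - overlap  # new frames contributed by each additional chunk
    return 1 + (extra + step - 1) // step  # integer ceiling division


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, total_frames(n))
```

Under this assumption, going past 361 frames simply means adding more chunks, each contributing 120 new frames.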

Caveats

The authors explicitly state they have tested inference only on H100 and A100 GPUs; consumer hardware is unsupported territory
Video and multiview workflows depend on external preprocessing (depth, camera poses) from tools like ViPE or VGGT, so it is not a single-click image-to-video drop-in for all use cases
A driving-finetuned model and the joint depth-and-pose prediction pipeline are listed as future updates, not yet released

Verdict

Researchers and technical artists with access to enterprise GPUs who need temporally consistent, camera-controllable video generation should look here; casual users looking for a lightweight, consumer-grade video toy will find the hardware and preprocessing requirements prohibitive.

Frequently asked

What is nv-tlabs/GEN3C?: GEN3C generates video by rendering a live point-cloud cache, so the model never has to remember what it just drew or guess where the camera went.