nv-tlabs/XCube
A sparse voxel hierarchy diffusion model for large-scale feed-forward 3D generation at resolutions up to 1024^3.

XCube is a generative model for high-resolution 3D voxel grids with arbitrary attributes. It uses a hierarchical latent diffusion approach that generates voxels in a coarse-to-fine manner. It is built on the VDB data structure for efficiency and can produce millions of voxels for large outdoor scenes (100 m × 100 m) at a 10 cm voxel resolution. Beyond unconditional generation, the model supports text-to-3D synthesis, user-guided editing, and scene completion from single scans.
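
To make the numbers above concrete, here is a minimal Python sketch of the grid arithmetic and of a generic coarse-to-fine sparse refinement step. It is an illustration under stated assumptions, not XCube's code: the names `cells_per_side`, `subdivide`, and `keep` are hypothetical, and the `keep` predicate merely stands in for whatever decides which finer voxels exist (in XCube that role belongs to the learned model; no diffusion is performed here).

```python
from itertools import product


def cells_per_side(extent_cm: int, voxel_cm: int) -> int:
    """Number of voxels needed along one axis (ceiling division, integer units)."""
    return -(-extent_cm // voxel_cm)


def subdivide(coarse: set, keep) -> set:
    """Split each occupied coarse voxel into its 8 children and keep a subset.

    `keep` is a hypothetical predicate standing in for a model's decision
    about which finer voxels are occupied.
    """
    fine = set()
    for x, y, z in coarse:
        for dx, dy, dz in product((0, 1), repeat=3):
            child = (2 * x + dx, 2 * y + dy, 2 * z + dz)
            if keep(child):
                fine.add(child)
    return fine


# A 100 m extent at 10 cm per voxel needs 1000 cells per horizontal axis,
# which fits inside a 1024-wide grid.
side = cells_per_side(extent_cm=100 * 100, voxel_cm=10)
dense_cells = 1024 ** 3  # cell count of a fully dense 1024^3 grid

# Toy refinement: keep only voxels in the bottom layer (z == 0),
# loosely imitating a flat ground surface.
level0 = {(0, 0, 0)}
level1 = subdivide(level0, keep=lambda c: c[2] == 0)
level2 = subdivide(level1, keep=lambda c: c[2] == 0)

print(side, dense_cells, len(level1), len(level2))
```

A fully dense 1024^3 grid would hold over a billion cells, whereas the scenes described above contain millions of voxels. Storing only occupied voxels, and refining only those at each level, is what keeps this resolution tractable.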