NVlabs/VoxFormer
VoxFormer is a PyTorch implementation of a vision transformer for predicting 3D semantic occupancy from 2D camera images.
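Predicting 3D occupancy from 2D images depends on a geometric link between voxel positions and image pixels. The sketch below is a minimal, hypothetical illustration and is not taken from the VoxFormer code. It assumes a pinhole camera with made-up intrinsics `fx`, `fy`, `cx`, `cy` and a voxel grid already expressed in the camera frame; a real pipeline would apply the camera extrinsics first. The helper names `voxel_centers`, `project`, and `visible_voxels` are illustrative only.

```python
# Hypothetical sketch, not VoxFormer's implementation: project voxel centers
# into a pinhole camera image so each voxel knows which pixel to look at.
# Assumes the grid is already in the camera frame (x right, y down, z forward).

def voxel_centers(dims, voxel_size, origin):
    """Yield (i, j, k, (x, y, z)) for every voxel center in a regular grid."""
    nx, ny, nz = dims
    ox, oy, oz = origin
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                yield i, j, k, (ox + (i + 0.5) * voxel_size,
                                oy + (j + 0.5) * voxel_size,
                                oz + (k + 0.5) * voxel_size)


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = point_cam
    if z <= 0:
        return None  # behind the camera: there is no pixel to sample
    return (fx * x / z + cx, fy * y / z + cy)


def visible_voxels(dims, voxel_size, origin, fx, fy, cx, cy, width, height):
    """Map voxel index (i, j, k) -> pixel (u, v) for voxels that land in the image."""
    hits = {}
    for i, j, k, center in voxel_centers(dims, voxel_size, origin):
        uv = project(center, fx, fy, cx, cy)
        if uv is not None and 0 <= uv[0] < width and 0 <= uv[1] < height:
            hits[(i, j, k)] = uv
    return hits


# Usage: a 2x2x2 grid of 1 m voxels placed 4 m in front of a 100x100 image.
hits = visible_voxels((2, 2, 2), 1.0, (-1.0, -1.0, 4.0),
                      100.0, 100.0, 50.0, 50.0, 100, 100)
```

The nested loops favor readability; a PyTorch implementation would typically project all voxel centers at once as a batched tensor operation.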

VoxFormer is a sparse voxel transformer that converts 2D camera inputs into 3D semantic occupancy predictions, supporting scene understanding for autonomous driving. It uses deformable attention to aggregate image features into voxel queries and reports state-of-the-art results on SemanticKITTI and other benchmarks. The repository includes the model implementation, training scripts, and evaluation tools for camera-based 3D semantic scene completion (SSC).
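Deformable attention, as introduced in Deformable DETR, lets each query read only a few image locations. Those locations are a reference point plus learned offsets, and the sampled features are combined using softmax-normalized weights. The sketch below is a single-head, single-level, single-channel simplification written for illustration. It is not VoxFormer's code and omits value/output projections and multi-scale features. The names `bilinear`, `softmax`, and `deformable_sample` are hypothetical, and the offsets and attention logits are passed in directly instead of being predicted from the query.

```python
import math

# Hypothetical sketch of Deformable-DETR-style sampling for ONE query on a
# single-channel feature map (a list of rows). Not VoxFormer's implementation.


def bilinear(feature, x, y):
    """Bilinearly sample feature[y][x] at continuous (x, y); out-of-bounds taps count as 0."""
    h, w = len(feature), len(feature[0])
    x0, y0 = math.floor(x), math.floor(y)
    dx, dy = x - x0, y - y0
    taps = (
        (x0, y0, (1 - dx) * (1 - dy)),
        (x0 + 1, y0, dx * (1 - dy)),
        (x0, y0 + 1, (1 - dx) * dy),
        (x0 + 1, y0 + 1, dx * dy),
    )
    total = 0.0
    for xi, yi, wgt in taps:
        if 0 <= xi < w and 0 <= yi < h:
            total += wgt * feature[yi][xi]
    return total


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def deformable_sample(feature, ref_point, offsets, attn_logits):
    """Return sum_k A_k * feature(p + delta_p_k) with A = softmax(attn_logits).

    In a trained model, offsets and logits come from linear layers applied to
    the query embedding; here they are supplied by the caller.
    """
    weights = softmax(attn_logits)
    px, py = ref_point
    return sum(a * bilinear(feature, px + ox, py + oy)
               for a, (ox, oy) in zip(weights, offsets))


# Usage: one query at pixel (0, 0) sampling itself and a diagonal neighbor
# with equal weight.
feature = [[0.0, 1.0], [2.0, 3.0]]
out = deformable_sample(feature, (0.0, 0.0), [(0.0, 0.0), (1.0, 1.0)], [0.0, 0.0])
```

Because each query touches only a handful of sampled points rather than every pixel, the cost grows with the number of queries. This pairs naturally with restricting queries to a sparse subset of voxels.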