astra-vision/MonoScene
MonoScene predicts 3D semantic occupancy grids and scene structures from single RGB images using deep learning.

Velocity · 7d
+0.5
★ / day
Trend
→steady
star history
MonoScene is a CVPR 2022 research project that reconstructs complete 3D semantic scenes from a single monocular image. It uses a deep learning approach with a frustum fusion strategy and global context aggregation to predict voxel-based semantic occupancy in 3D space. The framework is implemented in PyTorch and trained on datasets like SemanticKITTI, KITTI-360, and NYUv2.