InternRobotics/EmbodiedScan
A multi-modal 3D perception dataset and benchmark for embodied AI agents that combines RGB-D vision with language understanding.

EmbodiedScan provides a holistic ego-centric 3D perception suite for embodied AI research. It comprises over 5k scans with 1M RGB-D views and 1M language prompts, for training and evaluating agents that must understand 3D scenes from a first-person perspective. Annotations include 3D bounding boxes across 760 categories and dense semantic occupancy, and the suite ships with a baseline framework, Embodied Perceptron, for multi-modal 3D scene understanding. Together, the dataset and benchmark support research on grounding language instructions in physical 3D environments for autonomous robotic agents.
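
To make the relationship between the two annotation types more concrete, here is a minimal sketch of how category-labeled 3D boxes could be rasterized into a semantic occupancy grid. It is an illustration only, not EmbodiedScan's schema or tooling: the `Box3D` class, the `semantic_occupancy` function, and the axis-aligned box simplification are all assumptions made for this example, and the real annotation format may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


@dataclass
class Box3D:
    """Hypothetical axis-aligned 3D box with a category label.

    This is a simplification for illustration, not the dataset's box format.
    """
    center: Vec3
    size: Vec3
    category: str

    def contains(self, point: Vec3) -> bool:
        # A point is inside when it is within half the extent on every axis.
        return all(abs(p - c) <= s / 2 for p, c, s in zip(point, self.center, self.size))


def semantic_occupancy(
    boxes: List[Box3D],
    grid_min: Vec3,
    voxel_size: float,
    grid_shape: Tuple[int, int, int],
) -> Dict[Voxel, str]:
    """Label each voxel whose center lies inside a box with that box's category.

    Voxels outside every box are left out of the result (treated as empty).
    If boxes overlap, the first matching box in the list wins.
    """
    occupancy: Dict[Voxel, str] = {}
    nx, ny, nz = grid_shape
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                voxel_center = tuple(
                    m + (idx + 0.5) * voxel_size for m, idx in zip(grid_min, (i, j, k))
                )
                for box in boxes:
                    if box.contains(voxel_center):
                        occupancy[(i, j, k)] = box.category
                        break
    return occupancy


# Usage: a 1 m cube labeled "chair" inside a 2 m grid of 0.5 m voxels.
chair = Box3D(center=(0.5, 0.5, 0.5), size=(1.0, 1.0, 1.0), category="chair")
grid = semantic_occupancy([chair], grid_min=(0.0, 0.0, 0.0), voxel_size=0.5, grid_shape=(4, 4, 4))
print(len(grid))  # 8 voxels are labeled "chair"
```

A sparse dictionary keeps the sketch short; a dense array indexed by voxel coordinates would be the more common choice at dataset scale.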