A gym for robots that prefer IKEA to Unreal Engine
A reinforcement learning simulator built on real panoramic scans of actual buildings, not synthetic worlds.

What it does
The Matterport3D Simulator drops RL agents into 90 real indoor environments (homes, offices, churches, hotels) captured as dense 360° RGB-D panoramas. Agents look around, move between viewpoints, and follow natural-language instructions to reach a goal. It’s essentially a training ground for vision-and-language navigation research, with C++ and Python APIs and a Dockerized build that (the authors hope) saves you from dependency hell.
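To make the "look around, move between viewpoints" model concrete, here is a minimal pure-Python sketch of a panoramic navigation graph. It is a toy stand-in, not the simulator's actual API: the class `ToyPanoramaEnv`, its method names, the 60° field of view, and the three-node graph are all made up for illustration. The idea it shows is that the agent never moves freely; it can only hop to a neighbouring viewpoint that sits inside its current field of view.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyPanoramaEnv:
    """Hypothetical stand-in for a panoramic-graph simulator (not the real API)."""

    positions: dict  # viewpoint id -> (x, y) position in metres
    edges: dict      # viewpoint id -> set of neighbouring viewpoint ids
    viewpoint: str = ""
    heading: float = 0.0  # radians; 0 faces +y, turning right is positive

    def reset(self, viewpoint, heading=0.0):
        """Place the agent at a viewpoint with a given heading."""
        self.viewpoint = viewpoint
        self.heading = heading % (2 * math.pi)

    def turn(self, delta):
        """Rotate in place; the panorama means no movement is needed to look around."""
        self.heading = (self.heading + delta) % (2 * math.pi)

    def navigable(self, fov=math.radians(60)):
        """Return neighbours whose bearing lies inside the current field of view."""
        x, y = self.positions[self.viewpoint]
        visible = []
        for nb in sorted(self.edges[self.viewpoint]):
            nx, ny = self.positions[nb]
            bearing = math.atan2(nx - x, ny - y)  # clockwise angle from +y
            # Signed smallest angle between the bearing and the heading.
            diff = (bearing - self.heading + math.pi) % (2 * math.pi) - math.pi
            if abs(diff) <= fov / 2:
                visible.append(nb)
        return visible

    def move(self, target):
        """Hop to a neighbouring viewpoint, but only if the agent can see it."""
        if target not in self.navigable():
            raise ValueError(f"{target!r} is not navigable from {self.viewpoint!r}")
        self.viewpoint = target


# Invented three-viewpoint floor plan: b is 2 m north of a, c is 2 m east of a.
env = ToyPanoramaEnv(
    positions={"a": (0.0, 0.0), "b": (0.0, 2.0), "c": (2.0, 0.0)},
    edges={"a": {"b", "c"}, "b": {"a"}, "c": {"a"}},
)
env.reset("a")                # facing north, so only b is in view
env.turn(math.radians(90))    # turn right to face east, so only c is in view
env.move("c")
```

The real simulator additionally returns rendered RGB (and optionally depth) frames for each state; the sketch leaves rendering out and keeps only the graph logic.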
The interesting bit
The visual complexity comes from reality, not a graphics engine. The dataset uses actual Matterport scans with all the messy detail—shiny surfaces, depth holes, awkward lighting—that synthetic environments usually sanitize away. The simulator renders off-screen at ~1000 fps on a Titan X, which is fast enough that the “real images” trade-off doesn’t mean “real slow.”
Key highlights
- 90 indoor scenes with 8–349 viewpoints each, about 2.25 m apart on average and covering each scene's full walkable floor plan
- Real RGB-D output (not synthetic); depth enabled via preprocessing script
- Three rendering backends: GPU (OpenGL/X11), off-screen GPU (EGL, recommended), off-screen CPU (OSMesa)
- Batched multi-agent support since the 2019 update; v0.1 tag preserved for API stability
- Comes with Room-to-Room (R2R) task data, a public leaderboard on EvalAI, and a web interface for collecting data on Amazon Mechanical Turk (AMT)
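For a feel of how an R2R-style episode is scored on a viewpoint graph, here is a rough sketch. R2R counts an episode as a success when the agent stops within 3 m of the goal, with distance measured along the navigation graph. The function names, the graph layout, and the use of plain Dijkstra below are illustrative assumptions, not the project's own evaluation code.

```python
import heapq
import math


def shortest_path_length(positions, edges, start, goal):
    """Dijkstra over the viewpoint graph; edge cost is Euclidean distance in metres."""
    dist = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            return d
        if d > dist.get(node, math.inf):
            continue  # stale queue entry, a shorter route was already found
        for nb in edges[node]:
            nd = d + math.dist(positions[node], positions[nb])
            if nd < dist.get(nb, math.inf):
                dist[nb] = nd
                heapq.heappush(queue, (nd, nb))
    return math.inf  # goal unreachable from start


def is_success(positions, edges, final, goal, threshold_m=3.0):
    """True if the agent stopped within threshold_m of the goal along the graph."""
    return shortest_path_length(positions, edges, final, goal) < threshold_m


# Invented corridor a - b - c with a side room d off c (coordinates in metres).
positions = {"a": (0.0, 0.0), "b": (2.0, 0.0), "c": (4.0, 0.0), "d": (4.0, 2.0)}
edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

error_from_a = shortest_path_length(positions, edges, "a", "d")
print(error_from_a, is_success(positions, edges, "a", "d"))
print(is_success(positions, edges, "c", "d"))
```

Measuring along the graph rather than in a straight line matters indoors: two viewpoints on opposite sides of a wall can be close in space yet far apart on foot.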
Caveats
- Dataset access requires a request and approval
- The full timing test needs at least 50 GB of RAM
- Depth-to-RGB alignment is approximate; perfect alignment needs manual re-stitching from undistorted color images
- Docker + nvidia-docker2.0 effectively mandatory unless you enjoy local dependency archaeology
Verdict
Grab this if you’re doing embodied navigation, instruction-following, or any RL research where synthetic environments feel too clean. Skip it if you need outdoor scenes, perfect depth-RGB alignment, or a quick-start without dataset paperwork.