Tencent open-sources a world model that builds playable 3D levels, not videos
HY-World 2.0 generates actual 3D assets—meshes, Gaussian splats, point clouds—instead of the usual flickering video loops.

What it does
HY-World 2.0 is a multi-modal pipeline that turns text prompts, single images, multi-view photos, or casual videos into explorable 3D worlds. It outputs real geometry (meshes, 3D Gaussian splats, and point clouds) that you can import into Blender, Unity, Unreal Engine, or Isaac Sim. The system splits into two tracks: World Generation (text or a single image → 3D scene via panorama generation, trajectory planning, stereo expansion, and composition) and World Reconstruction (video or multi-view images → 3D in a single feed-forward pass).
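To make the two-track split concrete, here is a minimal Python sketch of how an input type maps to a track. The names `Track`, `pick_track`, and the input labels are hypothetical illustrations, not identifiers from the HY-World 2.0 codebase.

```python
from enum import Enum


class Track(Enum):
    GENERATION = "world generation"
    RECONSTRUCTION = "world reconstruction"


# Hypothetical input labels; the project's real entry points may differ.
_TRACK_BY_INPUT = {
    "text": Track.GENERATION,            # text prompt -> generated 3D scene
    "image": Track.GENERATION,           # single image -> generated 3D scene
    "multi_view": Track.RECONSTRUCTION,  # multi-view photos -> reconstructed 3D
    "video": Track.RECONSTRUCTION,       # casual video -> reconstructed 3D
}


def pick_track(input_kind: str) -> Track:
    """Return the track this summary associates with an input kind."""
    try:
        return _TRACK_BY_INPUT[input_kind]
    except KeyError:
        raise ValueError(f"unsupported input kind: {input_kind!r}") from None
```

Calling `pick_track("video")` returns `Track.RECONSTRUCTION`, while an unknown label raises `ValueError` rather than guessing.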
The interesting bit
The project explicitly positions itself against the current wave of video world models (Genie 3, Cosmos, etc.). Its pitch: those generate “movies that vanish once playback ends,” while HY-World 2.0 builds “a playable game”—persistent, editable, with real physics collision and zero per-frame inference cost. The README even includes a comparison table with the tagline “Watch a video, then it’s gone” vs. “Build a world, keep it forever.”
Key highlights
- Four-stage generation pipeline: HY-Pano 2.0 (panorama) → WorldNav (trajectory) → WorldStereo 2.0 (expansion) → WorldMirror 2.0 + 3DGS learning (composition)
- WorldMirror 2.0 reconstructs depth, normals, camera parameters, point clouds, and 3DGS attributes in a single forward pass
- Supports interactive first-person and third-person exploration with physics-based collision
- Full inference code and model weights released, including an 80B-parameter panorama model and a 17B-parameter world expansion model
- Requires CUDA 12.8, Python 3.11+, and a custom FlashAttention build; install path is split between “worldrecon” and heavier “worldgen” dependencies
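The four-stage generation pipeline listed above is essentially a fixed chain in which each stage consumes what the previous one produced. The sketch below models only that ordering: the stage names come from the README summary, but the placeholder outputs, `make_stage`, and `run_pipeline` are assumptions for illustration and do not call any real model.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Stage = Callable[[State], State]

# Stage names as listed in the summary; the output keys are hypothetical labels.
PIPELINE: List[Tuple[str, str]] = [
    ("HY-Pano 2.0", "panorama"),
    ("WorldNav", "trajectory"),
    ("WorldStereo 2.0", "expanded_views"),
    ("WorldMirror 2.0 + 3DGS learning", "composed_world"),
]


def make_stage(name: str, produces: str) -> Stage:
    """Build a stand-in stage that records its name and a placeholder output."""

    def run(state: State) -> State:
        new_state = dict(state)  # copy so earlier states are not mutated
        new_state[produces] = f"<{produces} from {name}>"
        new_state["history"] = [*state["history"], name]
        return new_state

    return run


def run_pipeline(prompt: str) -> State:
    """Run the placeholder stages in order, threading state through each one."""
    state: State = {"prompt": prompt, "history": []}
    for name, produces in PIPELINE:
        state = make_stage(name, produces)(state)
    return state
```

After `run_pipeline("a ruined castle")`, the `history` entry lists the four stages in order, which is the only behavior this sketch claims to capture.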
Caveats
- Setup is involved: custom gsplat variant, git submodules for navmesh, separate README for the 80B HY-Pano-2 model, and FlashAttention-3 recommended for Hopper GPUs
- The README cuts off mid-sentence in the HY-Pano-2 usage section, so some CLI details are missing from the source
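Given how involved the setup is, a quick pre-flight check can save a failed install. The sketch below is a hypothetical helper, not part of the project: it takes version strings as plain arguments so it runs without a GPU, treats CUDA 12.8 as an exact requirement (an assumption; the README may accept newer versions), and uses compute capability 9.x to identify Hopper GPUs.

```python
from typing import List, Tuple

REQUIRED_PYTHON = (3, 11)  # minimum, per the "Python 3.11+" requirement
REQUIRED_CUDA = (12, 8)    # assumed to be an exact match
HOPPER_MAJOR = 9           # Hopper-class GPUs report compute capability 9.x


def parse_version(text: str) -> Tuple[int, int]:
    """Parse 'major.minor[.patch]' into a (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def check_environment(
    python_version: str,
    cuda_version: str,
    compute_capability: Tuple[int, int],
) -> List[str]:
    """Return human-readable notes; an empty list means nothing to flag."""
    notes: List[str] = []
    if parse_version(python_version) < REQUIRED_PYTHON:
        notes.append(f"Python {python_version} is older than 3.11")
    if parse_version(cuda_version) != REQUIRED_CUDA:
        notes.append(f"CUDA {cuda_version} does not match 12.8")
    if compute_capability[0] == HOPPER_MAJOR:
        notes.append("Hopper GPU detected: FlashAttention-3 is recommended")
    return notes
```

In practice the three arguments could come from `platform.python_version()`, `torch.version.cuda`, and `torch.cuda.get_device_capability()`, assuming PyTorch is already installed.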
Verdict
Worth a look if you’re building AI-generated environments for games, simulation, or digital twins. Skip it if you need a quick one-liner demo—this is a research-grade assembly line, not a single-button app.