Depth maps that scale like vector graphics, not bitmaps
A CVPR 2026 project turns single RGB images into resolution-independent depth using neural implicit fields, then goes further into 3D Gaussians and sensor fusion.

What it does
InfiniDepth estimates depth from a single RGB image at any resolution you ask for (upsampled, original, or a specific size) rather than being locked to the network’s training resolution. It can also export 3D Gaussian Splatting scenes, novel-view orbit videos, and point clouds. If you have a depth sensor, a separate mode fuses its sparse metric measurements with the image to produce metric depth and aligned 3D Gaussians.
The interesting bit
The “arbitrary-resolution” claim is the hook: most depth networks output a fixed grid, but InfiniDepth uses neural implicit fields to query depth continuously, like sampling a signed distance function at whatever density you need. The repo also bundles multi-view/video processing that aligns per-frame predictions into a global point cloud, optionally using Depth Anything 3 for sequence-level consistency.
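To make the continuous-query idea concrete, here is a minimal sketch. It is not InfiniDepth's API: toy_depth_field is a hypothetical stand-in for a learned implicit field (a real one would condition on image features), and query_depth is an illustrative helper. The point is only that the output resolution is chosen at query time, not baked into the function.

```python
import numpy as np

def toy_depth_field(u, v):
    """Hypothetical stand-in for a learned implicit field.

    Takes continuous image coordinates in [0, 1] and returns depth.
    This fixed formula is an assumption for illustration only.
    """
    return 2.0 + u + 0.5 * np.sin(2.0 * np.pi * v)

def query_depth(field, height, width):
    """Sample `field` on a height x width grid of pixel centres."""
    # Normalised pixel-centre coordinates, so grids of any size
    # sample the same underlying continuous function.
    v = (np.arange(height) + 0.5) / height
    u = (np.arange(width) + 0.5) / width
    uu, vv = np.meshgrid(u, v)  # both arrays have shape (height, width)
    return field(uu, vv)

# The same field, queried at two unrelated resolutions.
low = query_depth(toy_depth_field, 4, 6)
high = query_depth(toy_depth_field, 480, 640)
print(low.shape, high.shape)  # (4, 6) (480, 640)
```

A fixed-grid network would instead produce one array size and rely on interpolation afterwards, which is where the vector-versus-bitmap analogy comes from.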
Key highlights
- Three inference modes: RGB-only relative depth, RGB + depth sensor metric depth, and multi-view/video with global alignment
- Outputs depth maps, point clouds (
.ply), 3D Gaussian scenes, and optional novel-view orbit/swing videos - Gradio demo included; hosted Hugging Face space available for testing before local install
- Supports multiple sparse depth formats:
.png,.npy,.npz,.h5,.hdf5,.exr - Training and evaluation code released as of April 2026; inference code arrived March 2026
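The gap between relative and metric depth in the first bullet is easier to see with numbers. The sketch below is a generic baseline, not InfiniDepth's fusion method, which the README summary here does not describe. It fits a single scale and shift that maps a relative depth map onto sparse metric samples by least squares. The function name fit_scale_shift and the convention of marking valid sensor pixels with a boolean mask are assumptions for illustration.

```python
import numpy as np

def fit_scale_shift(relative, sparse_metric, valid):
    """Least-squares s, t such that s * relative + t matches the
    metric samples on pixels where `valid` is True."""
    x = relative[valid]
    y = sparse_metric[valid]
    A = np.stack([x, np.ones_like(x)], axis=1)  # columns: [relative, 1]
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s, t

# Synthetic example: the true metric depth is 3 * relative + 1,
# but the "sensor" only observed three pixels.
relative = np.linspace(0.1, 1.0, 20).reshape(4, 5)
valid = np.zeros((4, 5), dtype=bool)
valid[0, 0] = valid[1, 3] = valid[3, 4] = True
sparse_metric = np.where(valid, 3.0 * relative + 1.0, 0.0)

s, t = fit_scale_shift(relative, sparse_metric, valid)
dense_metric = s * relative + t  # metric estimate at every pixel
```

Some relative-depth models predict inverse depth, in which case the same fit would be done in that space. A global scale and shift also cannot correct local errors, which is one reason a learned fusion mode is attractive.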
Caveats
- The README is thorough on inference but says nothing about training data, compute requirements, or quantitative benchmarks against prior work
- Multi-view mode depends on Depth Anything 3 (DA3-LARGE-1.1) for default sequence alignment, adding a heavy external dependency
- Camera intrinsics (fx_org, fy_org, cx_org, cy_org) are “strongly recommended” for sensor fusion mode; the fallback behavior is unspecified
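Why intrinsics matter for the 3D outputs: turning a depth map into a point cloud uses the standard pinhole back-projection, and every point's x and y depend directly on the focal lengths and principal point. The sketch below is the textbook formula, not code from the repo. The function name backproject and the example intrinsics values are assumptions.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Convert an (H, W) depth map to an (H*W, 3) array of
    camera-space points using the pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column, row
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Flat wall 2 m away, seen by a hypothetical 640x480 camera.
depth = np.full((480, 640), 2.0)
points = backproject(depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(points.shape)  # (307200, 3)
```

With wrong or guessed intrinsics the depth values are unchanged, but the cloud stretches or skews laterally, which is a plausible reason the README pushes for supplying them.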
Verdict
Worth a look if you need depth or 3D Gaussian exports at non-standard resolutions, or if you’re sitting on RGB+LiDAR data that needs densifying. Skip it if you want a lightweight drop-in replacement for MiDaS: this is a research codebase with multiple model checkpoints and a nontrivial setup.