Meta's SAM 3D Body: promptable human mesh recovery from a single photo
A foundation model that turns one image into a full 3D body mesh, optionally guided by keypoints or masks like the original SAM.

What it does
SAM 3D Body (3DB) reconstructs a complete 3D human mesh (body, feet, and hands) from a single image. It runs either fully automatically or with optional 2D keypoint and mask prompts that steer the result, much like prompting in the SAM segmentation family. The model is built on a new parametric mesh format, the Momentum Human Rig (MHR), which separates skeletal structure from surface shape.
The interesting bit
The MHR representation is the quietly unusual piece: by decoupling the skeleton from the surface, the authors claim, it gives better accuracy and interpretability than the usual SMPL-style parameterizations. The model also supports hand-specific refinement through a dedicated hand decoder, which is rarer than it sounds in full-body human mesh recovery (HMR) systems.
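To make "decoupling the skeleton from the surface" concrete, here is a toy sketch in plain Python. It is not MHR's actual format or API: the `Skeleton` and `Surface` classes, the planar two-bone arm, and the single surface sample per bone are all assumptions invented for illustration. The point it shows is only that, when joints are computed from skeleton parameters alone, changing the surface parameters cannot move them.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Skeleton:
    """Hypothetical skeleton parameters: one length per bone in a planar chain."""
    bone_lengths: List[float]


@dataclass
class Surface:
    """Hypothetical surface parameters: flesh 'thickness' around each bone."""
    radii: List[float]


def joint_positions(skeleton: Skeleton, joint_angles: List[float]) -> List[Point]:
    """Forward kinematics for a planar chain rooted at the origin.

    Only skeleton parameters and pose enter here; the surface is not an input.
    """
    x = y = heading = 0.0
    joints = [(x, y)]
    for length, angle in zip(skeleton.bone_lengths, joint_angles):
        heading += angle  # each angle is relative to the parent bone
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        joints.append((x, y))
    return joints


def surface_points(skeleton: Skeleton, surface: Surface,
                   joint_angles: List[float]) -> List[Point]:
    """One surface sample per bone: the bone midpoint pushed out along the
    bone's normal by that bone's radius."""
    joints = joint_positions(skeleton, joint_angles)
    points = []
    for (x0, y0), (x1, y1), radius in zip(joints, joints[1:], surface.radii):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        nx, ny = -dy / length, dx / length  # unit normal to the bone
        points.append(((x0 + x1) / 2 + radius * nx, (y0 + y1) / 2 + radius * ny))
    return points


# Same skeleton and pose, two different body shapes.
arm = Skeleton(bone_lengths=[0.30, 0.25])
pose = [math.pi / 2, -math.pi / 2]
slim = Surface(radii=[0.03, 0.02])
bulky = Surface(radii=[0.06, 0.05])

joints = joint_positions(arm, pose)
slim_pts = surface_points(arm, slim, pose)
bulky_pts = surface_points(arm, bulky, pose)

print("joints:", joints)
print("slim surface:", slim_pts)
print("bulky surface:", bulky_pts)
```

In SMPL-style models, by contrast, joint locations are regressed from the shaped surface, so editing shape also shifts the skeleton. The toy above keeps the two sets of parameters independent, which is the property the MHR description emphasizes.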
Key highlights
- Two released checkpoints: DINOv3-H+ (840M params) and ViT-H (631M), both hosted on Hugging Face
- Benchmark numbers provided for 3DPW, EMDB, RICH, COCO, LSPET, and FreiHAND (see table in README)
- Includes dataset release, live web demo, and example notebooks for inference and visualization
- Companion repo SAM 3D Objects exists; notebook provided to align both models in shared coordinates
- Licensed under the SAM License (same as Segment Anything)
Caveats
- Checkpoints require following INSTALL.md to request access; not immediately downloadable
- No training code or data generation pipeline released—this is inference-only plus datasets
- “State-of-the-art” and “superior reconstruction quality” are the authors’ claims; the qualitative comparisons shown are hand-selected samples
Verdict
Worth a look if you need production-ready 3D human mesh extraction from photos, especially if promptability matters for your use case. Skip if you were hoping to train your own variant or need a fully open, ungated model.