3D pose from a single photo, no mocap suit required
MeTRAbs turns an RGB image into metric-scale 3D human poses, handling partial bodies and lens distortion without breaking stride.

What it does
MeTRAbs estimates absolute (metric-scale) 3D human poses from ordinary RGB images, with no depth sensor and no calibrated multi-camera rig. Feed it a photo or video frame and it returns 2D keypoints and 3D poses in camera space, plus 3D world coordinates if you provide camera calibration. The models ship as standalone TensorFlow SavedModels, so a single tfhub.load() call (TensorFlow Hub's load function) gets you inference without dragging in the entire training codebase.
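To make the camera-space versus world-space distinction concrete, here is a minimal sketch of the conversion. It is not MeTRAbs code: the function name `camera_to_world` is hypothetical, and it assumes the calibration is given as world-to-camera extrinsics (`x_cam = R @ x_world + t`) with millimetre units. The library's own conventions may differ.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def camera_to_world(points_cam: Sequence[Vec3], R: Mat3, t: Vec3) -> List[List[float]]:
    """Map camera-space 3D points to world space.

    Assumption (not taken from MeTRAbs): extrinsics are world-to-camera,
    x_cam = R @ x_world + t, so the inverse is x_world = R^T @ (x_cam - t).
    """
    world = []
    for p in points_cam:
        d = [p[i] - t[i] for i in range(3)]
        # Multiply by R^T: output component j uses column j of R.
        world.append([sum(R[i][j] * d[i] for i in range(3)) for j in range(3)])
    return world


# Made-up example: camera rotated 90 degrees about the y axis and offset by t.
R = [[0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0]]
t = [0.0, 0.0, 2000.0]
joints_cam = [[100.0, -500.0, 3000.0]]  # one invented joint, assumed millimetres
joints_world = camera_to_world(joints_cam, R, t)
```

Without calibration there is no `R` and `t` to apply, which is why the world-coordinate output is optional.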
The interesting bit
The “truncation-robust” part matters: the model doesn’t fall apart when limbs are cropped or partially out of frame, a common failure mode in pose estimators. It also undoes radial/tangential lens distortion on the GPU and applies gamma-correct rescaling—details that usually get hand-waved but directly affect accuracy on real-world footage.
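For a sense of what undoing radial/tangential distortion involves, the sketch below inverts the widely used Brown-Conrady model by fixed-point iteration. It is a scalar, CPU-only illustration and not the GPU implementation in MeTRAbs; the coefficient names (`k1`, `k2`, `p1`, `p2`) follow the OpenCV convention, which is an assumption here, and the coefficient values are invented.

```python
from typing import Tuple


def distort(x: float, y: float, k1: float, k2: float,
            p1: float, p2: float) -> Tuple[float, float]:
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to
    normalized image coordinates (pixel coords with intrinsics removed)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x * radial + dx, y * radial + dy


def undistort(xd: float, yd: float, k1: float, k2: float,
              p1: float, p2: float, iterations: int = 20) -> Tuple[float, float]:
    """Invert distort() by fixed-point iteration.

    Starts from the distorted point and repeatedly re-estimates the
    undistorted one; this converges quickly for mild distortion.
    """
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 * r2
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y


# Invented coefficients for a mildly distorting lens.
K1, K2, P1, P2 = -0.1, 0.01, 0.001, -0.0005
xd, yd = distort(0.3, -0.2, K1, K2, P1, P2)
xu, yu = undistort(xd, yd, K1, K2, P1, P2)  # should recover roughly (0.3, -0.2)
```

Skipping this step leaves keypoints near the image edges systematically displaced, which is why it matters for footage from wide-angle lenses.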
Key highlights
- Single-line inference via TensorFlow Hub; experimental PyTorch support added in 2023
- Multiple skeleton formats (COCO, SMPL, H36M) selectable at runtime
- Built-in test-time augmentation and plausibility filtering to suppress weird poses
- Backbone options from EfficientNetV2 (accurate) to MobileNetV3 (fast)
- Won the 3DPW Challenge at ECCV 2020
- Code and models have since been upgraded to TensorFlow 2 and are still maintained
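Test-time augmentation is easiest to picture with horizontal flipping, one common variant. The sketch below merges a prediction from the original image with one from its mirror image. It is an illustration only: which augmentations MeTRAbs applies and how it aggregates them is not covered here, `merge_flip_tta` and the 3-joint skeleton are hypothetical, and negating x assumes the optical axis passes through the image centre.

```python
from typing import List, Sequence, Tuple

Pose = List[List[float]]


def merge_flip_tta(pose: Pose, pose_flipped: Pose,
                   left_right_pairs: Sequence[Tuple[int, int]]) -> Pose:
    """Average a pose with the pose predicted from the mirrored image."""
    # Undo the mirroring in camera space by negating x.
    restored = [[-x, y, z] for x, y, z in pose_flipped]
    # In a mirrored image a left hand looks like a right hand,
    # so swap each left/right joint pair back.
    for left, right in left_right_pairs:
        restored[left], restored[right] = restored[right], restored[left]
    # Joint-wise mean of the two aligned predictions.
    return [[(a + b) / 2.0 for a, b in zip(j1, j2)]
            for j1, j2 in zip(pose, restored)]


# Toy skeleton (hypothetical): 0 = head, 1 = left hand, 2 = right hand.
pose = [[10.0, 0.0, 3000.0], [200.0, 100.0, 3000.0], [-150.0, 120.0, 3100.0]]
pose_flipped = [[-14.0, 2.0, 3010.0], [154.0, 118.0, 3090.0], [-196.0, 104.0, 3020.0]]
merged = merge_flip_tta(pose, pose_flipped, left_right_pairs=[(1, 2)])
```

Averaging several such views trades extra inference time for steadier predictions, so it is worth weighing against the backbone choice when speed matters.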
Caveats
- Models are non-commercial only due to training dataset licenses
- The PyTorch port is labeled “experimental” in the README
- Multi-dataset training details and full evaluation scripts require digging into the docs/ directory
Verdict
Worth a look if you need 3D pose from monocular video without building a capture studio. Skip it if your use case is commercial (the models are licensed for non-commercial use only) or if you need real-time performance guarantees, since speed depends heavily on which backbone you choose.