3D pose from a webcam without a Kinect
A PyTorch demo that lifts 2D body keypoints into 3D space in real time, with escape hatches for CPU (OpenVINO) and Jetson (TensorRT).

What it does
Takes a monocular video stream—webcam or file—and outputs 3D coordinates for up to 18 body keypoints per person. It is explicitly a demo, not a library: the README notes it overlaps heavily with OpenCV’s model zoo but strips down to just the 3D pose estimation plumbing. You will need to build a C++ pose_extractor module, fetch a pre-trained checkpoint from Google Drive, and probably fuss with camera extrinsics and focal length if you want geometrically correct output.
The interesting bit
The 3D estimation works from 2D detections alone, with no depth sensor; the model was trained on MS COCO plus CMU Panoptic. The claimed 100 mm MPJPE (mean per joint position error) on a CMU Panoptic subset is specific enough to be useful. More practically, the project offers two well-documented off-ramps from stock PyTorch: OpenVINO for CPU inference and TensorRT for Jetson deployment, with the author reporting ~10x speedup on an RTX 2060 via TensorRT versus PyTorch 1.6.0+cu101.
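For a sense of what a 100 mm MPJPE means, here is a minimal sketch of the metric: the average Euclidean distance between predicted and ground-truth joints. The function name and the three-joint pose are made up for illustration, and the sketch assumes no root alignment or other normalization; the evaluation protocol behind the reported number is not described here.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean per joint position error, in the unit of the inputs (mm here)."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty joint lists of equal length")
    # Average the per-joint Euclidean distances.
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Hypothetical three-joint pose, coordinates in millimetres (not real data).
truth = [(0.0, 0.0, 0.0), (0.0, 500.0, 0.0), (200.0, 500.0, 0.0)]
guess = [(30.0, 40.0, 0.0), (0.0, 500.0, 100.0), (200.0, 500.0, 0.0)]
error_mm = mpjpe(guess, truth)
```

In this toy case the per-joint errors are 50 mm, 100 mm, and 0 mm, so a single badly placed joint dominates the average.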
Key highlights
- Detects 18 keypoint types: ears, eyes, nose, neck, shoulders, elbows, wrists, hips, knees, ankles
- Supports OpenVINO and TensorRT backends with conversion scripts included
- Requires Python 3.5+, CMake 3.10+, OpenCV 4.0+, and a C++ compiler
- Camera parameters (--extrinsics, --fx) needed for correct scene visualization; defaults provided if omitted
- TensorRT conversion requires fixed input dimensions; dynamic reshape is not supported
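To see why the camera parameters matter, here is a rough pinhole-camera sketch. It assumes the extrinsics are a 3x3 rotation R and a translation t with the convention X_cam = R * X_world + t, and that fx is the focal length in pixels. Both function names, the principal point (cx, cy), and that convention are assumptions for illustration, not taken from the demo's code.

```python
def project(point_cam, fx, cx, cy, fy=None):
    """Pinhole projection of a camera-space point (X, Y, Z) to pixel (u, v)."""
    fy = fx if fy is None else fy  # assume square pixels unless told otherwise
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def camera_to_world(point_cam, rotation, translation):
    """Invert X_cam = R * X_world + t, i.e. X_world = R^T * (X_cam - t)."""
    d = [c - t for c, t in zip(point_cam, translation)]
    # Row i of R^T is column i of R.
    return tuple(sum(rotation[k][i] * d[k] for k in range(3)) for i in range(3))


# Hypothetical extrinsics: 90 degree rotation about Z, camera offset 1 m along Z.
R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
t = (0.0, 0.0, 1000.0)
joint_cam = (0.0, 100.0, 3000.0)  # millimetres, camera space

joint_world = camera_to_world(joint_cam, R, t)
pixel = project(joint_cam, fx=600.0, cx=320.0, cy=240.0)
```

A wrong fx scales every projected offset from the image centre, and wrong extrinsics tilt or shift the whole skeleton relative to the floor, which is why defaults are fine for a quick look but not for geometrically correct output.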
Caveats
- Pre-trained model lives on Google Drive, not in the repo or a package registry
- Build step (setup.py build_ext) and PYTHONPATH manipulation required before running
- TensorRT path involves CUDA 11.1, cuDNN 8, and torch2trt; the README says “these steps work for me” with a shrug
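Because the TensorRT conversion needs fixed input dimensions, every frame has to be fitted to one network size before inference. One common way to do that is letterboxing: scale to fit, then pad. The sketch below only computes the scale and padding and maps network coordinates back to the frame. The function names and the 448x256 input size are hypothetical, and the demo's own preprocessing may differ.

```python
def letterbox_params(frame_w, frame_h, net_w, net_h):
    """Scale and padding that fit a frame into a fixed net_w x net_h input
    while preserving aspect ratio."""
    if min(frame_w, frame_h, net_w, net_h) <= 0:
        raise ValueError("all dimensions must be positive")
    scale = min(net_w / frame_w, net_h / frame_h)
    new_w = round(frame_w * scale)
    new_h = round(frame_h * scale)
    # Split the leftover space evenly; any odd pixel goes to the far side.
    pad_x = (net_w - new_w) // 2
    pad_y = (net_h - new_h) // 2
    return scale, new_w, new_h, pad_x, pad_y


def to_frame_coords(u, v, scale, pad_x, pad_y):
    """Map a keypoint from network-input pixels back to original frame pixels."""
    return ((u - pad_x) / scale, (v - pad_y) / scale)


# Hypothetical sizes: a 1280x720 webcam frame into a 448x256 network input.
scale, new_w, new_h, pad_x, pad_y = letterbox_params(1280, 720, 448, 256)
centre = to_frame_coords(224, 128, scale, pad_x, pad_y)
```

Keeping the scale and padding around matters because detected keypoints come back in network-input pixels and need the inverse mapping before they are drawn on the original frame.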
Verdict
Worth a look if you need a working baseline for 3D multi-person pose on commodity hardware and can tolerate demo-grade packaging. Skip if you want a pip-installable library or need production-ready model distribution.