The 135-point stick figure generator that launched a thousand startups
CMU’s real-time multi-person pose estimator detects body, face, hands, and feet simultaneously, and its body runtime stays flat even as the crowd grows.

What it does
OpenPose finds up to 135 keypoints per person, for multiple people in a single image or video stream: 25 body/foot points, 21 per hand, and 70 facial landmarks. (Those parts sum to 137; the 135 total presumably counts each wrist once, since the wrist appears in both the body and hand sets.) It runs from a command-line demo, or through C++ and Python APIs if you need custom preprocessing or output formats. There’s also a 3D reconstruction module for multi-camera setups, a calibration toolbox, and a Unity plugin.
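Multi-camera 3D reconstruction generally comes down to triangulating each 2D keypoint across views. Here is a minimal sketch of linear (DLT) triangulation with NumPy, assuming calibrated 3x4 projection matrices are already available. The function names and the two toy cameras are hypothetical, and OpenPose's own 3D module may do this differently.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one keypoint seen by several cameras.

    projections: 3x4 projection matrices (assumed known from calibration).
    points_2d:   one (x, y) observation per camera, in the same order.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X_h):
    """Project a homogeneous 3D point with a 3x4 matrix to (x, y)."""
    x = P @ X_h
    return x[:2] / x[2]


# Two hypothetical cameras: identity intrinsics, the second shifted 1 unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

truth = np.array([0.5, 0.2, 4.0, 1.0])
recovered = triangulate_dlt([P1, P2], [project(P1, truth), project(P2, truth)])
print(recovered)
```

With noisy detections, more than two views and a confidence-based filter on each observation would normally be needed before the result is usable.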
The interesting bit
The body/foot detector’s runtime is invariant to the number of people detected, unlike competitors whose latency grows with headcount. The README shows a benchmark where OpenPose holds steady while Alpha-Pose and Mask R-CNN slow down as more people enter the frame. Hand and face detection don’t share this property; their cost grows per person, though a separate training repo offers runtime-invariant alternatives.
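The scaling behaviour described above can be pictured with a toy cost model: a body/foot pass whose cost ignores headcount, plus hand/face work that is paid once per detected person. The constants below are placeholders chosen for illustration, not measurements of OpenPose or any other system.

```python
def body_only_ms(people: int, body_ms: float) -> float:
    """Body/foot pass, modelled as constant per frame regardless of `people`."""
    return body_ms


def whole_body_ms(people: int, body_ms: float, extras_ms_per_person: float) -> float:
    """Body/foot pass plus hand/face work paid once per detected person."""
    return body_ms + people * extras_ms_per_person


# Placeholder timings (hypothetical, for illustration only).
BODY_MS = 40.0
EXTRAS_MS_PER_PERSON = 15.0

for n in (1, 5, 20):
    print(n, body_only_ms(n, BODY_MS), whole_body_ms(n, BODY_MS, EXTRAS_MS_PER_PERSON))
```

Plugging in timings measured on your own hardware gives a quick estimate of the crowd size at which enabling hand and face detection would break a frame budget.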
Key highlights
- 2D real-time multi-person keypoint detection with 15-, 18-, or 25-keypoint body models; the 25-keypoint model includes 6 foot keypoints
- Hand (2×21 keypoints) and face (70 keypoints) estimation, with runtime dependent on person count
- 3D single-person reconstruction via multi-view triangulation, with FLIR/Point Grey camera sync
- Runs on Ubuntu, Windows, macOS, and NVIDIA TX2; supports CUDA, OpenCL, or CPU-only
- Portable Windows binary available—no build required
- Outputs to PNG/JPG/AVI, JSON/XML/YML, or raw arrays
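To make the JSON output concrete, here is a minimal parsing sketch. It assumes the layout used by recent OpenPose versions: a top-level `people` list whose entries hold flat `[x, y, confidence, ...]` arrays under keys such as `pose_keypoints_2d`. The sample is shortened to three keypoints, the helper names are hypothetical, and the field names should be checked against the files your version writes.

```python
import json

# Shortened, hand-written sample; a real 25-keypoint body array has 75 numbers.
SAMPLE = """
{
  "version": 1.3,
  "people": [
    {
      "pose_keypoints_2d": [100.0, 200.0, 0.9, 110.0, 210.0, 0.8, 0.0, 0.0, 0.0],
      "hand_left_keypoints_2d": [],
      "hand_right_keypoints_2d": [],
      "face_keypoints_2d": []
    }
  ]
}
"""


def triples(flat):
    """Group a flat [x, y, c, x, y, c, ...] list into (x, y, c) tuples."""
    usable = len(flat) - len(flat) % 3
    return [tuple(flat[i:i + 3]) for i in range(0, usable, 3)]


def confident_keypoints(frame, key="pose_keypoints_2d", min_conf=0.1):
    """Return, per person, the (index, x, y) of keypoints at or above min_conf."""
    result = []
    for person in frame.get("people", []):
        points = triples(person.get(key, []))
        result.append(
            [(i, x, y) for i, (x, y, c) in enumerate(points) if c >= min_conf]
        )
    return result


frame = json.loads(SAMPLE)
kept = confident_keypoints(frame)
print(kept)
```

Filtering on the confidence value matters because, in this sample, the undetected keypoint is written as zeros instead of being omitted, so the array index still identifies the body part.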
Caveats
- Hand and face runtime scales with number of detected people (body does not)
- Free for non-commercial use only; commercial use requires a separate license, available through CMU’s FlintBox
- Built on Caffe; if your stack is PyTorch-native, the dependency may feel dated
Verdict
Worth a look if you need whole-body pose estimation with predictable latency for crowds, or if you’re doing multi-camera 3D capture. Skip it if you need a modern framework with no licensing strings attached for commercial use, or if per-person hand/face scaling is a dealbreaker for your use case.