CMU-Perceptual-Computing-Lab/openpose

The 135-point stick figure generator that launched a thousand startups

CMU's real-time multi-person pose estimator detects body, face, hand, and foot keypoints in a single pass, and its body-keypoint runtime stays flat as the crowd grows.

34.1k stars · C++ · Computer Vision
Velocity (7d): +10 ★/day · Trend: steady

What it does

OpenPose finds up to 135 keypoints on multiple humans in a single image or video stream: 25 body/foot points, 21 per hand, and 70 facial landmarks. It runs from a command-line demo, or through C++ and Python APIs if you need custom preprocessing or output formats. There’s also a 3D reconstruction module for multi-camera setups, a calibration toolbox, and a Unity plugin.
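The per-part counts quoted above do not add up to 135 on their own, so a quick check may help. The sketch below simply sums them; the explanation for the two-point gap (each hand's wrist coinciding with a wrist already in the body model) is an assumption here, not something this summary states.

```python
# Per-part keypoint counts as quoted in the paragraph above.
BODY_FOOT = 25   # body keypoints, including the foot keypoints
HAND = 21        # per hand
FACE = 70        # facial landmarks

# Straight sum of every part for one person with two hands.
naive_total = BODY_FOOT + 2 * HAND + FACE

# Assumption: each hand's wrist point duplicates a wrist that the body
# model already provides, so it is counted only once in the headline figure.
SHARED_WRISTS = 2
unique_total = naive_total - SHARED_WRISTS

print(naive_total, unique_total)
```

Under that assumption the two numbers reconcile: the parts sum to 137, and 135 remain once the shared wrists are counted once.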

The interesting bit

The body/foot detector’s runtime is invariant to crowd size, unlike competitors whose latency grows with headcount. The README shows a benchmark where OpenPose holds steady while Alpha-Pose and Mask R-CNN slow down as more people enter the frame. Hand and face detection don’t share this property: their cost grows with each detected person, though a separate training repo offers runtime-invariant alternatives.
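To make the scaling argument concrete, here is a toy cost model. It assumes a simplified picture in which a bottom-up estimator runs one network pass over the whole frame, while a top-down pipeline runs a detector once and then a pose network per person. Every function name and millisecond figure is hypothetical and chosen for illustration; none of them come from the README benchmark.

```python
def bottom_up_ms(people: int, network_ms: float = 50.0) -> float:
    """One pass over the whole frame; headcount does not enter the cost."""
    return network_ms


def top_down_ms(people: int, detector_ms: float = 30.0,
                per_person_ms: float = 15.0) -> float:
    """Detect people once, then run a pose network on each detection."""
    return detector_ms + per_person_ms * people


def with_hands_and_face_ms(people: int, body_ms: float = 50.0,
                           extras_per_person_ms: float = 20.0) -> float:
    """Flat body cost plus hand/face work that repeats for each person."""
    return body_ms + extras_per_person_ms * people


# Hypothetical crowd sizes; the numbers only illustrate the shape of each curve.
for n in (1, 5, 20):
    print(n, bottom_up_ms(n), top_down_ms(n), with_hands_and_face_ms(n))
```

In this sketch the first column of costs stays constant while the other two rise with the crowd, which is the pattern the paragraph describes for body-only versus hand/face detection.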

Key highlights

  • 2D real-time multi-person keypoint detection: 15-, 18-, or 25-keypoint body/foot models; the 25-keypoint model includes 6 foot keypoints
  • Hand (2×21 keypoints) and face (70 keypoints) estimation, with runtime dependent on person count
  • 3D single-person reconstruction via multi-view triangulation, with FLIR/Point Grey camera sync
  • Runs on Ubuntu, Windows, macOS, and NVIDIA TX2; builds for CUDA (NVIDIA GPUs), OpenCL (AMD GPUs), or CPU-only
  • Portable Windows binary available—no build required
  • Outputs to PNG/JPG/AVI, JSON/XML/YML, or raw arrays
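For the JSON output, a small reader might look like the sketch below. It assumes the layout is a top-level `people` list whose entries hold flat `[x, y, confidence, ...]` arrays under keys such as `pose_keypoints_2d`; treat those key names and the layout as assumptions to check against the output documentation for your version. `triples` and `load_people` are hypothetical helper names, and the sample data is made up.

```python
import json


def triples(flat):
    """Group a flat [x0, y0, c0, x1, y1, c1, ...] list into (x, y, c) tuples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoint array length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def load_people(text):
    """Parse one frame's JSON text into a list of per-person keypoint dicts."""
    doc = json.loads(text)
    people = []
    for person in doc.get("people", []):
        people.append({
            # Key names below are assumed, not confirmed by this summary.
            "body": triples(person.get("pose_keypoints_2d", [])),
            "face": triples(person.get("face_keypoints_2d", [])),
            "hand_left": triples(person.get("hand_left_keypoints_2d", [])),
            "hand_right": triples(person.get("hand_right_keypoints_2d", [])),
        })
    return people


# Made-up sample: one person, two body keypoints, no face or hands.
sample = json.dumps({
    "people": [{"pose_keypoints_2d": [10.0, 20.0, 0.9, 30.0, 40.0, 0.0]}]
})
people = load_people(sample)
print(people[0]["body"])
```

Missing parts come back as empty lists, so downstream code can treat body-only and whole-body runs the same way.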

Caveats

  • Hand and face runtime scales with number of detected people (body does not)
  • Free for non-commercial use only; commercial use requires a separate license, available through CMU’s FlintBox
  • Built on Caffe; if your stack is PyTorch-native, the dependency may feel dated

Verdict

Worth a look if you need whole-body pose estimation with predictable body-keypoint latency in crowds, or if you’re doing multi-camera 3D capture. Skip it if you need a modern framework that is free for commercial use, or if per-person hand/face scaling is a dealbreaker for your use case.
