ViTAE-Transformer/ViTPose
Vision Transformer baseline for human pose estimation achieving 81.1 AP on MS COCO Keypoint test-dev.

Velocity (7d): +1.4 ★/day · Trend: → steady
ViTPose and ViTPose++ are simple Vision Transformer baselines for generic body pose estimation. The models pass images or video frames through a vision transformer backbone to detect and localize human keypoints such as joints and other body landmarks. The repository includes training code, pre-trained weights, and a Hugging Face Spaces demo built with Gradio for video pose estimation.
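To make "localize human keypoints" concrete, here is a minimal sketch of one common way pose estimators turn per-keypoint score maps (heatmaps) into pixel coordinates: take the peak of each map and rescale it to the input image. It is an illustration under stated assumptions, not the repository's code. The function name `decode_keypoints`, the nested-list heatmap layout, and the cell-center rescaling are all hypothetical choices made for this example.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # H rows by W columns of scores (assumed layout)


def decode_keypoints(
    heatmaps: List[Heatmap], image_size: Tuple[int, int]
) -> List[Tuple[float, float, float]]:
    """Return one (x, y, score) per heatmap, with x and y in image pixels.

    Hypothetical helper: takes the highest-scoring cell of each heatmap and
    maps the center of that cell back to the input image resolution.
    """
    img_w, img_h = image_size
    keypoints = []
    for hm in heatmaps:
        h, w = len(hm), len(hm[0])
        best_x, best_y, best_score = 0, 0, hm[0][0]
        # Scan every cell and keep the first location with the highest score.
        for y in range(h):
            for x in range(w):
                if hm[y][x] > best_score:
                    best_x, best_y, best_score = x, y, hm[y][x]
        # Rescale the cell center from heatmap to image coordinates.
        px = (best_x + 0.5) * img_w / w
        py = (best_y + 0.5) * img_h / h
        keypoints.append((px, py, best_score))
    return keypoints


if __name__ == "__main__":
    # One toy 4x3 heatmap whose peak sits at column 2, row 1.
    toy = [[0.0] * 3 for _ in range(4)]
    toy[1][2] = 0.9
    print(decode_keypoints([toy], image_size=(12, 16)))
```

A plain argmax is the simplest decoding choice; practical systems often refine the peak to sub-pixel precision, which this sketch leaves out.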