
ViTAE-Transformer/ViTPose

Vision Transformer baseline for human pose estimation achieving 81.1 AP on MS COCO Keypoint test-dev.

2.1k stars Python Computer Vision
Velocity (7d): +1.4 ★ / day · Trend: steady

ViTPose and ViTPose++ are simple Vision Transformer baselines for generic body pose estimation. The models pass images or video frames through a vision transformer to detect and localize human keypoints such as joints and other body parts. The repository includes training code, pre-trained weights, and a Hugging Face Spaces demo, built with Gradio, for video pose estimation.
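
The description does not say how keypoints are read out of the model. A minimal sketch, assuming the common heatmap formulation in which the network emits one score grid per keypoint and the peak of each grid marks the joint: the function name `decode_keypoints` and the toy input are hypothetical and are not part of the repository's API.

```python
def decode_keypoints(heatmaps, image_size):
    """Turn per-keypoint heatmaps into (x, y, score) in image pixels.

    heatmaps: list of 2D lists, one H x W grid of scores per keypoint
              (assumed output format, not taken from the repository).
    image_size: (width, height) of the input image in pixels.
    """
    img_w, img_h = image_size
    keypoints = []
    for heatmap in heatmaps:
        h, w = len(heatmap), len(heatmap[0])
        # Pick the grid cell with the highest score.
        best_score, best_x, best_y = max(
            (score, x, y)
            for y, row in enumerate(heatmap)
            for x, score in enumerate(row)
        )
        # Map the centre of that cell back to input-image coordinates.
        keypoints.append(
            ((best_x + 0.5) * img_w / w, (best_y + 0.5) * img_h / h, best_score)
        )
    return keypoints


# Toy example: one keypoint, a 3 x 4 heatmap, a 64 x 48 pixel image.
toy_heatmaps = [[
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.2, 0.9, 0.0],
    [0.0, 0.0, 0.1, 0.0],
]]
print(decode_keypoints(toy_heatmaps, (64, 48)))  # [(40.0, 24.0, 0.9)]
```

Real pipelines usually refine the peak to sub-pixel precision and undo the person-crop transform before reporting coordinates; this sketch skips both steps.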
