Walter0807/MotionBERT
A PyTorch implementation of a pretraining framework for learning unified human motion representations from video for 3D pose estimation, mesh recovery, and action recognition.

MotionBERT is a computer vision research project that learns general-purpose 3D human motion representations from video using transformer-based pretraining. A single learned representation serves several downstream tasks: monocular 3D pose estimation, human mesh recovery, and skeleton-based action recognition. The model is pretrained on large-scale motion data with masked motion modeling (analogous to BERT’s masked language modeling) and then fine-tuned per task, where it achieves state-of-the-art results on benchmarks such as Human3.6M and NTU RGB+D.
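
To make the masked motion modeling idea concrete, here is a minimal, framework-free sketch of the corruption step and a reconstruction error restricted to the hidden joints. It is an illustration only and is not taken from the MotionBERT code base: the function names `mask_joints` and `masked_mse`, the 15% default ratio, the zero fill value, and the choice to score 2D reconstruction on masked joints only are all assumptions made for this example. The actual project may mask and reconstruct different quantities.

```python
import random


def mask_joints(sequence, mask_ratio=0.15, seed=None):
    """Hide a random fraction of joints in a 2D skeleton sequence.

    sequence: list of frames; each frame is a list of (x, y) joint tuples.
    Returns (masked_sequence, mask), where mask[t][j] is True when joint j
    of frame t was hidden. Hidden joints are replaced by (0.0, 0.0), which
    is an assumed fill value for this sketch.
    """
    rng = random.Random(seed)
    num_frames = len(sequence)
    num_joints = len(sequence[0])

    # Enumerate every (frame, joint) position and sample the ones to hide.
    positions = [(t, j) for t in range(num_frames) for j in range(num_joints)]
    num_hidden = round(mask_ratio * len(positions))
    hidden = set(rng.sample(positions, num_hidden))

    masked = [
        [(0.0, 0.0) if (t, j) in hidden else sequence[t][j]
         for j in range(num_joints)]
        for t in range(num_frames)
    ]
    mask = [
        [(t, j) in hidden for j in range(num_joints)]
        for t in range(num_frames)
    ]
    return masked, mask


def masked_mse(prediction, target, mask):
    """Mean squared error computed over hidden joints only."""
    total, count = 0.0, 0
    for pred_frame, target_frame, mask_frame in zip(prediction, target, mask):
        for (px, py), (tx, ty), is_hidden in zip(pred_frame, target_frame, mask_frame):
            if is_hidden:
                total += (px - tx) ** 2 + (py - ty) ** 2
                count += 1
    return total / count if count else 0.0


# Toy data: 4 frames, 5 joints per frame (hypothetical values).
toy_sequence = [
    [(float(t + j), float(t - j)) for j in range(5)]
    for t in range(4)
]
corrupted, hidden_mask = mask_joints(toy_sequence, mask_ratio=0.25, seed=0)
```

In a pretraining loop, a model would receive `corrupted` as input and be trained to bring a loss such as `masked_mse(model_output, toy_sequence, hidden_mask)` toward zero, which forces it to infer hidden joints from the surrounding spatial and temporal context.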