MCG-NJU/VideoMAE
Official PyTorch implementation of VideoMAE, a masked autoencoder for self-supervised video representation learning.

VideoMAE (NeurIPS 2022 Spotlight) applies masked autoencoders to data-efficient self-supervised video pre-training. The repository provides PyTorch training code and pretrained models for video understanding tasks, including action recognition on the Kinetics-400, Something-Something, and UCF-101 datasets. It adapts the vision transformer (ViT) to video data and uses masked token reconstruction as the pre-training objective.
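The masked-reconstruction objective can be sketched in plain Python. This is an illustration, not the repository's code: it assumes a 16-frame 224x224 clip split into 2x16x16 tubelet tokens and a 90% masking ratio (the paper's "tube masking" idea, where one random spatial mask is shared across all temporal positions). The names `tube_mask` and `masked_mse` are hypothetical helpers, and the loss here is a simple mean squared error computed only over masked tokens.

```python
import random


def tube_mask(num_frames=16, img_size=224, patch_size=16, tubelet_size=2,
              mask_ratio=0.9, seed=None):
    """Return a flat list of booleans over video tokens (True = masked).

    Hypothetical helper; the defaults are assumptions for illustration.
    Each token is a tubelet of tubelet_size frames x patch_size x patch_size
    pixels. Tube masking samples one random spatial mask and repeats it for
    every temporal position, so a hidden patch stays hidden across time.
    """
    rng = random.Random(seed)
    t = num_frames // tubelet_size           # temporal token positions
    hw = (img_size // patch_size) ** 2       # spatial token positions per slice
    num_masked = int(mask_ratio * hw)        # masked spatial positions
    spatial = [True] * num_masked + [False] * (hw - num_masked)
    rng.shuffle(spatial)
    return spatial * t                       # same spatial mask for each slice


def masked_mse(pred, target, mask):
    """Mean squared error averaged over masked tokens only.

    pred and target are lists of per-token vectors; visible tokens
    (mask == False) do not contribute to the loss.
    """
    errs = [sum((p - q) ** 2 for p, q in zip(pv, tv)) / len(tv)
            for pv, tv, m in zip(pred, target, mask) if m]
    return sum(errs) / len(errs)


if __name__ == "__main__":
    mask = tube_mask(seed=0)
    print(len(mask), sum(mask))  # total tokens, masked tokens
```

With these assumed settings the clip yields 8 x 196 = 1568 tokens, of which 8 x 176 = 1408 are masked, so the encoder only processes the small visible remainder; this is what makes a very high masking ratio cheap to train.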