facebookresearch/vjepa2
V-JEPA 2 is a self-supervised video encoder trained on internet-scale video data for motion understanding and action anticipation.

Velocity (7d): +10 ★/day · Trend: steady
V-JEPA 2 is a self-supervised learning approach for training video encoders on internet-scale video data. It achieves state-of-the-art performance on motion understanding and human action anticipation tasks. The codebase includes V-JEPA 2-AC, a latent action-conditioned world model post-trained from V-JEPA 2 for robot manipulation without environment-specific data collection.