facebookresearch/ijepa
PyTorch implementation of I-JEPA, a self-supervised learning method that predicts image region representations from other regions without pixel-level reconstruction.

I-JEPA is a self-supervised learning method for images that predicts the high-level semantic representations of some image patches from the encoded representations of other patches in the same image. A transformer-based predictor makes these predictions in latent space rather than pixel space, so the model learns semantic features without relying on hand-crafted data transformations. The official codebase includes training scripts, model architectures, and evaluation utilities accompanying the CVPR 2023 paper.
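
To make the idea of predicting in latent space concrete, here is a toy sketch in plain Python. It is not code from this repository. The names `split_patches`, `toy_encoder`, `toy_predictor`, and `latent_loss` are hypothetical, the encoder and predictor are trivial stand-ins for the transformer models, and the mean-squared-error loss is an assumption made for illustration. The sketch only shows the shape of the objective: patches are split into a context set and a target set, and the loss compares predicted and actual representations of the target patches, never pixels.

```python
def split_patches(grid_h, grid_w, target_box):
    """Split patch indices of a grid_h x grid_w grid into context and target.

    target_box is (top, left, height, width) in patch units (hypothetical format).
    Patches inside the box are targets; all remaining patches are context.
    """
    top, left, h, w = target_box
    target = [r * grid_w + c
              for r in range(top, top + h)
              for c in range(left, left + w)]
    target_set = set(target)
    context = [i for i in range(grid_h * grid_w) if i not in target_set]
    return context, target


def toy_encoder(patch_index, dim=4):
    """Hypothetical stand-in for an encoder: one fixed vector per patch index."""
    return [(((patch_index + 1) * (k + 1)) % 7) / 7.0 for k in range(dim)]


def toy_predictor(context_reprs, num_targets):
    """Hypothetical stand-in for the predictor.

    It returns the mean context representation once per target patch.
    A real predictor would be a learned transformer.
    """
    dim = len(context_reprs[0])
    mean = [sum(vec[k] for vec in context_reprs) / len(context_reprs)
            for k in range(dim)]
    return [list(mean) for _ in range(num_targets)]


def latent_loss(predicted, target):
    """Mean squared error between representation vectors (assumed loss)."""
    total, count = 0.0, 0
    for p_vec, t_vec in zip(predicted, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


# A 4x4 grid of patches with a 2x2 target block in the middle.
context_idx, target_idx = split_patches(4, 4, (1, 1, 2, 2))

# Encode context and target patches into representation vectors.
context_reprs = [toy_encoder(i) for i in context_idx]
target_reprs = [toy_encoder(i) for i in target_idx]

# Predict target representations from the context, then compare in latent space.
predicted = toy_predictor(context_reprs, len(target_idx))
loss = latent_loss(predicted, target_reprs)
print("context patches:", len(context_idx))
print("target patches:", target_idx)
print("latent-space loss:", loss)
```

In this toy version the same function encodes both context and target patches. How the actual models produce and compare the two sets of representations is defined by the training scripts in the codebase.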