SwinTransformer/Transformer-SSL
A PyTorch implementation of self-supervised pretraining with Swin Transformer backbones, with the learned representations evaluated on downstream computer vision tasks.

This repository provides the official implementation of self-supervised learning with vision transformers, specifically Swin Transformer. Models are trained via self-supervision to learn visual representations that transfer well to downstream tasks, including object detection, instance segmentation, and semantic segmentation. With a DeiT-S/16 backbone and 300 epochs of pretraining, the method achieves competitive ImageNet-1K linear evaluation accuracy while requiring fewer training tricks than comparable approaches such as MoCo v3 and DINO.