
microsoft/SimMIM

Microsoft's official implementation of SimMIM, a self-supervised framework for pre-training vision transformers via masked image modeling.

Velocity (7d): +0.6 ★ / day. Trend: steady.

SimMIM is a simple framework for pre-training vision transformers with masked image modeling. It masks random patches of the input image using a moderately large masked patch size (e.g., 32) and predicts the raw RGB pixel values of the masked regions by direct regression. The framework supports pre-training and fine-tuning on ImageNet-1K with Swin Transformer and ViT models, and it achieves strong representation learning performance without a complex prediction head design.
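The masking and regression steps described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the repository's code: the function names `random_patch_mask` and `masked_l1_loss`, the 192-pixel input size, the 0.6 mask ratio, and the choice of an L1 loss are all assumptions made for the example. Only the masked patch size of 32 and the idea of regressing raw RGB values come from the description.

```python
import numpy as np


def random_patch_mask(img_size, mask_patch_size, mask_ratio, rng):
    """Return an (img_size, img_size) mask of 0s and 1s; 1 marks a masked pixel.

    Hypothetical helper: masking is decided per patch, then expanded to pixels.
    """
    assert img_size % mask_patch_size == 0, "image must tile evenly into patches"
    grid = img_size // mask_patch_size
    num_patches = grid * grid
    num_masked = int(np.ceil(num_patches * mask_ratio))

    # Pick a random subset of patches to hide.
    flat = np.zeros(num_patches, dtype=np.float32)
    flat[rng.permutation(num_patches)[:num_masked]] = 1.0
    patch_mask = flat.reshape(grid, grid)

    # Expand each patch decision to a mask_patch_size x mask_patch_size block.
    rows = np.repeat(patch_mask, mask_patch_size, axis=0)
    return np.repeat(rows, mask_patch_size, axis=1)


def masked_l1_loss(target, prediction, pixel_mask):
    """Mean absolute error over masked pixels only.

    target, prediction: (H, W, C) arrays; pixel_mask: (H, W) array of 0s and 1s.
    The L1 loss is an assumption of this sketch.
    """
    abs_err = np.abs(target - prediction) * pixel_mask[..., None]
    num_values = pixel_mask.sum() * target.shape[-1]
    return float(abs_err.sum() / num_values)


rng = np.random.default_rng(0)
image = rng.random((192, 192, 3), dtype=np.float32)  # stand-in for a real image
mask = random_patch_mask(192, 32, 0.6, rng)          # assumed size and ratio
prediction = np.zeros_like(image)                    # stand-in for model output
loss = masked_l1_loss(image, prediction, mask)
print(mask.shape, loss)
```

In a real pipeline the prediction would come from an encoder such as Swin or ViT followed by a lightweight head. Restricting the loss to masked pixels means the model is scored only on content it could not see.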
