
Alpha-VL/ConvMAE

MCMAE is a masked autoencoder vision backbone that combines convolutions with a transformer architecture for image classification, detection, and segmentation.

528 stars · Python · Computer Vision · ML Frameworks
Velocity · 7d: +0.4 ★ / day
Trend: steady

MCMAE (Masked Convolution Meets Masked Autoencoders) is a computer vision model that combines masked autoencoder pretraining with multi-scale convolutions. It provides pretrained backbones that can be fine-tuned for downstream tasks, including ImageNet classification, object detection (with Mask R-CNN), semantic segmentation, and video classification. By integrating hierarchical representations, the approach accelerates training and improves transfer learning performance compared to vanilla MAE.
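To make the idea of masking over a multi-scale hierarchy concrete, here is a minimal sketch, not taken from the repository. It draws a random patch mask on a coarse grid and repeats it onto a finer grid so that both scales hide the same image regions. The function names `random_patch_mask` and `upsample_mask` are hypothetical, and the grid size, mask ratio, and upsampling factor are illustrative assumptions.

```python
import random


def random_patch_mask(grid, mask_ratio, seed=None):
    """Return a grid x grid list of booleans; True marks a masked patch.

    Hypothetical helper for illustration, not part of the MCMAE codebase.
    """
    rng = random.Random(seed)
    total = grid * grid
    num_masked = int(total * mask_ratio)
    # Sample distinct patch indices without replacement.
    masked = set(rng.sample(range(total), num_masked))
    return [[(r * grid + c) in masked for c in range(grid)] for r in range(grid)]


def upsample_mask(mask, factor):
    """Repeat every cell factor x factor times (nearest-neighbor upsampling)."""
    out = []
    for row in mask:
        wide = [cell for cell in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Illustrative values (assumptions): a 14 x 14 coarse grid, 75% of patches
# masked, and a finer grid with 4x the resolution along each axis.
coarse = random_patch_mask(grid=14, mask_ratio=0.75, seed=0)
fine = upsample_mask(coarse, factor=4)
```

Because the fine mask is derived from the coarse one, a region hidden at one scale is hidden at every scale, which is the property a hierarchical masked autoencoder needs to avoid leaking masked content through its higher-resolution features.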
