sail-sg/metaformer
A PyTorch implementation of MetaFormer baselines for image classification on ImageNet-1K, including CNN and transformer-based vision models.

This repository provides PyTorch implementations of the MetaFormer baselines for vision tasks: IdentityFormer, RandFormer, ConvFormer and CAFormer. The models adopt a hierarchical 4-stage architecture similar to ResNet and differ in their token mixer design. ConvFormer outperforms ConvNeXt without relying on a novel token mixer, while CAFormer achieves 85.5% top-1 accuracy on ImageNet-1K at 224x224 resolution under normal supervised training, without external data or distillation.
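
To make the "shared block, swappable token mixer" idea concrete, here is a minimal sketch of a MetaFormer-style block. It is not the repository's code: the class names `MetaFormerBlock` and `DepthwiseConvMixer`, the channels-last `(B, H, W, C)` layout, the LayerNorm/GELU choices and the 7x7 kernel are all assumptions made for illustration, and the actual layers in the repository may differ.

```python
import torch
from torch import nn


class DepthwiseConvMixer(nn.Module):
    """Hypothetical convolutional token mixer (simplified for illustration)."""

    def __init__(self, dim, kernel_size=7):
        super().__init__()
        # groups=dim makes the convolution depthwise: one filter per channel.
        self.dwconv = nn.Conv2d(
            dim, dim, kernel_size, padding=kernel_size // 2, groups=dim
        )

    def forward(self, x):
        # x is channels-last (B, H, W, C); Conv2d expects (B, C, H, W).
        x = x.permute(0, 3, 1, 2)
        x = self.dwconv(x)
        return x.permute(0, 2, 3, 1)


class MetaFormerBlock(nn.Module):
    """Generic block: norm -> token mixer -> residual, norm -> MLP -> residual."""

    def __init__(self, dim, token_mixer, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mixer = token_mixer  # the only part that changes per model
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x):
        x = x + self.token_mixer(self.norm1(x))  # mix information across tokens
        x = x + self.mlp(self.norm2(x))          # mix information across channels
        return x


# Assumed example input: a 7x7 feature map with 32 channels, batch size 2.
x = torch.randn(2, 7, 7, 32)
identity_block = MetaFormerBlock(32, nn.Identity())        # identity mapping as mixer
conv_block = MetaFormerBlock(32, DepthwiseConvMixer(32))   # convolution as mixer
print(identity_block(x).shape, conv_block(x).shape)
```

Both blocks preserve the input shape, so in a sketch like this the token mixer can be exchanged without touching the rest of the block. A hierarchical model would stack such blocks into four stages with downsampling between them.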