google-research/maxvit
Multi-axis vision transformer model for image classification, detection, and segmentation tasks.

Velocity (7d): +0.3 ★/day · Trend: steady
This is the official TensorFlow implementation of MaxViT, a multi-axis vision transformer published at ECCV 2022. It provides foundation models for image classification, object detection, semantic segmentation, image quality assessment, and generative modeling tasks. The architecture's multi-axis attention combines blocked local attention, computed within non-overlapping windows, with dilated global attention, computed over a sparse grid of tokens that spans the whole feature map.
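The two attention patterns differ only in how tokens are grouped before self-attention is applied. The following is a minimal NumPy sketch of that grouping, not the repository's code: the function names `block_partition` and `grid_partition` and the single-image `(H, W, C)` layout are assumptions chosen for illustration.

```python
import numpy as np


def block_partition(x, p):
    """Group an (H, W, C) map into non-overlapping p x p windows.

    Returns (num_windows, p * p, C). Tokens in a group are spatial
    neighbours, so attention within a group is local.
    Hypothetical helper; assumes H and W are divisible by p.
    """
    h, w, c = x.shape
    x = x.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/p, W/p, p, p, C)
    return x.reshape(-1, p * p, c)


def grid_partition(x, g):
    """Group an (H, W, C) map into g x g sets of strided tokens.

    Returns (num_groups, g * g, C). Tokens in a group are H/g rows and
    W/g columns apart, so each group spans the whole map (dilated,
    global mixing).
    Hypothetical helper; assumes H and W are divisible by g.
    """
    h, w, c = x.shape
    x = x.reshape(g, h // g, g, w // g, c)
    x = x.transpose(1, 3, 0, 2, 4)  # (H/g, W/g, g, g, C)
    return x.reshape(-1, g * g, c)


# A 4 x 4 map whose single channel holds each token's flat index.
x = np.arange(16).reshape(4, 4, 1)
local_group = block_partition(x, 2)[0, :, 0]   # tokens 0, 1, 4, 5
global_group = grid_partition(x, 2)[0, :, 0]   # tokens 0, 2, 8, 10
```

In both cases the group size is fixed (`p * p` or `g * g` tokens), so the cost of attending within each group does not grow with image size, and the total cost grows linearly with the number of tokens.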