Sense-X/UniFormer
A unified transformer model for efficient spatiotemporal visual representation learning across classification, detection, segmentation, and pose estimation.

UniFormer is the official implementation of the papers published at ICLR 2022 and in TPAMI 2023, which propose a unified architecture that combines local convolution and global self-attention for visual recognition. It provides pretrained models and training code for image classification, video classification, object detection, semantic segmentation, and pose estimation, in both standard and lightweight model variants.