THU-MIG/RepViT
RepViT is a mobile-optimized vision architecture that incorporates ViT designs into CNNs, while RepViT-SAM adapts the Segment Anything Model for real-time mobile segmentation.

This repository provides the official PyTorch implementation of two CVPR 2024 papers: RepViT and RepViT-SAM. RepViT revisits mobile CNN architectures from a ViT perspective to achieve a better latency-performance trade-off on mobile devices. RepViT-SAM replaces SAM’s heavyweight image encoder with RepViT to enable real-time segmentation on resource-constrained devices, achieving nearly 10x faster inference than MobileSAM while maintaining strong zero-shot transfer capabilities.
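
The encoder replacement described above can be pictured as swapping one component behind a fixed interface while the rest of the pipeline stays the same. The following is a minimal, framework-free sketch of that idea only, not the repository's code: `heavy_encoder`, `light_encoder`, `mask_decoder`, and `Segmenter` are hypothetical names, and the toy list arithmetic stands in for real networks.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

Image = List[List[float]]   # toy H x W image
Embedding = List[float]     # toy feature vector, one value per image row
Point = Tuple[int, int]     # (row, col) point prompt


def heavy_encoder(image: Image) -> Embedding:
    # Hypothetical stand-in for a large ViT-style image encoder.
    return [sum(row) / len(row) for row in image]


def light_encoder(image: Image) -> Embedding:
    # Hypothetical stand-in for a lightweight encoder with the same output contract.
    return [max(row) for row in image]


def mask_decoder(embedding: Embedding, point: Point) -> List[bool]:
    # Toy decoder: keep rows whose feature is at least the prompted row's feature.
    threshold = embedding[point[0]]
    return [value >= threshold for value in embedding]


@dataclass(frozen=True)
class Segmenter:
    image_encoder: Callable[[Image], Embedding]
    mask_decoder: Callable[[Embedding, Point], List[bool]]

    def segment(self, image: Image, point: Point) -> List[bool]:
        # Encode the image once, then decode a mask for the given prompt.
        return self.mask_decoder(self.image_encoder(image), point)


original = Segmenter(heavy_encoder, mask_decoder)
# Swap only the encoder; the decoder is reused unchanged.
swapped = replace(original, image_encoder=light_encoder)
```

Because both encoders return an embedding of the same shape, the decoder needs no changes. That is the property that makes an encoder-only replacement possible in this sketch.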