naver-ai/rope-vit
A research implementation applying rotary position embeddings (RoPE) to vision transformers for improved image classification and detection performance.

This repository provides the official PyTorch implementation of RoPE-ViT, the method introduced in a NAVER AI Lab paper published at ECCV 2024. The work applies Rotary Position Embedding (RoPE), which has proven effective for length extrapolation in language models, to Vision Transformers. In the vision setting this improves resolution extrapolation: accuracy is maintained as the image resolution increases at inference time. The paper reports gains on ImageNet-1k classification, COCO object detection, and ADE-20k semantic segmentation.
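To illustrate how a rotary embedding can be extended from 1D token positions to 2D patch positions, here is a minimal pure-Python sketch. It is not taken from the repository: the function name `axial_rope_2d`, the axial split (half of the channel pairs rotated by the x coordinate, half by the y coordinate), and the frequency base of 100 are all assumptions chosen for illustration. The point it shows is that, after rotation, the query-key dot product depends only on the relative offset between two patches.

```python
import cmath

def axial_rope_2d(vec, x, y, base=100.0):
    """Hypothetical helper: rotate a feature vector by a 2D patch position.

    vec  : list of floats whose length is divisible by 4
    x, y : patch coordinates
    base : frequency base (illustrative value, not the repository's setting)
    """
    if len(vec) % 4 != 0:
        raise ValueError("feature dimension must be divisible by 4")
    n_pairs = len(vec) // 2
    half = n_pairs // 2
    out = []
    for p in range(n_pairs):
        # The first half of the pairs encodes x, the second half encodes y.
        # The frequency index restarts so both axes share the same spectrum.
        k = p % half
        theta = base ** (-k / half)
        pos = x if p < half else y
        # Treat each consecutive channel pair as a complex number and rotate it.
        z = complex(vec[2 * p], vec[2 * p + 1]) * cmath.exp(1j * pos * theta)
        out.extend([z.real, z.imag])
    return out

def dot(a, b):
    """Plain dot product, standing in for an attention logit."""
    return sum(u * v for u, v in zip(a, b))

# Usage: two query/key placements that share the same relative offset (+2, +3).
q = [0.3, -1.2, 0.7, 0.5, -0.4, 0.9, 1.1, -0.6]
k = [1.0, 0.2, -0.8, 0.4, 0.6, -0.3, 0.5, 0.7]
score_a = dot(axial_rope_2d(q, 1, 2), axial_rope_2d(k, 3, 5))
score_b = dot(axial_rope_2d(q, 11, 22), axial_rope_2d(k, 13, 25))
print(abs(score_a - score_b) < 1e-9)
```

Because the score depends on relative rather than absolute position, the same rotation rule applies to patch grids larger than those seen in training, which is the intuition behind resolution extrapolation.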