OpenGVLab/InternImage
InternImage is a vision foundation model architecture using deformable convolutions that achieves state-of-the-art results on object detection and semantic segmentation benchmarks.

InternImage is a large-scale vision foundation model that adapts deformable convolutions to build a convolution-based backbone that scales to large model sizes. It serves as a general-purpose backbone for detection and segmentation, with results reported on benchmarks including COCO object detection, LVIS, and Pascal VOC. The work was highlighted at CVPR 2023, and the model performs competitively against transformer-based vision models on multiple benchmarks.
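
For readers unfamiliar with deformable convolutions, the following is a minimal pure-Python sketch of the general idea: each kernel tap samples the input at its regular grid position plus an offset, using bilinear interpolation for fractional coordinates. It is an illustration only and is not InternImage's actual operator. The function names (`bilinear_sample`, `deformable_conv_at`) are hypothetical, and the offsets are supplied by hand here, whereas in a real network they would be predicted from the input.

```python
import math


def bilinear_sample(img, y, x):
    """Sample a 2D list `img` at fractional (y, x); out-of-bounds pixels count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Bilinear weight falls off linearly with distance to the pixel.
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * img[yi][xi]
    return value


def deformable_conv_at(img, kernel, offsets, cy, cx):
    """Output of a 3x3 deformable convolution at one location (cy, cx).

    kernel:  3x3 list of weights.
    offsets: 3x3 list of (dy, dx) pairs, one per kernel tap
             (hand-written here; an assumption for illustration).
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            # Regular grid position plus the per-tap offset.
            y = cy + (i - 1) + dy
            x = cx + (j - 1) + dx
            out += kernel[i][j] * bilinear_sample(img, y, x)
    return out


# A 4x4 image whose value at (r, c) is 4*r + c, and an averaging kernel.
img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
kernel = [[1.0 / 9] * 3 for _ in range(3)]

zero = [[(0.0, 0.0)] * 3 for _ in range(3)]      # behaves like a plain convolution
shifted = [[(0.0, 0.5)] * 3 for _ in range(3)]   # every tap moved half a pixel right

plain = deformable_conv_at(img, kernel, zero, 1, 1)
moved = deformable_conv_at(img, kernel, shifted, 1, 1)
```

With all offsets at zero the operation reduces to an ordinary 3x3 convolution, so `plain` is about 5.0 (the mean of the top-left 3x3 patch). Shifting every tap half a pixel to the right moves the sampling window, so `moved` is about 5.5. Letting each tap have its own offset is what allows the sampling pattern to adapt to the image content.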