ViTAE-Transformer/ViTDet
PyTorch implementation of Vision Transformer backbones for object detection, trained with Mask R-CNN on COCO.

This repository provides an unofficial implementation of the ViTDet paper, which explores plain Vision Transformer backbones for object detection. It supports training ViT and ViTAE models with Mask R-CNN on the COCO dataset and reports box and mask mAP. The implementation includes configurations for multiple model sizes (Base, Small) and supports GPU and TPU training.