
ViTAE-Transformer/ViTDet

PyTorch implementation of Vision Transformer backbones for object detection trained with Mask RCNN on COCO.

584 stars · Python · Computer Vision · ML Frameworks
ViTDet
Velocity (7d): +0.4 ★/day · Trend: steady

This repository provides an unofficial implementation of the ViTDet paper, which explores plain Vision Transformer backbones for object detection. It supports training ViT and ViTAE models with Mask RCNN on the COCO dataset and reports box and mask mAP. The implementation includes configurations for multiple model sizes (Base, Small) and supports GPU and TPU training.
