
hustvl/YOLOS

A vision transformer model adapted for object detection without task-specific architectural modifications, published at NeurIPS 2021.

903 stars · Jupyter Notebook · Computer Vision · Inference · Serving
Velocity (7d): +0.5 ★ / day · Trend: steady

YOLOS demonstrates that vanilla Vision Transformers pre-trained on image classification can transfer to object detection by adding detection tokens and training with a set-based Hungarian matching loss. The project studies how well ImageNet-pretrained ViTs transfer to the COCO detection benchmark, including experiments with self-supervised MoCo-v3 pre-training. The model is also available through Hugging Face Transformers.
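To make the set-based matching idea concrete, here is a minimal sketch of one-to-one matching between predictions and ground-truth objects. It is not the repository's code: the cost function, its weights, the box format (cx, cy, w, h), and all function names are assumptions chosen for illustration, and a brute-force search over permutations stands in for the Hungarian algorithm so the example needs only the standard library.

```python
from itertools import permutations


def l1_box_cost(pred_box, target_box):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(p - t) for p, t in zip(pred_box, target_box))


def match_cost(pred, target, class_weight=1.0, box_weight=5.0):
    """Illustrative pairwise cost (weights are assumptions, not the paper's).

    The cost is low when the prediction assigns high probability to the
    target's class and its box is close to the target's box.
    """
    pred_probs, pred_box = pred
    target_label, target_box = target
    class_term = -class_weight * pred_probs[target_label]
    box_term = box_weight * l1_box_cost(pred_box, target_box)
    return class_term + box_term


def bipartite_match(predictions, targets):
    """Find the minimum-cost one-to-one assignment of predictions to targets.

    Returns (total_cost, assignment), where assignment[j] is the index of the
    prediction matched to target j. Brute force: suitable for toy sizes only.
    """
    best_cost, best_assignment = float("inf"), ()
    for perm in permutations(range(len(predictions)), len(targets)):
        cost = sum(match_cost(predictions[i], targets[j])
                   for j, i in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assignment = cost, perm
    return best_cost, best_assignment


# Toy data: three predictions (think one per detection token), two objects.
predictions = [
    ([0.1, 0.9], (0.50, 0.50, 0.20, 0.20)),  # confident class 1, centered box
    ([0.8, 0.2], (0.20, 0.30, 0.10, 0.10)),  # confident class 0, upper-left box
    ([0.5, 0.5], (0.90, 0.90, 0.05, 0.05)),  # ends up unmatched ("no object")
]
targets = [
    (0, (0.20, 0.30, 0.10, 0.10)),
    (1, (0.50, 0.50, 0.20, 0.20)),
]

total_cost, assignment = bipartite_match(predictions, targets)
print(assignment)  # index of the prediction matched to each target
```

In a training loop, the loss would then be computed only over the matched pairs, with unmatched predictions pushed toward a "no object" class. For realistic numbers of predictions, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` is the usual choice instead of brute force.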
