whai362/PVT
A Pyramid Vision Transformer implementation providing backbone models for image classification, object detection, and semantic segmentation.

Velocity (7d): +1.0 ★/day · Trend: steady
This repository contains the official implementations of PVTv1 and PVTv2, transformer-based architectures designed as drop-in backbones for vision tasks. The models report strong results on ImageNet-1K classification, COCO object detection, and semantic segmentation benchmarks. PVTv2 improves on the original PVT and compares favorably with alternatives such as Swin Transformer.
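To illustrate what "drop-in backbone" means in practice, here is a minimal sketch of the multi-scale feature shapes a four-stage pyramid backbone hands to a detection or segmentation head. It is not code from the repository: the function name `pyramid_shapes` is hypothetical, the strides (4, 8, 16, 32) are an assumption based on the usual pyramid layout, and the channel widths are illustrative values only.

```python
from typing import List, Tuple

# Assumed per-stage downsampling factors for a four-stage pyramid backbone.
# These are not stated in the description above; treat them as illustrative.
STRIDES = (4, 8, 16, 32)

# Hypothetical channel widths per stage, chosen only for illustration.
CHANNELS = (64, 128, 320, 512)


def pyramid_shapes(height: int, width: int) -> List[Tuple[int, int, int]]:
    """Return (channels, out_height, out_width) for each pyramid stage."""
    shapes = []
    for stride, channels in zip(STRIDES, CHANNELS):
        # Require divisibility so the sketch never has to guess at padding.
        if height % stride or width % stride:
            raise ValueError(
                f"input {height}x{width} is not divisible by stride {stride}"
            )
        shapes.append((channels, height // stride, width // stride))
    return shapes


if __name__ == "__main__":
    for index, (c, h, w) in enumerate(pyramid_shapes(224, 224), start=1):
        print(f"stage {index}: {c} x {h} x {w}")
```

Because each stage halves the resolution of the one before it, a head that already consumes a CNN feature pyramid can, under these assumptions, consume these four maps without structural changes.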