fudan-zvg/SETR
A semantic segmentation model that reframes segmentation as a sequence-to-sequence problem, using a transformer encoder.

Velocity (7d): +0.6 ★/day
Trend: steady
SETR applies Vision Transformers to semantic segmentation by treating an image as a sequence of patches and using a transformer encoder for dense prediction. The project provides model implementations and configurations for Cityscapes and other segmentation benchmarks, including the SETR-Naive and SETR-MLA variants with pretrained weights.
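
To make the "patches as a sequence" idea concrete, here is a minimal pure-Python sketch of the data flow only: an image grid is split into a row-major sequence of flattened patches, one prediction is produced per token, and the predictions are reshaped back into a dense map. The function names `patchify` and `tokens_to_map` are hypothetical and not taken from the repository. The per-token mean stands in for the transformer encoder, and nearest-neighbour upsampling stands in for the learned decoders, so this illustrates the shapes involved rather than the project's actual implementation.

```python
def patchify(image, patch):
    """Split an H x W grid (list of rows) into a row-major sequence of flattened patches."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    seq = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Flatten one patch x patch window into a single token vector.
            seq.append([image[top + dy][left + dx]
                        for dy in range(patch)
                        for dx in range(patch)])
    return seq


def tokens_to_map(token_values, grid_h, grid_w, patch):
    """Reshape per-token predictions to a grid_h x grid_w grid, then upsample
    each cell to patch x patch pixels by nearest-neighbour repetition."""
    if len(token_values) != grid_h * grid_w:
        raise ValueError("token count does not match the patch grid")
    out = []
    for gy in range(grid_h):
        row = []
        for gx in range(grid_w):
            row.extend([token_values[gy * grid_w + gx]] * patch)
        for _ in range(patch):
            out.append(list(row))
    return out


# Toy 4x4 "image" with 2x2 patches, giving a sequence of 4 tokens.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
seq = patchify(image, 2)

# Stand-in for the encoder: one value per token (integer mean of the patch).
token_values = [sum(p) // len(p) for p in seq]

# Back to a dense 4x4 map, one prediction per pixel.
dense = tokens_to_map(token_values, 2, 2, 2)
```

The sequence length grows with the square of the image side divided by the patch size: under the same scheme, a 768x768 input with 16x16 patches would yield 48 * 48 = 2304 tokens (both numbers are illustrative assumptions here, not values read from the project's configurations).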