YuqingWang1029/VisTR
End-to-end video instance segmentation framework using transformer architecture.

Velocity (7d): +0.4 ★/day · Trend: steady
VisTR performs end-to-end video instance segmentation by applying a transformer to the frames of a video clip jointly and predicting instance masks across time. The model adapts DETR, a transformer-based detection framework, to video, so that instance tracking and segmentation are handled in one pass without additional post-processing. It is intended for video understanding research in computer vision.
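
To make the "tracking without post-processing" idea concrete, here is a minimal sketch, not code from the repository. It assumes the decoder emits a fixed number of prediction slots per frame, laid out frame by frame, and that slot `i` refers to the same instance in every frame. The function name `group_by_instance`, its parameters, and the frame-major layout are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, TypeVar

P = TypeVar("P")  # one per-slot prediction (e.g. a mask); kept generic here


def group_by_instance(
    flat_preds: Sequence[P], num_frames: int, slots_per_frame: int
) -> Dict[int, List[P]]:
    """Regroup a flat, frame-major list of predictions into per-instance tracks.

    Assumption (hypothetical layout): flat_preds[t * slots_per_frame + i]
    is the prediction for instance slot i in frame t.
    """
    if len(flat_preds) != num_frames * slots_per_frame:
        raise ValueError("expected num_frames * slots_per_frame predictions")

    tracks: Dict[int, List[P]] = {i: [] for i in range(slots_per_frame)}
    for t in range(num_frames):
        for i in range(slots_per_frame):
            # Same slot index across frames means same instance, so a track
            # is read off by indexing alone; no matching step is needed.
            tracks[i].append(flat_preds[t * slots_per_frame + i])
    return tracks


# Usage: 3 frames, 2 instance slots per frame, with labels standing in for masks.
flat = ["f0_i0", "f0_i1", "f1_i0", "f1_i1", "f2_i0", "f2_i1"]
tracks = group_by_instance(flat, num_frames=3, slots_per_frame=2)
```

Under these assumptions, the temporal association comes from the output ordering rather than from a separate linking stage, which is one way to read "unified instance tracking and segmentation".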