FoundationVision/VNext
A video instance recognition framework built on Detectron2 implementing state-of-the-art computer vision models for video segmentation and object tracking.

VNext is a next-generation video instance recognition framework that provides online and offline video instance segmentation algorithms, along with motion models for object-centric video tasks. It is the official implementation of several CVPR and ECCV papers, including InstMove, SeqFormer, and IDOL. IDOL won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge. The framework supports transformer-based architectures for video understanding tasks.