IDEA-Research/MaskDINO
A unified transformer-based framework for object detection, instance segmentation, semantic segmentation, and panoptic segmentation tasks.

MaskDINO extends the DINO detector with a mask prediction head, so a single transformer architecture handles object detection as well as instance, semantic, and panoptic segmentation. The authors report state-of-the-art results on the COCO and ADE20K benchmarks at the time of release. The repository includes training scripts, evaluation tools, and pretrained models for downstream use.
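To make the idea of a mask prediction head concrete, here is a minimal NumPy sketch of the general query-based approach: each object query embedding is dot-multiplied with a per-pixel embedding map to give one mask per query. This is an illustration under assumptions, not the repository's code. The function name `predict_masks`, the tensor shapes, and the random inputs are all hypothetical, and the real model derives both inputs from its transformer decoder and pixel-level feature maps.

```python
import numpy as np


def predict_masks(query_embed, pixel_embed):
    """Hypothetical helper: dot product of queries with per-pixel embeddings.

    query_embed: (Q, C) array, one embedding per object query.
    pixel_embed: (C, H, W) array, one C-dim embedding per pixel.
    Returns mask logits of shape (Q, H, W).
    """
    return np.einsum("qc,chw->qhw", query_embed, pixel_embed)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Random stand-ins for decoder outputs and pixel features (assumed shapes).
rng = np.random.default_rng(0)
queries = rng.standard_normal((5, 16))    # 5 queries, 16 channels
pixels = rng.standard_normal((16, 8, 8))  # 16 channels, 8x8 feature map

mask_logits = predict_masks(queries, pixels)
binary_masks = sigmoid(mask_logits) > 0.5  # threshold into per-query masks

print(mask_logits.shape)
```

Because every query yields both a box and a mask in this kind of design, the same set of queries can serve detection and segmentation outputs; how the masks are then merged for semantic or panoptic results is beyond this sketch.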