JIA-Lab-research/Seg-Zero
Seg-Zero is a vision-language model that generates reasoning chains before producing segmentation masks, trained entirely via reinforcement learning without supervised reasoning data.

This repository implements Seg-Zero and VisionReasoner, research projects that train vision-language models for unified visual perception and reasoning. The models generate step-by-step reasoning chains before producing final segmentation outputs. Training relies exclusively on reinforcement learning, so test-time reasoning emerges without explicit supervised reasoning data. The repository supports the Qwen2-VL and Qwen2.5-VL model series.
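To illustrate the "reasoning first, segmentation output second" pattern described above, here is a minimal sketch of how a downstream script might split a model response into its reasoning chain and its final answer. The `<think>`/`<answer>` tag names, the JSON `bbox` key, the sample text, and the helper `split_reasoning_and_answer` are all assumptions made for illustration; they are not taken from the repository and may not match its actual output format.

```python
import json
import re

# Hypothetical output format: the tag names and the JSON payload are
# assumptions for illustration, not the repository's documented format.
_PATTERN = re.compile(
    r"<think>(?P<reasoning>.*?)</think>\s*<answer>(?P<answer>.*?)</answer>",
    re.DOTALL,
)


def split_reasoning_and_answer(text):
    """Return (reasoning, answer_dict), or None if the text does not match."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    try:
        answer = json.loads(match.group("answer"))
    except json.JSONDecodeError:
        # The answer section was present but was not valid JSON.
        return None
    return match.group("reasoning").strip(), answer


# Made-up sample response, used only to exercise the parser.
sample = (
    "<think>The query asks for the mug; it sits left of the laptop.</think>"
    '<answer>{"bbox": [10, 20, 110, 220]}</answer>'
)
result = split_reasoning_and_answer(sample)
```

Returning `None` for malformed responses keeps the failure case explicit, which is convenient if such a check is used to filter or score outputs; whether the repository does anything similar is not stated here.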