Is Seg-Zero open source?

Yes — JIA-Lab-research/Seg-Zero is open source, released under the Apache-2.0 license.

What language is Seg-Zero written in?

JIA-Lab-research/Seg-Zero is primarily written in Python.

How popular is Seg-Zero?

JIA-Lab-research/Seg-Zero has 634 stars on GitHub.

Where can I find Seg-Zero?

JIA-Lab-research/Seg-Zero is on GitHub at https://github.com/JIA-Lab-research/Seg-Zero.

← all repositories

JIA-Lab-research/Seg-Zero

Teaching vision models to think before they segment

Seg-Zero trains vision models to narrate their thought process before generating a segmentation mask, using reinforcement learning instead of supervised reasoning data.

★634 stars Python Image · Video · Audio Language Models Agents

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does Seg-Zero is a training framework and set of pretrained checkpoints for visual segmentation that requires the model to write out a text reasoning chain before it produces the final mask. It is built on Qwen2-VL and Qwen2.5-VL backbones and uses GRPO—group relative policy optimization—to train entirely via reinforcement signals such as IoU and L1 rewards, without any explicit supervised reasoning annotations. The repository also hosts the official training code for VisionReasoner, a follow-up project that extends the same RL pipeline to multi-object and multi-task scenarios.

The interesting bit The project bets that you can elicit step-by-step visual reasoning purely through RL rewards—no chain-of-thought labels required—and pairs a reasoning module with a segmentation head in a decoupled architecture.

Key highlights

Trains exclusively with RL using GRPO; the authors claim this elicits reasoning chains without any supervised reasoning annotations.
Decoupled architecture: a reasoning model generates the thought process, then a segmentation model produces the mask.
Supports both Qwen2-VL and Qwen2.5-VL backbones, and now handles multi-object scenes via the VisionReasoner extension.
Training code is based on EasyR1 and veRL, using model-split sampling to keep GPU memory usage in check.
Includes off-the-shelf accuracy rewards (IoU, L1) and format rewards for object-detection and segmentation tasks.

Caveats

The May update that introduced multi-object support is described by the authors as a “risky” breaking change; rolling back to the earlier single-object code path requires checking out a specific prior commit.
Training a 7B model demands serious hardware: the authors recommend either four 80 GB GPUs or eight 46 GB GPUs.
Reproducibility is slightly murky: the README notes that Seg-Zero’s best benchmark numbers were drawn from different checkpoints, whereas VisionReasoner uses a single checkpoint.

Verdict Worth a look if you are researching RL-driven multimodal models or need a starting point for reward engineering in vision-language tasks. Skip it if you are after a lightweight, plug-and-play segmentation tool—this is research training code with heavy compute requirements.

Frequently asked

What is JIA-Lab-research/Seg-Zero?: Seg-Zero trains vision models to narrate their thought process before generating a segmentation mask, using reinforcement learning instead of supervised reasoning data.
Is Seg-Zero open source?: Yes — JIA-Lab-research/Seg-Zero is open source, released under the Apache-2.0 license.
What language is Seg-Zero written in?: JIA-Lab-research/Seg-Zero is primarily written in Python.
How popular is Seg-Zero?: JIA-Lab-research/Seg-Zero has 634 stars on GitHub.
Where can I find Seg-Zero?: JIA-Lab-research/Seg-Zero is on GitHub at https://github.com/JIA-Lab-research/Seg-Zero.