Liuziyu77/Visual-RFT
Visual-RFT applies reinforcement learning fine-tuning to vision-language models using a GRPO-based framework with rule-based verifiable rewards.

Velocity (7d): +4.8 ★/day · Trend: steady
The repository provides the first comprehensive adaptation of DeepSeek-R1's reinforcement learning strategy to the multimodal domain. It fine-tunes the Qwen2-VL-2B and Qwen2-VL-7B models through a GRPO-based framework with rule-based verifiable rewards, improving performance on visual perception tasks including open-vocabulary detection, few-shot detection, and reasoning grounding.
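To make "rule-based verifiable rewards" and "GRPO-based" concrete, here is a minimal sketch and not the repository's actual code. It assumes a detection reward equal to the IoU between a predicted and a ground-truth box, and it uses the standard GRPO step of normalizing each sampled response's reward against the mean and standard deviation of its group. The function names `iou` and `group_relative_advantages`, the `(x1, y1, x2, y2)` box layout, and the use of the population standard deviation are all illustrative assumptions.

```python
from statistics import mean, pstdev


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    Hypothetical rule-based reward: it is computed from the prediction and
    the label alone, so no learned reward model is involved.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Assumption: population standard deviation; implementations differ on
    this detail. eps avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: score four sampled box predictions against one ground-truth box.
ground_truth = (0.0, 0.0, 2.0, 2.0)
predictions = [
    (0.0, 0.0, 2.0, 2.0),  # exact match
    (1.0, 1.0, 3.0, 3.0),  # partial overlap
    (0.0, 0.0, 1.0, 2.0),  # half of the target
    (5.0, 5.0, 6.0, 6.0),  # no overlap
]
rewards = [iou(p, ground_truth) for p in predictions]
advantages = group_relative_advantages(rewards)
```

Responses that score above the group mean get a positive advantage and are reinforced; those below get a negative one. Because the baseline comes from the group itself, this scheme needs no separate value network.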