
Liuziyu77/Visual-RFT

Visual-RFT applies reinforcement learning fine-tuning to vision-language models using a GRPO-based framework with rule-based verifiable rewards.

2.2k stars · Jupyter Notebook · ML Frameworks · Language Models
Velocity (7d): +4.8 ★/day · Trend: steady

The repository describes itself as the first comprehensive adaptation of DeepSeek-R1’s reinforcement learning strategy to the multimodal domain. It fine-tunes Qwen2-VL-2B and Qwen2-VL-7B models through a GRPO-based framework with rule-based verifiable rewards, improving performance on visual perception tasks such as open-vocabulary detection, few-shot detection, and reasoning grounding.
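The sketch below is a rough illustration of the two ideas named above: a rule-based verifiable reward and GRPO's group-relative advantage. It assumes a detection-style task in which each sampled response has already been parsed into a single bounding box `(x1, y1, x2, y2)`. Each box is scored by its IoU with the ground truth, and the rewards for one group of samples are then normalized into advantages (reward minus group mean, divided by group standard deviation). The function names, the single-box parsing, and the `eps` value are hypothetical choices made for illustration. This is not Visual-RFT's actual code, and the repository's rewards may combine IoU with other terms.

```python
from statistics import mean, pstdev

# Hypothetical box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_reward(pred: Box, gt: Box) -> float:
    """Rule-based, verifiable reward: computed directly from the ground truth,
    with no learned reward model (illustrative assumption)."""
    return iou(pred, gt)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own sample group.
    If all rewards are equal, every advantage is 0."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    gt: Box = (0, 0, 10, 10)
    # A group of boxes parsed from several sampled responses to the same prompt.
    group: list[Box] = [(0, 0, 10, 10), (0, 0, 5, 10), (20, 20, 30, 30), (5, 0, 15, 10)]
    rewards = [iou_reward(p, gt) for p in group]
    advantages = group_relative_advantages(rewards)
    for box, r, adv in zip(group, rewards, advantages):
        print(box, round(r, 3), round(adv, 3))
```

Because the advantage is computed relative to the group, a response is pushed up only when it beats its siblings for the same prompt. This is why GRPO needs no separate value network.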
