
turningpoint-ai/VisualThinker-R1-Zero

Reinforcement learning post-training for visual reasoning that replicates DeepSeek-R1-Zero's emergent reasoning on a 2B multimodal model.

623 stars · Python · Language Models · ML Frameworks
Velocity (7d): +1.3 ★/day · Trend: steady

VisualThinker-R1-Zero applies GRPO-based reinforcement learning to train Qwen2-VL-2B directly on visual reasoning tasks, with no supervised fine-tuning stage and no learned reward model. Training elicits emergent self-reflection and self-correction in visual reasoning, reproducing both the ‘aha moment’ and the growth in response length over training that were observed in DeepSeek-R1-Zero. The result shows that reasoning capabilities can emerge from pure RL on vision-centric tasks.
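To make "GRPO without a reward model" concrete, here is a minimal sketch of the two pieces that replace a learned reward model: a rule-based reward and GRPO's group-relative advantage. It assumes a hypothetical output format of `<think>…</think><answer>…</answer>` with a format bonus plus an exact-match accuracy bonus; the names `rule_based_reward` and `group_relative_advantages`, the regex, and the reward weights are illustrative assumptions, not the repository's actual code. The advantage step follows the standard GRPO idea of normalizing each sampled completion's reward by the mean and standard deviation of its group (population std here; some implementations use the sample std).

```python
import re
import statistics

# Hypothetical output format; the repository's real reward functions may differ.
ANSWER_RE = re.compile(r"<think>.*?</think>\s*<answer>(.*?)</answer>", re.DOTALL)


def rule_based_reward(completion: str, ground_truth: str) -> float:
    """Illustrative rule-based reward: +1 for correct format, +1 for an exact-match answer."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # malformed output earns neither the format nor the accuracy bonus
    format_reward = 1.0
    accuracy_reward = 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0
    return format_reward + accuracy_reward


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantages: center each reward on its group mean and scale by the group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    ground_truth = "B"
    # A group of completions sampled for the same (hypothetical) image-question prompt.
    group = [
        "<think>The chair is closer to the camera.</think><answer>B</answer>",
        "<think>Wait, let me recheck the depth.</think><answer>A</answer>",
        "B",
        "<think>Comparing relative depths...</think><answer>B</answer>",
    ]
    rewards = [rule_based_reward(c, ground_truth) for c in group]
    advantages = group_relative_advantages(rewards)
    print(rewards)
    print([round(a, 3) for a in advantages])
```

Because advantages are computed relative to the other samples in the same group, no separate value network or learned reward model is needed: completions that beat their siblings get pushed up, and those that fall short get pushed down.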
