EvolvingLMMs-Lab/open-r1-multimodal
A fork of open-r1 that adds multimodal RL training support for vision-language models using the GRPO algorithm.

Velocity (7d): +3.1 ★/day · Trend: steady
This repository extends the open-r1 project to train multimodal reasoning models. It implements GRPO (Group Relative Policy Optimization) for vision-language models such as Qwen2-VL and Aria-MoE on math reasoning tasks. The project provides open-source training datasets with reasoning paths and verifiable answers, trained model checkpoints, and scripts for creating custom multimodal reasoning data.
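
To illustrate the core idea behind GRPO, here is a minimal sketch of the group-relative advantage step: several completions are sampled for one prompt, each is scored against a verifiable answer, and each reward is normalized against its own group's mean and standard deviation. This is not the repository's code. The function names, the exact-match reward, the `eps` value, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def accuracy_reward(predicted: str, reference: str) -> float:
    """Hypothetical verifiable reward: 1.0 on an exact match, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward against the mean and std of its own group.

    Assumption: population std with a small eps to avoid division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, a group of four sampled completions (made-up values).
completions = ["42", "41", "42 ", "7"]
rewards = [accuracy_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)
```

Correct completions end up with positive advantages and incorrect ones with negative advantages, so the policy update favors answers that beat the group average without needing a separate value model.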