hiyouga/EasyR1

Reinforcement learning for vision-language models, minus the infrastructure headache

A clean fork of veRL that adds multimodal support and enough algorithms to make your GPU cluster sweat.

Velocity · 7d: +11 ★ / day · Trend: steady

What it does

EasyR1 trains vision-language and text-only models with reinforcement learning algorithms like GRPO, DAPO, and Reinforce++. It is a fork of ByteDance’s veRL framework, extended to handle Qwen-VL and DeepSeek-R1 distill models. The pitch is simple: run a single bash script to start RL fine-tuning on a geometry dataset, or scale to 70B+ parameters across multiple nodes with Ray.
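For readers new to GRPO, the core idea is a group-relative advantage: several responses are sampled for the same prompt, and each response's reward is normalized against the group's mean and standard deviation. The snippet below is a minimal sketch of that normalization only, not EasyR1's or veRL's code. The function name, the use of the population standard deviation, and the `eps` value are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of responses sampled for one prompt.

    Hypothetical helper for illustration; real implementations work on
    batched tensors and may use a different standard deviation estimator.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps keeps the division finite when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, so no separate value model is needed to produce a baseline.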

The interesting bit

The project leans heavily on veRL’s HybridEngine and vLLM’s SPMD mode for throughput, but the practical win is the pre-built Docker image and the LoRA path that drops a 7B GRPO run to a single 24GB GPU. The README’s hardware requirements table is doing a lot of heavy lifting for anyone budget-constrained.

Key highlights

  • Supports GRPO, DAPO, Reinforce++, ReMax, RLOO, GSPO, and CISPO
  • Vision-language models: Qwen2-VL, Qwen2.5-VL, Qwen3-VL; text models: Llama3, Qwen2/2.5/3, DeepSeek-R1 distill
  • LoRA training and padding-free training for memory efficiency
  • Pre-built Docker images with CUDA 12.9 and vLLM 0.11.0
  • Multi-node scaling via Ray with documented setup steps
  • Checkpoint resuming and multiple logger backends (Wandb, SwanLab, MLflow, TensorBoard)

Caveats

  • Hardware requirements escalate quickly: full fine-tuning a 72B model needs 32×80GB GPUs in AMP mode
  • The hardware figures in the README are estimates, so actual memory use on your setup may differ
  • ModelScope hub fallback suggests Hugging Face connectivity can be flaky in some regions
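
The 72B figure in the first caveat can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes the commonly cited rule of thumb of about 16 bytes per parameter for mixed-precision training with Adam (2 for half-precision weights, 2 for gradients, 12 for fp32 master weights and two optimizer moments). That figure is an assumption, not a number from the EasyR1 README, and the function names are hypothetical.

```python
def training_state_gb(params_billion, bytes_per_param=16):
    """Rough size of weights, gradients, and optimizer state in GB.

    bytes_per_param=16 is an assumed rule of thumb for mixed-precision
    Adam; it ignores activations, rollout KV cache, and any reference model.
    """
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


def cluster_memory_gb(num_gpus, gb_per_gpu):
    """Total GPU memory across the cluster in GB."""
    return num_gpus * gb_per_gpu


state = training_state_gb(72)        # model and optimizer state for 72B
total = cluster_memory_gb(32, 80)    # the 32×80GB setup from the caveat
headroom = total - state             # left for everything else
```

Under these assumptions the training state alone takes roughly 1152 GB of the 2560 GB available, which leaves the remainder for activations and generation. Treat this as an order-of-magnitude check rather than a sizing guide.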

Verdict

Grab this if you are already doing RL on language models and want to add images without rebuilding your stack from scratch. Skip it if you are looking for a lightweight research prototype: this is a production-oriented fork with the infrastructure complexity to match.
