langfengQ/verl-agent
A reinforcement learning framework for training large language model (LLM) and vision-language model (VLM) agents with Group-in-Group Policy Optimization (GiGPO).

verl-agent extends the veRL framework to train LLM and VLM agents via reinforcement learning. It introduces a step-independent multi-turn rollout mechanism: instead of concatenating the full interaction history into one ever-growing input, each step's input is built separately, so the per-step input structure and the history management are fully customizable. The project also implements Group-in-Group Policy Optimization (GiGPO), an agent training algorithm published at NeurIPS 2025.
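To make the step-independent rollout idea concrete, here is a minimal sketch in plain Python. It is not verl-agent's actual API: `StepMemory`, `build_step_prompt`, `rollout`, the prompt template, and the sliding-window history policy are all hypothetical names and choices made for illustration. The only point it shows is that each step's input is assembled from scratch from a bounded, customizable history rather than from the concatenated full interaction.

```python
from collections import deque


class StepMemory:
    """Hypothetical bounded history of (observation, action) pairs."""

    def __init__(self, max_steps=2):
        # A deque with maxlen silently drops the oldest entry when full.
        self._buffer = deque(maxlen=max_steps)

    def add(self, observation, action):
        self._buffer.append((observation, action))

    def render(self):
        if not self._buffer:
            return "(no history)"
        return "\n".join(f"obs: {o} | act: {a}" for o, a in self._buffer)


def build_step_prompt(task, memory, observation):
    """Assemble one step's input independently of earlier prompts."""
    return (
        f"Task: {task}\n"
        f"Recent history:\n{memory.render()}\n"
        f"Current observation: {observation}\n"
        "Next action:"
    )


def rollout(task, observations, policy, max_history=2):
    """Run one episode and return per-step (prompt, action) samples.

    `policy` is any callable mapping a prompt string to an action string;
    `observations` stands in for what an environment would emit.
    """
    memory = StepMemory(max_steps=max_history)
    samples = []
    for obs in observations:
        prompt = build_step_prompt(task, memory, obs)
        action = policy(prompt)
        samples.append((prompt, action))  # each step is its own sample
        memory.add(obs, action)
    return samples


if __name__ == "__main__":
    episode = rollout("find the key", ["o1", "o2", "o3", "o4"], lambda p: "look")
    print(episode[-1][0])
```

Because every step yields a self-contained (prompt, action) pair, the input length stays bounded by the history window instead of growing with the episode. Swapping the sliding window for a summary or a retrieval step would only require changing `StepMemory.render` in this sketch.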