AgentR1/Agent-R1
Agent-R1 is a unified RL framework for training multi-step LLM agents to use tools and interact with environments.

The framework implements a step-native reinforcement learning loop: the LLM agent observes the environment, generates an action, and receives tool or environment feedback, repeating until the task is complete. Each turn is modeled as an explicit MDP transition, so reward assignment and policy optimization operate on one unified training substrate. The project includes integrations for StepPO training and provides processed datasets for agent tasks such as HotpotQA, ALFWorld, and WebShop.
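To make the "each turn is an MDP transition" idea concrete, here is a minimal, self-contained sketch. It is not Agent-R1's API and does not implement StepPO. `ToyCalculatorEnv`, `Transition`, `rollout`, `turn_returns` and `scripted_policy` are hypothetical names invented for illustration, and a scripted function stands in for the LLM policy. The state at each turn is assumed to be the full interaction history. The discounted per-turn return is a generic stand-in for turn-level reward assignment.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transition:
    """One agent turn recorded as an MDP transition (s_t, a_t, r_t, s_{t+1}, done)."""
    state: Tuple[str, ...]       # full interaction history before the action
    action: str
    reward: float
    next_state: Tuple[str, ...]  # history after the action and its feedback
    done: bool


class ToyCalculatorEnv:
    """Hypothetical environment: call a 'calc:' tool, then submit 'answer:'."""

    def __init__(self, question: str, answer: str, max_turns: int = 4):
        self.question = question
        self.answer = answer
        self.max_turns = max_turns
        self.turn = 0

    def reset(self) -> str:
        self.turn = 0
        return self.question

    def step(self, action: str) -> Tuple[str, float, bool]:
        self.turn += 1
        out_of_turns = self.turn >= self.max_turns
        if action.startswith("calc:"):
            # Tool feedback: only supports simple 'a+b' integer expressions.
            a, b = action[len("calc:"):].split("+")
            return f"tool result: {int(a) + int(b)}", 0.0, out_of_turns
        if action.startswith("answer:"):
            correct = action[len("answer:"):].strip() == self.answer
            return "episode over", 1.0 if correct else 0.0, True
        return "invalid action", -0.1, out_of_turns


def rollout(env: ToyCalculatorEnv,
            policy: Callable[[List[str]], str]) -> List[Transition]:
    """Run observe -> act -> feedback until done, recording one transition per turn."""
    history = [env.reset()]
    transitions: List[Transition] = []
    done = False
    while not done:
        action = policy(history)
        feedback, reward, done = env.step(action)
        next_history = history + [action, feedback]
        transitions.append(
            Transition(tuple(history), action, reward, tuple(next_history), done))
        history = next_history
    return transitions


def turn_returns(transitions: List[Transition], gamma: float = 1.0) -> List[float]:
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed per turn."""
    returns: List[float] = []
    g = 0.0
    for t in reversed(transitions):
        g = t.reward + gamma * g
        returns.append(g)
    return list(reversed(returns))


def scripted_policy(history: List[str]) -> str:
    """Stand-in for the LLM: call the tool first, then answer with its result."""
    last = history[-1]
    if last.startswith("tool result: "):
        return "answer: " + last[len("tool result: "):]
    return "calc:2+3"


if __name__ == "__main__":
    traj = rollout(ToyCalculatorEnv("What is 2+3?", "5"), scripted_policy)
    print([t.action for t in traj])         # ['calc:2+3', 'answer: 5']
    print(turn_returns(traj, gamma=0.9))    # [0.9, 1.0]
```

Because every turn is stored as its own transition, the final reward can be propagated back to earlier tool-calling turns (here the `calc:` turn receives 0.9), which is the kind of per-step credit a step-level optimizer can then use.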