Is Agent-R1 open source?

Yes — AgentR1/Agent-R1 is open source, released under the MIT license.

What language is Agent-R1 written in?

AgentR1/Agent-R1 is primarily written in Python.

How popular is Agent-R1?

AgentR1/Agent-R1 has 1.6k stars on GitHub.

Where can I find Agent-R1?

AgentR1/Agent-R1 is on GitHub at https://github.com/AgentR1/Agent-R1.


AgentR1/Agent-R1

Agent-R1 turns multi-turn LLM tool use into a proper RL feedback loop

It trains multi-step LLM agents by treating every turn as a step-level MDP transition instead of a single growing prompt-response sequence.

★ 1.6k stars · Python · Agents · ML Frameworks




What it does

Agent-R1 is a modular training framework for agentic reinforcement learning. It connects LLM serving infrastructure and distributed training stacks into a continuous loop of rollout, reward, replay, and update, for agents that call tools and read environment feedback across many turns. Instead of asking you to rewrite the RL pipeline for every new task, it offers layered abstractions, from a generic environment loop down to individual tool interfaces, that let you plug in new tasks while keeping the trainer intact.

The interesting bit

The framework treats every agent turn as a step-level Markov decision process transition, explicitly storing observation, action, environment feedback, reward, and next observation. This sidesteps the usual kludge of flattening a multi-turn conversation into a single token sequence and hoping credit assignment sorts itself out.
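To make the step-level idea concrete, here is a minimal sketch of what one stored transition might look like and how per-step returns fall out of it. The StepTransition class, its field names, and the discounted_returns helper are illustrative assumptions, not Agent-R1's actual schema or API; only the five stored quantities come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepTransition:
    """One agent turn as an explicit MDP transition (hypothetical schema)."""
    observation: str       # what the agent saw before acting
    action: str            # the text the model generated this turn
    feedback: str          # what the tool or environment returned
    reward: float          # step-level reward for this turn
    next_observation: str  # what the agent sees at the start of the next turn


def discounted_returns(steps: List[StepTransition], gamma: float = 0.9) -> List[float]:
    """Return-to-go for each step, accumulated backwards over the trajectory."""
    returns: List[float] = []
    running = 0.0
    for step in reversed(steps):
        running = step.reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# A two-turn toy trajectory: one tool call, then a final answer.
trajectory = [
    StepTransition("Q: 12 * 7?", "call calculator(12*7)", "84", 0.0, "Q: 12 * 7? | tool: 84"),
    StepTransition("Q: 12 * 7? | tool: 84", "answer: 84", "correct", 1.0, ""),
]
step_returns = discounted_returns(trajectory, gamma=0.5)
print(step_returns)
```

Because each turn is its own record, credit for the final reward reaches the earlier tool call through the return computation rather than through token positions in one long sequence.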

Key highlights

Step-native trajectories keep action boundaries clean and avoid fragile Token -> Text -> Token reconstruction.
Layered architecture (AgentFlowBase, AgentEnvLoop, ToolEnv, BaseTool) separates task logic from the training algorithm.
Built atop the verl stack and designed to work with existing serving systems like vLLM and SGLang.
Published experiments benchmark Qwen3-4B across GSM8K, HotpotQA, ALFWorld, and WebShop with GRPO, PPO, REINFORCE++, and RLOO.
Already spawning spin-offs such as PaperScout (academic search) and Cast-R1 (time-series forecasting).
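The layered split between task logic and training loop can be sketched roughly as follows. The classes below (SketchTool, SketchToolEnv, run_episode) are stand-ins written for illustration; they are not the project's AgentFlowBase, AgentEnvLoop, ToolEnv, or BaseTool, whose real signatures are not described here. The scripted policy is a placeholder for an LLM.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Tuple


class SketchTool(ABC):
    """Stand-in for a tool interface (hypothetical, not the project's BaseTool)."""
    name: str

    @abstractmethod
    def run(self, argument: str) -> str:
        ...


class WordCountTool(SketchTool):
    name = "word_count"

    def run(self, argument: str) -> str:
        return str(len(argument.split()))


class SketchToolEnv:
    """Stand-in environment that routes 'tool_name: argument' actions to tools."""

    def __init__(self, tools: List[SketchTool], answer: str) -> None:
        self.tools: Dict[str, SketchTool] = {t.name: t for t in tools}
        self.answer = answer

    def step(self, action: str) -> Tuple[str, float, bool]:
        """Return (feedback, reward, done) for one agent action."""
        name, _, argument = action.partition(":")
        name, argument = name.strip(), argument.strip()
        if name == "answer":
            return "episode finished", float(argument == self.answer), True
        tool = self.tools.get(name)
        if tool is None:
            return f"unknown tool: {name}", 0.0, False
        return tool.run(argument), 0.0, False


def run_episode(env: SketchToolEnv, policy: Callable[[str], str], max_turns: int = 4):
    """Generic loop: it only knows the env interface, never the task itself."""
    observation, log = "start", []
    for _ in range(max_turns):
        action = policy(observation)
        feedback, reward, done = env.step(action)
        log.append((observation, action, feedback, reward))
        observation = feedback
        if done:
            break
    return log


def scripted_policy(observation: str) -> str:
    """Placeholder for an LLM: call the tool once, then answer with its output."""
    if observation == "start":
        return "word_count: the quick brown fox"
    return f"answer: {observation}"


env = SketchToolEnv([WordCountTool()], answer="4")
episode = run_episode(env, scripted_policy)
```

Adding a new task in this sketch means writing another tool or environment; run_episode stays unchanged, which is the separation the layered design is aiming for.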

Caveats

Tightly coupled to verl==0.7.0 and a recent source checkout with AgentFlow/async rollout support, so dependency alignment is non-trivial.
The project just completed a full refactor to v0.1.0; older code lives on a legacy branch, which suggests the API surface may still be shifting.
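Given the version pin, a quick check of the installed package before launching a run can save debugging time. The sketch below uses only the Python standard library and assumes the installed distribution is named verl; a source checkout may report a different version string (for example a development suffix), in which case a plain equality test like this one would flag a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

REQUIRED_VERL = "0.7.0"  # the pin stated in the caveat above


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def pin_status(found: Optional[str], required: str) -> str:
    """Classify the installed version against an exact pin."""
    if found is None:
        return "missing"
    return "ok" if found == required else "mismatch"


status = pin_status(installed_version("verl"), REQUIRED_VERL)
print(f"verl pin {REQUIRED_VERL}: {status}")
```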

Verdict

Reach for this if you are training LLM agents that interact with tools or environments over multiple turns and you want a principled RL substrate rather than prompt-engineering duct tape. Look elsewhere if you need a polished, batteries-included product without engineering overhead.
