Is Open-AgentRL open source?

Yes — Gen-Verse/Open-AgentRL is open source, released under the Apache-2.0 license.

What language is Open-AgentRL written in?

Gen-Verse/Open-AgentRL is primarily written in Python.

How popular is Open-AgentRL?

Gen-Verse/Open-AgentRL has 589 stars on GitHub.

Where can I find Open-AgentRL?

Gen-Verse/Open-AgentRL is on GitHub at https://github.com/Gen-Verse/Open-AgentRL.

Gen-Verse/Open-AgentRL

Open-source RL that closes the loop between policy, reward, and world

Open-AgentRL open-sources the full training stack for two agentic RL tracks: closed-loop optimization of policy, reward model, and environment, and the training recipes behind a 4B model that the authors report outperforms 32B rivals.

★589 stars Python Agents Language Models LLMOps · Eval


What it does

Open-AgentRL houses the training code, datasets, and model checkpoints for two agentic RL research tracks. RLAnything treats the policy, reward model, and environment as a closed-loop system, training each component with feedback from the others rather than freezing any one part. DemyAgent distills what actually matters in agentic RL—real trajectories, exploration-friendly tweaks like reward clipping, and deliberate tool use—and packages it into datasets and a 4B model that the authors report beats 32B rivals on math, coding, and reasoning benchmarks.

The interesting bit

Most RL pipelines treat the reward model or environment as static scaffolding; RLAnything turns them into co-learners that adapt via critic feedback. Meanwhile, DemyAgent’s results suggest that data curation and training hygiene matter more than raw parameter count—an encouraging sign for anyone without a warehouse of GPUs.

Key highlights

RLAnything jointly optimizes policy, reward model, and environment, using step-wise reward signals that the authors say outperform outcome-only human labels.
DemyAgent contributes a 3K-sample SFT dataset, a 30K-sample RL dataset, and a 4B model reported to surpass 32B baselines on AIME, GPQA-Diamond, and LiveCodeBench.
Training code and evaluation scripts are released for both tracks, covering GUI agents, text-based games, coding LLMs, and general agentic reasoning.
Pre-trained policy checkpoints (7B/8B) and reward model checkpoints (8B/14B) are available for RLAnything; SFT and RL-tuned checkpoints are available for DemyAgent.
The repository also anchors OpenClaw-RL, a fully asynchronous, self-hosted RL framework for personalized agentic AI built on top of this stack.
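
To make the contrast between step-wise and outcome-only rewards in the first highlight concrete, here is a minimal toy sketch in Python. It is not taken from the Open-AgentRL codebase: the function names, the per-step scores, and the discount factor are all assumptions made up for illustration. It only shows why a single end-of-trajectory label gives every step of a failed run the same signal, while per-step scores can still credit the steps that were sound.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return-to-go for every step t: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def outcome_only(num_steps: int, success: bool) -> List[float]:
    """Outcome-only labeling: one reward on the final step, zeros elsewhere."""
    rewards = [0.0] * num_steps
    if num_steps > 0:
        rewards[-1] = 1.0 if success else 0.0
    return rewards


# Hypothetical 4-step trajectory that fails overall: the first two steps
# were sound, the third was a mistake. The scores are invented, standing in
# for whatever a step-wise reward model would emit.
step_scores = [1.0, 1.0, -1.0, 0.0]
gamma = 0.5  # assumed discount factor, chosen only to keep the numbers simple

outcome_signal = discounted_returns(outcome_only(4, success=False), gamma)
stepwise_signal = discounted_returns(step_scores, gamma)

print(outcome_signal)   # [0.0, 0.0, 0.0, 0.0]
print(stepwise_signal)  # [1.25, 0.5, -1.0, 0.0]
```

With the outcome-only label, the failed trajectory yields an all-zero signal, so nothing distinguishes the sound steps from the mistake. With the step-wise scores, the early steps keep positive credit and the faulty step is penalized. How RLAnything actually produces and combines these signals is defined in the repository, not here.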

Caveats

Reproducing DemyAgent’s agentic RL training requires SandboxFusion for code execution, meaning you need either local VM infrastructure or a Volcano Engine cloud endpoint.
The published training runs used an 8× A100 node, so replicating the full results is not laptop-friendly.

Verdict

Worth a look if you are building or studying agentic LLMs and want a reproducible starting point beyond standard PPO/GRPO scripts. Skip it if you need a turnkey, batteries-included framework with managed execution environments.
