Gen-Verse/Open-AgentRL
RLAnything is a reinforcement learning framework that jointly optimizes policy and reward models for LLMs and agents in dynamic environments.

Velocity (7d): +2.3 ★/day · Trend: steady
The repository implements RLAnything, a closed-loop RL system that dynamically optimizes policy models using outcome and step-wise reward signals while jointly training reward models via consistency feedback. It also includes DemyAgent, a general-purpose agentic RL agent. The framework covers terminal, GUI, SWE, and tool-call settings and supports PPO, GRPO, and entropy-based training methods.
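As a rough illustration of the loop described above, the sketch below blends an outcome reward with step-wise reward-model scores for the policy, derives a consistency signal for the reward model, and normalizes returns within a group in the style of GRPO. It is not taken from the repository: the `Trajectory` class, the function names, the `step_weight` parameter, and the sign-agreement rule for consistency are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical container for one rollout (not the repository's type)."""
    step_scores: List[float]  # reward-model score for each step, assumed in [-1, 1]
    outcome: float            # final outcome reward, assumed +1 (success) or -1 (failure)


def policy_step_rewards(traj: Trajectory, step_weight: float = 0.5) -> List[float]:
    """Blend the outcome reward with each step-wise score (assumed linear mix)."""
    return [traj.outcome + step_weight * s for s in traj.step_scores]


def consistency_feedback(traj: Trajectory) -> List[float]:
    """Assumed consistency signal for the reward model:
    +1 when a step score agrees in sign with the outcome, otherwise -1."""
    return [1.0 if s * traj.outcome > 0 else -1.0 for s in traj.step_scores]


def group_relative_advantages(returns: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style normalization: subtract the group mean, divide by the group std."""
    mean = sum(returns) / len(returns)
    std = (sum((r - mean) ** 2 for r in returns) / len(returns)) ** 0.5
    return [(r - mean) / (std + eps) for r in returns]


if __name__ == "__main__":
    traj = Trajectory(step_scores=[0.5, -0.2], outcome=1.0)
    print(policy_step_rewards(traj))    # per-step rewards for the policy update
    print(consistency_feedback(traj))   # per-step targets for the reward-model update
    print(group_relative_advantages([1.0, -1.0]))
```

In this sketch the policy and the reward model are updated from the same rollout: the policy sees the blended per-step rewards, while the reward model is pushed toward step scores that agree with the observed outcome. The actual signals and weighting used by RLAnything may differ.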