Reinforcement learning for agents that actually ships
rLLM lets you train any LLM agent with RL by swapping the client and adding a decorator—no framework lock-in required.

What it does
rLLM is a training framework that wraps existing agent code in reinforcement learning without rewriting it. You keep your LangGraph, SmolAgent, or plain OpenAI client; rLLM intercepts LLM calls through a gateway, logs token IDs and logprobs, and feeds the traces to GRPO, REINFORCE, or RLOO. A CLI handles eval and training on 50+ benchmarks (rllm eval gsm8k), or you can use the Python API for custom rollouts and reward functions.
The interesting bit
The gateway trick is the quiet workhorse: during training, config.base_url silently routes to rLLM’s capture server, so your agent code stays identical between evaluation and training. Same binary, different destination. They also split backends—tinker for single-machine or CPU, verl for distributed GPU—behind one API, which is rarer than it should be in this space.
Key highlights
- Decorator-based tracing:
@rllm.rolloutwraps any agent function - Framework-agnostic: LangGraph, OpenAI Agents SDK, Google ADK, or raw HTTP
- Built-in benchmarks: 50+ datasets runnable from CLI without writing code
- Two training backends:
tinker(local/CPU) andverl(multi-GPU distributed) - Claimed results: 4B model outperforming 235B on finance tasks; 1.5B surpassing O1-Preview on math (from project blog posts)
- Python 3.11+; install via pip from GitHub, PyPI release pending
Caveats
- No PyPI package yet; install from GitHub URL only
- GPU backend requires separate
[verl]dependency set - Several blog post dates (e.g., “Mar 2026”) appear to be typos or future-dated; verify current status
Verdict Worth a look if you’re already running agents and want to experiment with RL fine-tuning without porting to a new framework. Skip if you need a mature, pip-installable package today or aren’t ready to write your own reward functions.