rllm-org/rllm

Reinforcement learning for agents that actually ships

rLLM lets you train any LLM agent with RL by swapping the LLM client and adding a decorator, with no framework lock-in.

Velocity (7d): +11 ★/day · Trend: steady

What it does

rLLM is a training framework that wraps existing agent code in reinforcement learning without rewriting it. You keep your LangGraph, SmolAgent, or plain OpenAI client; rLLM intercepts LLM calls through a gateway, logs token IDs and logprobs, and feeds the traces to GRPO, REINFORCE, or RLOO. A CLI handles eval and training on 50+ benchmarks (rllm eval gsm8k), or you can use the Python API for custom rollouts and reward functions.
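
For readers new to the algorithm names: GRPO and RLOO both score each rollout relative to the other rollouts sampled for the same prompt. The sketch below shows one common formulation of each in plain Python. It is an illustration under stated assumptions, not rLLM's implementation; the function names and the eps value are hypothetical.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize each reward within its group.

    Assumes `rewards` holds the scalar rewards of several rollouts sampled
    for the same prompt. `eps` (an assumed value) avoids division by zero
    when every rollout earns the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """Leave-one-out baseline: compare each reward with the mean of the others."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two rollouts per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


# Four rollouts for one prompt: two solved the task, two did not.
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))
print(rloo_advantages(rewards))
```

In both cases successful rollouts get a positive advantage and failed ones a negative advantage. If every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.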

The interesting bit

The gateway trick is the quiet workhorse: during training, config.base_url silently routes to rLLM’s capture server, so your agent code stays identical between evaluation and training. Same code, different destination. The project also splits backends behind one API (tinker for single-machine or CPU, verl for distributed GPU), which is rarer than it should be in this space.
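
The routing idea can be sketched without any network code. In the toy example below, two in-process functions stand in for HTTP endpoints, and the agent chooses between them only through a base_url field. Everything here is hypothetical (the URLs, Config, ENDPOINTS, and the fake model); it illustrates the pattern, not rLLM's gateway.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical endpoints; a real setup would use HTTP servers.
PROVIDER_URL = "https://provider.example/v1"
CAPTURE_URL = "http://localhost:9000/v1"

# What the capture endpoint has recorded so far.
captured: List[dict] = []


def provider(prompt: str) -> dict:
    """Fake model endpoint: upper-cases the prompt."""
    return {"text": prompt.upper()}


def capture_server(prompt: str) -> dict:
    """Fake gateway: forwards to the model, then records the exchange."""
    reply = provider(prompt)
    captured.append({"prompt": prompt, "text": reply["text"]})
    return reply


ENDPOINTS: Dict[str, Callable[[str], dict]] = {
    PROVIDER_URL: provider,
    CAPTURE_URL: capture_server,
}


@dataclass
class Config:
    base_url: str


def agent(config: Config, question: str) -> str:
    """Agent code: identical for eval and training; only the config differs."""
    return ENDPOINTS[config.base_url](question)["text"]


eval_answer = agent(Config(PROVIDER_URL), "hello")   # nothing recorded
train_answer = agent(Config(CAPTURE_URL), "hello")   # one trace recorded
print(eval_answer, train_answer, len(captured))
```

With the official openai Python client, the equivalent knob is the base_url argument to OpenAI(...), which is why an agent built on that client can be redirected without touching its logic.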

Key highlights

  • Decorator-based tracing: @rllm.rollout wraps any agent function
  • Framework-agnostic: LangGraph, OpenAI Agents SDK, Google ADK, or raw HTTP
  • Built-in benchmarks: 50+ datasets runnable from CLI without writing code
  • Two training backends: tinker (local/CPU) and verl (multi-GPU distributed)
  • Claimed results: 4B model outperforming 235B on finance tasks; 1.5B surpassing O1-Preview on math (from project blog posts)
  • Python 3.11+; install via pip from GitHub, PyPI release pending
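
As a rough picture of what decorator-based tracing means in general, the sketch below wraps a function so that each call is recorded alongside its result. The rollout decorator and TRACES list here are hypothetical stand-ins written for illustration; they are not the @rllm.rollout API, whose signature and behavior should be checked in the project's documentation.

```python
import functools
import time

# Hypothetical in-memory trace store.
TRACES = []


def rollout(fn):
    """Hypothetical tracing decorator: records inputs, output, and duration."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "name": fn.__name__,
            "args": args,
            "kwargs": kwargs,
            "result": result,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper


@rollout
def solve(question: str) -> str:
    """Toy agent: reverses the question instead of calling a model."""
    return question[::-1]


solve("abc")
print(TRACES[0]["name"], TRACES[0]["result"])
```

The agent function itself is unchanged; the decorator is the only addition, which is the property the highlight above describes.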

Caveats

  • No PyPI package yet; install from GitHub URL only
  • GPU backend requires separate [verl] dependency set
  • Several blog post dates (e.g., “Mar 2026”) appear to be typos or future-dated; verify current status

Verdict

Worth a look if you’re already running agents and want to experiment with RL fine-tuning without porting to a new framework. Skip if you need a mature, pip-installable package today or aren’t ready to write your own reward functions.
