← all repositories
areal-project/AReaL

Async RL for LLM agents, minus the orchestration headache

AReaL lets you train reasoning and agentic models by swapping a base_url instead of rewriting pipelines.

5.3k stars Python AgentsML Frameworks
AReaL
Velocity · 7d
+11
★ / day
Trend
steady
star history

What it does AReaL is a reinforcement-learning training system built for LLM-based agents and reasoning models. It handles the distributed infrastructure—data loading, rollout generation, reward computation, and policy updates—so you can point it at an agent endpoint and train. The project comes from Tsinghua IIIS and Ant Group’s AReaL Team.

The interesting bit The fully asynchronous design is the core pitch: training runs without waiting for all workers to synchronize, which the authors claim yields a 2.77× speedup over synchronous setups while keeping model quality intact. A lighter variant, AReaL-lite, strips out roughly 80% of the code and keeps about 90% of the performance for researchers who want to iterate on algorithms rather than infrastructure.

Key highlights

  • Swap base_url and api_key to train against black-box agents (OpenAI Agents SDK, CAMEL-AI, or your own) without pipeline rewrites.
  • Supports GRPO, PPO, DAPO, REINFORCE, RLOO, LitePPO, DR-GRPO, GSPO, and more out of the box.
  • Single-node and Ray-cluster modes; Ascend NPU support lives in a dedicated branch.
  • Published results for math, coding, search, and customer-service agents, including a 235B MoE model trained with an attached data-synthesis engine.
  • Integrates with NVIDIA Scaffoldings for modular agent-execution decoupling.

Caveats

  • The README mentions “industry-leading speed” and “state-of-the-art” results but does not provide raw throughput numbers or independent benchmarks for direct comparison.
  • Vision-language model examples are listed but truncated in the source; details are unclear.
  • NPU support is maintained on a separate branch, not the default trunk.

Verdict Worth a look if you are building multi-turn agents or reasoning models and would rather tune rewards than debug distributed RL plumbing. Probably overkill if you just need to fine-tune a model with supervised learning.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.