RUC-NLPIR/ARPO

Entropy-aware RL for tool-wielding LLMs

ARPO is a reinforcement-learning codebase that trains LLMs to act as multi-tool agents, branching exploration or rebalancing entropy when tool-call uncertainty is high.

★ 1.1k stars · Python · Agents · Language Models · ML Frameworks

What it does

ARPO and its follow-up AEPO are reinforcement learning algorithms for training LLMs to act as agents that invoke external tools. The repository includes training code, datasets, and model checkpoints for both algorithms, with support for the Qwen and Llama model families. In short, it is a low-level RL training suite aimed at making language models better at deciding when to call a tool instead of guessing.

The interesting bit

Both algorithms treat uncertainty as a signal rather than noise. ARPO adaptively branches sampling during high-entropy tool-call rounds, letting the model explore multiple strategies when it is least sure what to do, while AEPO balances entropy across both rollout generation and policy updates. The authors credit this entropy-aware design with pushing scores upward on agentic benchmarks like GAIA and HLE.
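
To make the idea of branching on uncertainty concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the fixed threshold, and the toy probability vectors are all assumptions made for illustration. It computes the Shannon entropy of a next-token distribution and flags a step for extra exploration when that entropy is high.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_branch(probs, threshold=1.0):
    """Hypothetical rule: branch when entropy exceeds a fixed threshold.

    The threshold value is an illustrative assumption, not a setting
    taken from the ARPO repository or papers.
    """
    return token_entropy(probs) > threshold


# Toy distributions over four candidate tokens (made-up numbers).
confident = [0.97, 0.01, 0.01, 0.01]  # model strongly prefers one token
uncertain = [0.25, 0.25, 0.25, 0.25]  # model has no preference

for name, dist in [("confident", confident), ("uncertain", uncertain)]:
    print(name, round(token_entropy(dist), 3), should_branch(dist))
```

A uniform distribution over four tokens has entropy ln 4, the maximum for that vocabulary size, so it crosses the threshold while the peaked distribution does not. How ARPO actually measures entropy and decides how many branches to spawn is described in the papers rather than on this page.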

Key highlights

Two algorithms in one repo: ARPO (accepted at ICLR 2026) and AEPO (WWW 2026 Oral). The README states AEPO consistently outperforms ARPO on GAIA, HLE, and AIME.
Released checkpoints span 3B to 32B parameters, including QwQ-32B variants that score 53.4/12.8 (AEPO) and 51.5/11.2 (ARPO) on the GAIA/HLE benchmarks reported by the authors.
Training stack includes tool-call acceleration and a dynamic cache that stores tool results in real time; the authors claim a Qwen3-14B run on one node with batch size 128 takes roughly ten minutes per step.
Supports multi-tool agentic RL training for Qwen2.5, Qwen3, and Llama3 models.
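
The dynamic tool-result cache mentioned above can be pictured with a small sketch. This is only an assumption about the general technique (memoizing tool calls by name and arguments); the class name `ToolResultCache`, the `call` method, and the `fake_search` tool are hypothetical and do not come from the repository.

```python
import json


class ToolResultCache:
    """Hypothetical in-memory cache keyed by tool name and arguments."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, tool_name, args):
        # Sort keys so that argument order does not change the cache key.
        return tool_name + ":" + json.dumps(args, sort_keys=True)

    def call(self, tool_name, args, tool_fn):
        """Return a cached result if present, else run the tool and store it."""
        key = self._key(tool_name, args)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = tool_fn(**args)
        self._store[key] = result
        return result


def fake_search(query):
    """Stand-in for a slow external tool (illustrative only)."""
    return f"results for {query}"


cache = ToolResultCache()
first = cache.call("search", {"query": "GAIA benchmark"}, fake_search)
second = cache.call("search", {"query": "GAIA benchmark"}, fake_search)
print(first == second, cache.hits, cache.misses)
```

During RL rollouts many sampled trajectories tend to issue identical tool calls, so a cache of this kind can avoid repeating slow external requests. Whether the repository's cache works this way is not stated on this page.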

Caveats

The README is rich in release announcements and social links but sparse on architectural detail; expect to read the papers to understand the branching and entropy-balancing mechanics.
The README gives benchmark numbers without visible baselines or error bars, so treat the GAIA/HLE scores as self-reported rather than independently verified.

Verdict

A solid starting point if you are researching tool-use RL for LLMs and want a training pipeline with published checkpoints. Less useful if you are looking for a plug-and-play agent framework rather than a research training codebase.

Frequently asked

What is RUC-NLPIR/ARPO? ARPO is a reinforcement-learning codebase that trains LLMs to act as multi-tool agents, branching exploration or rebalancing entropy when tool-call uncertainty is high.
Is ARPO open source? Yes, RUC-NLPIR/ARPO is an open-source project tracked on heatdrop.
What language is ARPO written in? RUC-NLPIR/ARPO is primarily written in Python.
How popular is ARPO? RUC-NLPIR/ARPO has 1.1k stars on GitHub.
Where can I find ARPO? RUC-NLPIR/ARPO is on GitHub at https://github.com/RUC-NLPIR/ARPO.