RUC-NLPIR/ARPO
An ICLR 2026 paper presenting Agentic Reinforced Policy Optimization, a reinforcement learning algorithm for improving language model agents.

ARPO is a reinforcement learning framework for optimizing agentic policies in large language models. The project includes multi-scale RL training approaches and reasoning enhancements (DeepSearch). It provides trained model checkpoints on Hugging Face (Qwen2.5, Llama3.1, and Qwen3 variants), along with training datasets for the supervised fine-tuning and RL reasoning stages.