
RUC-NLPIR/ARPO

An ICLR 2026 paper presenting Agentic Reinforced Policy Optimization, a reinforcement learning algorithm for improving language model agents.

Star velocity (7d): +3.2 ★/day · Trend: steady

ARPO is a reinforcement learning framework for optimizing agentic policies in large language models. The project includes multi-scale RL training approaches and reasoning enhancements (DeepSearch). It also releases trained model checkpoints on Hugging Face (Qwen2.5, Llama3.1, and Qwen3 variants), along with the training datasets for the supervised fine-tuning and RL reasoning stages.
