PRIME-RL/TTRL
TTRL applies reinforcement learning at test time to improve large language model reasoning performance on benchmarks like AIME.

Velocity (7d): +2.6 ★/day · Trend: steady
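TTRL (Test-Time Reinforcement Learning) is a research framework for training LLMs with reinforcement learning on unlabeled test data to improve their reasoning. Because no ground-truth labels are available at test time, it samples several responses per question and takes the majority-voted answer as a pseudo-label. Each response is then rewarded according to whether its answer matches that label, and the resulting rewards drive RL updates to the model. In effect, it is a label-free variant of reinforcement learning with verifiable rewards (RLVR), and it is evaluated on mathematical reasoning benchmarks such as AIME 2024.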
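TTRL estimates rewards on unlabeled data by majority voting over sampled answers. The minimal Python sketch below shows only that reward rule. It is not TTRL's code, and `majority_vote_rewards` is a hypothetical helper. It assumes each response's final answer has already been extracted as a string and compares answers by plain string equality. A real math pipeline would more likely use an equivalence checker, so that `"0.5"` and `"1/2"` count as the same answer.

```python
from collections import Counter
from typing import List, Optional, Tuple


def majority_vote_rewards(
    answers: List[Optional[str]],
) -> Tuple[Optional[str], List[float]]:
    """Pick a pseudo-label by majority vote and score each sample against it.

    Hypothetical helper for illustration only (not TTRL's API).
    `answers` holds the final answer extracted from each sampled response,
    or None when no answer could be extracted.
    """
    valid = [a for a in answers if a is not None]
    if not valid:
        # No usable answers: no pseudo-label, so every sample gets zero reward.
        return None, [0.0] * len(answers)
    # most_common orders equal counts by first occurrence, so ties go to
    # the answer that appeared earliest in the sample list.
    pseudo_label, _ = Counter(valid).most_common(1)[0]
    # Assumption: binary reward by exact string match with the pseudo-label.
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


if __name__ == "__main__":
    label, rewards = majority_vote_rewards(["42", "41", "42", None, "42"])
    print(label, rewards)  # 42 [1.0, 0.0, 1.0, 0.0, 1.0]
```

These per-sample rewards would then feed an ordinary policy-gradient update. The sketch stops before the update step because the training algorithm is a separate choice.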