
PRIME-RL/TTRL

TTRL applies reinforcement learning at test time to improve large language model reasoning performance on benchmarks like AIME.

Velocity (7d): +2.6 ★/day · Trend: steady

TTRL (Test-Time Reinforcement Learning) is a research framework that applies reinforcement learning (RL) to improve LLM reasoning at inference time. The project implements process reward models and RL algorithms that run at test time to refine LLM outputs without additional training. It builds on unsupervised RLVR (reinforcement learning with verifiable rewards) techniques and is evaluated on mathematical reasoning benchmarks such as AIME.
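To make the idea of an unsupervised, verifiable-style reward concrete, here is a minimal sketch of one label-free signal: sample several answers to the same question, take the most common one as a pseudo-label, and reward the samples that agree with it. Treating majority voting as the reward source is an assumption for illustration, and the function name `majority_vote_rewards` is hypothetical; nothing here is taken from the repository's code.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Score sampled answers against their own consensus (hypothetical helper).

    Returns (consensus, rewards), where each reward is 1.0 if the sample
    matches the most common answer and 0.0 otherwise.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    consensus, _ = Counter(answers).most_common(1)[0]
    rewards = [1.0 if a == consensus else 0.0 for a in answers]
    return consensus, rewards


# Five hypothetical final answers sampled for one math question.
samples = ["42", "42", "17", "42", "9"]
consensus, rewards = majority_vote_rewards(samples)
print(consensus, rewards)
```

In an RL loop, rewards like these would stand in for ground-truth correctness checks when no labels are available. The sketch omits the policy update and the answer extraction from model output.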
