PRIME-RL/PRIME
A scalable reinforcement learning framework for training language models to reason more effectively using implicit process rewards.

Velocity (7d): +3.6 ★/day · Trend: steady
PRIME applies process reinforcement learning to improve LLM reasoning, generating implicit reward signals for the intermediate steps of multi-step reasoning tasks. It is designed for scalable RL training of language models and integrates with frameworks such as veRL. The repository includes training recipes, evaluation benchmarks, and model weights on Hugging Face for reproducing the method.
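The snippet below illustrates what an "implicit" process reward can look like. It is a minimal sketch that assumes the implicit-PRM formulation described in the PRIME paper, where the reward for token t is β · (log π_φ(y_t | y_<t) − log π_ref(y_t | y_<t)). Here π_φ is the implicit reward model and π_ref is a frozen reference model. The function name, the β default, and the plain-list inputs are hypothetical choices for illustration. This is not the repository's veRL implementation.

```python
import math
from typing import List


def implicit_process_rewards(
    policy_logprobs: List[float],
    ref_logprobs: List[float],
    beta: float = 0.05,  # hypothetical scaling coefficient
) -> List[float]:
    """Per-token implicit rewards: beta * (log pi_phi - log pi_ref).

    Assumes both lists hold the log-probabilities each model assigned to the
    same sampled tokens y_1..y_T, in order.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    return [beta * (lp - lr) for lp, lr in zip(policy_logprobs, ref_logprobs)]


if __name__ == "__main__":
    # Toy three-token response; probabilities are made up for illustration.
    policy = [math.log(0.9), math.log(0.6), math.log(0.8)]
    ref = [math.log(0.8), math.log(0.6), math.log(0.4)]
    rewards = implicit_process_rewards(policy, ref, beta=1.0)
    print(rewards)
    # Summing the token rewards gives beta * log(pi_phi(y) / pi_ref(y)),
    # i.e. a sequence-level (outcome-style) reward for the whole response.
    print(sum(rewards))
```

Because the per-token terms telescope into a sequence-level log-ratio, a model trained only on outcome labels can still yield dense, step-level signals. That is the sense in which the process rewards are "implicit."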