
PRIME-RL/PRIME

A scalable reinforcement learning framework for training language models to reason more effectively using implicit process rewards.

Velocity (7d): +3.6 ★ / day · Trend: steady

PRIME applies process reinforcement learning to improve LLM reasoning: it derives implicit reward signals for the intermediate steps of multi-step reasoning tasks. The approach targets scalable RL training for language models and integrates with frameworks such as veRL. The repository includes training recipes, evaluation benchmarks, and model weights on Hugging Face for reproducing the method.
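To make "implicit reward signals" concrete, here is a minimal sketch of one common formulation, assumed here because the summary above does not spell out a formula: each generated token is scored by a scaled log-probability ratio between a reward model and a frozen reference model, and the per-token scores sum to a response-level score. The function names, the `beta` default, and the sample log-probabilities are all hypothetical and are not taken from the PRIME codebase.

```python
from typing import List


def implicit_process_rewards(
    model_logprobs: List[float],
    ref_logprobs: List[float],
    beta: float = 0.05,  # assumed scaling coefficient, not a value from the repo
) -> List[float]:
    """Per-token reward: beta * (log p_model(token) - log p_ref(token)).

    Both inputs hold the log-probability each model assigns to the same
    sampled tokens, in order. This is an illustrative assumption about how
    an implicit process reward can be computed, not the project's API.
    """
    if len(model_logprobs) != len(ref_logprobs):
        raise ValueError("log-probability sequences must have equal length")
    return [beta * (m - r) for m, r in zip(model_logprobs, ref_logprobs)]


def response_score(token_rewards: List[float]) -> float:
    """Sum the per-token rewards into a single response-level score."""
    return sum(token_rewards)


if __name__ == "__main__":
    # Hypothetical log-probabilities for a two-token response.
    rewards = implicit_process_rewards([-1.0, -2.0], [-1.5, -1.0], beta=0.5)
    print(rewards, response_score(rewards))
```

Under this assumption the dense, per-token rewards come without step-level labels, since only the two models' log-probabilities are needed; how PRIME trains and updates the reward model is left to the repository's own recipes.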
