RLHFlow/RLHF-Reward-Modeling
A collection of training recipes for reward models used in RLHF-based LLM alignment, including Bradley-Terry, pairwise, multi-objective, and process-supervised approaches.

The repository provides implementations of several reward modeling techniques for aligning LLMs with Reinforcement Learning from Human Feedback (RLHF). It covers classic Bradley-Terry reward modeling, pairwise preference models that take a prompt and two candidate responses and predict which response is preferred, multi-objective reward models that use mixture-of-experts aggregation, and process-supervised reward models for mathematical reasoning. Each approach comes with code, data, hyperparameters, and references to the associated research papers.
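Bradley-Terry reward modeling treats the probability that a chosen response y_w beats a rejected response y_l for prompt x as sigmoid(r(x, y_w) - r(x, y_l)), and trains the reward model by minimizing the negative log of that probability. The sketch below computes this loss in plain Python for precomputed scalar scores; the function names are illustrative, and a real training loop would compute the scores with the reward model and backpropagate through them.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    # For negative z, rewrite as z - log(1 + e^z) to avoid overflow in exp(-z).
    return z - math.log1p(math.exp(z))


def bradley_terry_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Mean negative log-likelihood that each chosen response outscores its rejected pair."""
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need equal-length, non-empty score lists")
    total = 0.0
    for r_w, r_l in zip(chosen, rejected):
        total += -log_sigmoid(r_w - r_l)  # penalize small or negative margins
    return total / len(chosen)
```

With equal scores the loss is log 2; it shrinks toward zero as the chosen response's margin grows.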
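A pairwise preference model scores a pair of responses jointly instead of assigning each one an independent scalar. One common way to read out the preference, assumed here rather than taken from this repository, is to have the model emit a logit for a label "A" and one for "B" and normalize the two; averaging over both presentation orders reduces position bias. The helper names below are hypothetical.

```python
import math
from typing import Tuple


def two_way_softmax(logit_a: float, logit_b: float) -> float:
    """Return P(label A) from two label logits, shifting by the max for stability."""
    m = max(logit_a, logit_b)
    ea = math.exp(logit_a - m)
    eb = math.exp(logit_b - m)
    return ea / (ea + eb)


def symmetric_preference(ab_logits: Tuple[float, float], ba_logits: Tuple[float, float]) -> float:
    """P(response 1 preferred over response 2), averaged over both orderings.

    ab_logits: (logit for "A", logit for "B") when response 1 is shown as A.
    ba_logits: (logit for "A", logit for "B") when response 2 is shown as A.
    """
    p_ab = two_way_softmax(*ab_logits)  # P(response 1 wins) in order (1, 2)
    p_ba = two_way_softmax(*ba_logits)  # P(response 2 wins) in order (2, 1)
    return 0.5 * (p_ab + (1.0 - p_ba))
```

If the model is perfectly order-consistent, both terms agree and the average changes nothing; otherwise it splits the difference between the two orderings.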
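For the multi-objective approach, one way to read "mixture-of-experts aggregation" is a gating network that maps the prompt to one weight per objective (for example helpfulness, correctness, or verbosity) and combines the per-objective rewards as a weighted sum. The sketch assumes the gating logits and objective rewards have already been produced by the model; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_rewards(objective_rewards: Sequence[float], gating_logits: Sequence[float]) -> float:
    """Combine per-objective rewards into one score using prompt-dependent gate weights."""
    if len(objective_rewards) != len(gating_logits) or not objective_rewards:
        raise ValueError("need one gating logit per objective")
    weights = softmax(gating_logits)  # non-negative weights that sum to 1
    return sum(w * r for w, r in zip(weights, objective_rewards))
```

Because the weights depend on the prompt, the same response can be scored with different trade-offs between objectives in different contexts.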
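A process-supervised reward model scores each intermediate step of a solution rather than only the final answer. To rank whole solutions, the step scores must be reduced to a single number; taking the minimum (a solution is only as good as its weakest step) or the product are common choices, assumed here rather than taken from this repository. The sketch then uses that score for Best-of-N selection over sampled solutions; all names are hypothetical.

```python
import math
from typing import List, Sequence


def solution_score(step_scores: Sequence[float], mode: str = "min") -> float:
    """Reduce per-step correctness scores to a single solution score."""
    if not step_scores:
        raise ValueError("a solution needs at least one scored step")
    if mode == "min":
        return min(step_scores)        # the weakest step bounds the solution
    if mode == "prod":
        return math.prod(step_scores)  # all steps correct, if treated as independent
    if mode == "last":
        return step_scores[-1]         # score of the final step only
    raise ValueError(f"unknown mode: {mode}")


def best_of_n(candidates: List[Sequence[float]], mode: str = "min") -> int:
    """Return the index of the candidate solution with the highest aggregated score."""
    scores = [solution_score(steps, mode) for steps in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

The minimum rule rewards solutions with no weak links, while the product also penalizes long chains of merely decent steps.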