OpenLMLab/MOSS-RLHF
Research framework for training and aligning large language models using Reinforcement Learning from Human Feedback (RLHF) with PPO.

MOSS-RLHF is a research project focused on RLHF techniques for aligning large language models. The work is organized in two parts: Part I covers the Proximal Policy Optimization (PPO) implementation for LLM fine-tuning, and Part II addresses reward modeling. The project provides code for training reward models and has released annotated datasets, including a cleaned version of the hh-rlhf dataset. It won the Best Paper Award at the NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following.
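
For readers new to the two components named above, the following is a minimal sketch of the standard textbook objectives: the PPO clipped surrogate loss and the pairwise ranking loss commonly used for reward models. It is not taken from the MOSS-RLHF codebase; the function names, the `clip_eps` default, and the use of plain Python lists instead of tensors are all assumptions made for illustration.

```python
import math


def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over samples.

    Hypothetical helper for illustration; inputs are per-token
    log-probabilities and advantage estimates as plain floats.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize, while PPO maximizes the surrogate.
    return -total / len(advantages)


def pairwise_reward_loss(r_chosen, r_rejected):
    """Pairwise ranking loss: -log(sigmoid(r_chosen - r_rejected)).

    Hypothetical helper; computed as softplus(-diff) for numerical stability.
    """
    diff = r_chosen - r_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
```

With a positive advantage, clipping caps how much credit a large ratio can earn, which discourages overly large policy updates; the reward loss shrinks as the reward model scores the preferred response further above the rejected one.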