openai/lm-human-preferences
Code for training reward models and fine-tuning language models from human preferences.

Velocity (7d): +0.6 ★/day · Trend: steady
This repository implements the training pipeline from OpenAI’s 2019 paper "Fine-Tuning Language Models from Human Preferences" (Ziegler et al.). It first trains a reward model on human preference labels, then fine-tunes a GPT-2 language model against that reward model with a policy-gradient method (PPO in the paper). The code includes utilities for distributed training with Horovod and was tested on GPU clusters.
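The two stages can be illustrated with a minimal sketch. It is not taken from the repository: the function names `preference_loss` and `kl_penalized_reward` and the default `beta=0.1` are hypothetical, chosen only for illustration. It assumes, as in the paper's setup, that a labeler picks the best of several candidate continuations, that the reward model is trained with a softmax cross-entropy loss over the candidates' scalar rewards, and that the policy is then optimized against the learned reward minus a KL penalty toward the original language model.

```python
import math


def preference_loss(rewards, chosen):
    """Average softmax cross-entropy over candidate rewards.

    rewards: list of rows, one row of scalar reward-model outputs per
             comparison (one value per candidate continuation).
    chosen:  list of indices, the candidate the human preferred in each row.
    """
    total = 0.0
    for row, idx in zip(rewards, chosen):
        # Numerically stable log-sum-exp over the candidates in this row.
        m = max(row)
        log_z = m + math.log(sum(math.exp(r - m) for r in row))
        # Negative log-probability assigned to the human-preferred candidate.
        total += log_z - row[idx]
    return total / len(chosen)


def kl_penalized_reward(reward, logp_policy, logp_ref, beta=0.1):
    """Reward passed to the policy-gradient step (beta is an assumed value).

    logp_policy / logp_ref: per-token log-probabilities of the sampled
    continuation under the fine-tuned policy and the original model.
    """
    log_ratio = sum(logp_policy) - sum(logp_ref)
    return reward - beta * log_ratio


if __name__ == "__main__":
    # Four equally scored candidates: the loss equals log(4).
    print(preference_loss([[0.0, 0.0, 0.0, 0.0]], [0]))
    # The policy has drifted from the reference model, so the reward is reduced.
    print(kl_penalized_reward(1.0, [-1.0, -0.5], [-1.5, -1.0], beta=0.1))
```

The KL term keeps the fine-tuned policy close to the pretrained model, which discourages it from exploiting weaknesses in the learned reward model.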