natolambert/rlhf-book
An open-source textbook on Reinforcement Learning from Human Feedback covering post-training language model techniques.

Velocity (7d): +2.6 ★/day · Trend: steady
This repository contains the source material for a comprehensive textbook on RLHF, documenting techniques used to align and improve language models after initial pre-training. The book covers rejection sampling, preference modeling, reward modeling, and character training methodologies. It serves as an educational reference for practitioners working at the frontier of open language model development.