tatsu-lab/alpaca_farm
A simulation framework for developing and benchmarking RLHF methods without collecting real human feedback data.

Velocity (7d): +0.7 ★/day · Trend: steady
AlpacaFarm provides a low-cost environment for research on learning from human feedback, specifically for instruction following and alignment of language models. It simulates the RLHF pipeline, including the preference-annotation step, so researchers can develop and evaluate RLHF methods with automated annotators such as GPT-4 instead of human labelers. The project includes reference implementations of PPO, DPO, and other training algorithms.
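To make the idea of simulated preference annotation concrete, here is a minimal, self-contained sketch in plain Python. It does not use the alpaca_farm package or its API. The names `toy_annotator`, `noisy`, and `win_rate` are hypothetical, the "prefer the longer output" rule is a toy stand-in for an LLM judge such as GPT-4, and the label-flipping noise is an illustrative assumption about how annotator variability might be modeled.

```python
import random
from typing import Callable, List, Tuple

# (instruction, baseline_output, candidate_output)
Example = Tuple[str, str, str]
# Returns 0 if the first output is preferred, 1 if the second is preferred.
Annotator = Callable[[str, str, str], int]


def toy_annotator(instruction: str, output_a: str, output_b: str) -> int:
    """Hypothetical stand-in for an LLM judge (not the alpaca_farm API).

    Toy heuristic: prefer the longer output; ties go to output_a.
    """
    return 0 if len(output_a) >= len(output_b) else 1


def noisy(annotator: Annotator, flip_prob: float, rng: random.Random) -> Annotator:
    """Wrap an annotator so each label is flipped with probability flip_prob."""
    def wrapped(instruction: str, output_a: str, output_b: str) -> int:
        label = annotator(instruction, output_a, output_b)
        return 1 - label if rng.random() < flip_prob else label
    return wrapped


def win_rate(annotator: Annotator, examples: List[Example]) -> float:
    """Fraction of examples where the candidate is preferred over the baseline."""
    wins = sum(annotator(inst, base, cand) for inst, base, cand in examples)
    return wins / len(examples)


if __name__ == "__main__":
    examples = [
        ("Name a color.", "Red.", "Red is a primary color."),
        ("Say hi.", "Hi.", "Hello there!"),
        ("Define RLHF.", "Reinforcement learning from human feedback.", "RL."),
    ]
    rng = random.Random(0)
    print("clean win rate:", win_rate(toy_annotator, examples))
    print("noisy win rate:", win_rate(noisy(toy_annotator, 0.25, rng), examples))
```

In a real setup the heuristic would be replaced by a call to an automated annotator, and the resulting pairwise labels would feed either evaluation (win rate against a baseline) or training of a preference-based method.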