tatsu-lab/alpaca_farm

A simulation framework for developing and benchmarking RLHF methods without collecting real human feedback data.

Velocity (7d): +0.7 ★/day · Trend: steady

AlpacaFarm provides a low-cost environment for research on learning from human feedback, specifically for instruction-following and alignment of language models. It simulates the RLHF pipeline including preference annotation, allowing researchers to develop and evaluate RLHF methods using automated annotators like GPT-4. The project includes reference implementations of PPO, DPO, and other training algorithms.
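The idea of simulated preference annotation described above can be illustrated with a minimal sketch. This is not AlpacaFarm's API: the function `annotate_pairs`, the `scorer` callable, and the `flip_prob` noise parameter are all hypothetical names chosen for illustration. In practice the scorer would be a call to an automated annotator such as GPT-4; here a toy scoring function stands in for it.

```python
import random
from typing import Callable, List, Tuple


def annotate_pairs(
    pairs: List[Tuple[str, str]],
    scorer: Callable[[str], float],
    flip_prob: float = 0.0,
    seed: int = 0,
) -> List[int]:
    """Return, for each (response_a, response_b) pair, the index (0 or 1)
    of the preferred response.

    `scorer` is a hypothetical stand-in for an automated annotator.
    `flip_prob` is an assumed noise knob: with that probability the
    label is flipped, loosely imitating annotator disagreement.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    labels = []
    for response_a, response_b in pairs:
        # Prefer the higher-scoring response; ties go to response_a.
        preferred = 0 if scorer(response_a) >= scorer(response_b) else 1
        # Optionally inject label noise.
        if rng.random() < flip_prob:
            preferred = 1 - preferred
        labels.append(preferred)
    return labels


if __name__ == "__main__":
    toy_pairs = [("short", "a longer reply"), ("detailed answer", "ok")]
    # Toy scorer: longer responses score higher (an assumption for the demo only).
    print(annotate_pairs(toy_pairs, scorer=len))
```

The resulting labels could then serve as preference data for a training method such as DPO or for fitting a reward model used by PPO; the noise parameter is one simple way to make simulated labels less uniformly consistent than a deterministic scorer would be.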
