princeton-nlp/SimPO
A reference-free preference optimization algorithm for aligning large language models, published at NeurIPS 2024.

Velocity (7d): +1.3 ★/day · Trend: steady
SimPO is a preference alignment method for LLMs that improves upon DPO by removing the need for a reference model. Rather than defining the reward relative to a reference policy, it uses the length-normalized (average) log-probability of a response as the implicit reward and adds a target reward margin between the chosen and rejected responses. The project includes training code and released model checkpoints on HuggingFace, and reports strong results on the AlpacaEval 2 and Arena-Hard benchmarks across multiple base models, including Llama 3 and Gemma 2.
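The objective described above can be illustrated with a minimal, dependency-free sketch for a single preference pair. This is not the repository's training code: the function name `simpo_loss`, its argument names, and the default values of `beta` and `gamma` are assumptions chosen for illustration, and the inputs are assumed to be per-token log-probabilities from the policy being trained.

```python
import math


def simpo_loss(chosen_token_logps, rejected_token_logps, beta=2.0, gamma=1.0):
    """Illustrative SimPO-style loss for one preference pair.

    chosen_token_logps / rejected_token_logps: per-token log-probabilities
    of the preferred and dispreferred responses under the policy.
    beta (reward scale) and gamma (target margin) are placeholder values,
    not the settings used by the SimPO authors.
    """
    # Implicit reward: scaled average log-probability (length-normalized).
    reward_chosen = beta * sum(chosen_token_logps) / len(chosen_token_logps)
    reward_rejected = beta * sum(rejected_token_logps) / len(rejected_token_logps)

    # The chosen reward should exceed the rejected one by at least gamma.
    margin = reward_chosen - reward_rejected - gamma

    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    chosen = [-0.5, -0.5]           # average log-prob: -0.5
    rejected = [-1.0, -1.0, -1.0]   # average log-prob: -1.0
    print(simpo_loss(chosen, rejected))
```

Because the reward is an average over tokens, a longer response is not favored or penalized simply for its length, and no reference-model forward pass is needed to compute the loss.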