princeton-nlp/SimPO

A reference-free preference optimization algorithm for aligning large language models, published at NeurIPS 2024.

953 stars · Python · Language Models · ML Frameworks
Velocity (7d): +1.3 ★/day · Trend: steady

SimPO is a preference alignment method for LLMs that improves on DPO by removing the need for a reference model. Instead of a log-ratio against a reference policy, it uses the length-normalized (average) log probability of a response as the implicit reward and adds a target reward margin between the preferred and rejected responses. The project includes training code and released model checkpoints on HuggingFace, and reports strong results on the AlpacaEval 2 and Arena-Hard benchmarks across multiple base models, including Llama3 and Gemma2.
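To make the objective concrete, here is a minimal sketch of the SimPO loss for a single preference pair, written in plain Python rather than taken from the repository's training code. The function name `simpo_loss`, its arguments, and the default values of `beta` and `gamma` are illustrative assumptions, not the project's actual API or tuned hyperparameters. The inputs are assumed to be per-token log-probabilities of each response under the policy being trained.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def simpo_loss(chosen_token_logps, rejected_token_logps, beta=2.0, gamma=1.0):
    """SimPO loss for one (chosen, rejected) pair.

    beta and gamma defaults are illustrative assumptions, not values
    taken from the repository.
    """
    # Implicit reward: beta times the average per-token log-probability.
    # No reference model is involved.
    reward_chosen = beta * sum(chosen_token_logps) / len(chosen_token_logps)
    reward_rejected = beta * sum(rejected_token_logps) / len(rejected_token_logps)

    # The chosen response should beat the rejected one by at least gamma.
    margin = reward_chosen - reward_rejected - gamma

    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Hypothetical per-token log-probabilities for two responses.
    print(simpo_loss([-0.4, -0.6, -0.5], [-1.8, -2.1]))
```

Dividing by the response length is what keeps a longer response from earning a higher reward simply by summing more tokens, and the margin term `gamma` pushes the preferred response's reward above the rejected one by a fixed amount.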