lucidrains/self-rewarding-lm-pytorch
A PyTorch training framework implementing Meta AI's Self-Rewarding Language Model, with DPO and SPIN training approaches.

Velocity (7d): +1.6 ★ / day
Trend: steady
This repository implements the training framework from Meta AI's Self-Rewarding Language Model paper, in which a language model iteratively improves itself by serving as its own reward model during training. The framework uses Direct Preference Optimization (DPO) for preference-based training and also includes an implementation of the SPIN training method. It is built on PyTorch and integrates with transformer architectures.
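To make the DPO step concrete, here is a minimal sketch of the standard per-example DPO loss in plain Python. It is not the repository's API: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for illustration, and the inputs are assumed to be summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (hypothetical helper, not the repo's API).

    Each argument is the summed log-probability of a full response.
    `beta` scales how strongly the policy is pushed away from the reference.
    """
    # How much more the policy prefers the chosen response than the rejected one.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    # The same preference margin under the frozen reference model.
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Policy prefers the chosen response more than the reference does: lower loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In a self-rewarding setup, the chosen and rejected responses would come from the model scoring its own generations. That pairing step is outside this sketch.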