allenai/RL4LMs
A reinforcement learning library for fine-tuning language models against reward functions that reflect human preferences.

Velocity (7d): +1.7 ★/day · Trend: steady
RL4LMs provides modular building blocks for training language models with reinforcement learning: on-policy algorithms (PPO, A2C, TRPO, NLPO), reward functions, and 20+ NLG metrics. It supports causal LMs (GPT-2/3) and seq2seq LMs (T5, BART) on NLP tasks such as summarization, translation, dialogue generation, and question answering. The library has been benchmarked in 2000+ experiments on the GRUE benchmark.
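To make the "modular building blocks" idea concrete, here is a minimal sketch of how a pluggable reward function might look. It is plain Python and does not use the RL4LMs API: the names `REWARD_REGISTRY`, `register_reward`, `unigram_f1`, and `score` are hypothetical, and the unigram-overlap F1 is only a simple stand-in for the kind of NLG metric a reward could be built on.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical names throughout; this is an illustration, not the RL4LMs API.
RewardFn = Callable[[str, str], float]

# Maps a reward name to its scoring function so rewards can be swapped by name.
REWARD_REGISTRY: Dict[str, RewardFn] = {}


def register_reward(name: str) -> Callable[[RewardFn], RewardFn]:
    """Decorator that stores a reward function under the given name."""
    def decorator(fn: RewardFn) -> RewardFn:
        REWARD_REGISTRY[name] = fn
        return fn
    return decorator


@register_reward("unigram_f1")
def unigram_f1(generated: str, reference: str) -> float:
    """Unigram-overlap F1 between generated and reference text, in [0, 1]."""
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((gen & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def score(reward_name: str, generated: str, reference: str) -> float:
    """Look up a registered reward by name and apply it to one sample."""
    return REWARD_REGISTRY[reward_name](generated, reference)
```

In an RL fine-tuning loop, a scalar like the one returned by `score("unigram_f1", output, target)` would typically serve as the reward for a generated sequence. Keeping rewards behind a name-based registry is one way to let the algorithm and the reward be chosen independently.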