lucidrains/PaLM-rlhf-pytorch

PyTorch implementation of Reinforcement Learning with Human Feedback (RLHF) training on the PaLM language model architecture.

7.9k stars · Python · Language Models · ML Frameworks
Velocity (7d): +6.2 ★/day · Trend: steady

This repository provides code to train large language models using RLHF, the same technique behind ChatGPT. It builds on the PaLM architecture and includes components for reward modeling, PPO-based policy optimization, and human feedback integration. The project aims to enable open replication of assistant-style language models using RLHF training pipelines.
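
To make the PPO-based policy optimization step more concrete, here is a minimal sketch of the standard PPO clipped surrogate objective for a single sampled action. It is a generic illustration and is not taken from this repository's code: the function name `ppo_clipped_objective` and the parameter `clip_eps` are hypothetical, and a real RLHF pipeline would compute this over batches of tokens with a tensor library and combine it with value and KL terms.

```python
import math


def ppo_clipped_objective(new_logprob, old_logprob, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one sampled action (to be maximized).

    Hypothetical helper for illustration only; not this repository's API.
    """
    # Probability ratio between the updated policy and the policy that sampled the action.
    ratio = math.exp(new_logprob - old_logprob)
    # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # The policy raised the probability of an action that had a positive advantage.
    print(ppo_clipped_objective(new_logprob=-1.0, old_logprob=-1.5, advantage=2.0))
```

Taking the minimum removes any incentive to move the policy far from the one that generated the samples, which is why this form is often used when fine-tuning a language model against a learned reward.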