eureka-research/Eureka
A system using LLMs to automatically generate and optimize reward functions for training reinforcement learning agents.

Eureka leverages GPT-4’s code-writing and in-context improvement capabilities to perform evolutionary optimization over reward code, without task-specific prompting or pre-defined reward templates. The generated reward functions are then used to train RL agents across diverse environments spanning 10 robot morphologies. These rewards outperform human-engineered rewards on 83% of the 29 tested tasks, and the approach introduces a gradient-free way to perform reinforcement learning from human feedback (RLHF).
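
To make the idea of evolutionary optimization over reward code concrete, here is a minimal sketch of such a loop. It is an illustration under stated assumptions, not Eureka's actual implementation: the names `evolve_reward`, `toy_propose`, and `toy_evaluate` are hypothetical, the "LLM" is replaced by a random generator, and "RL training" is replaced by a closed-form score. In a real setup, the proposer would presumably call a language model with the feedback string, and the evaluator would train an agent with the candidate reward and return a task fitness.

```python
import random
from typing import Callable, List, Tuple


def evolve_reward(
    propose: Callable[[str, int], List[str]],
    evaluate: Callable[[str], float],
    iterations: int = 3,
    samples_per_iter: int = 4,
) -> Tuple[str, float]:
    """Generic sample-evaluate-reflect loop over candidate reward code.

    propose(feedback, n) returns n candidate reward snippets (hypothetical
    stand-in for an LLM call); evaluate(code) returns a scalar fitness
    (hypothetical stand-in for training an RL agent with that reward).
    """
    best_code, best_score = "", float("-inf")
    feedback = "no previous attempt"
    for _ in range(iterations):
        # Sample several candidates conditioned on the current feedback.
        candidates = propose(feedback, samples_per_iter)
        # Score every candidate and keep the best one seen so far.
        for code in candidates:
            score = evaluate(code)
            if score > best_score:
                best_code, best_score = code, score
        # Feed the best candidate back as in-context guidance for the next round.
        feedback = f"best so far scored {best_score:.3f}:\n{best_code}"
    return best_code, best_score


# --- Toy stand-ins (assumptions for illustration only) ---
_rng = random.Random(0)


def toy_propose(feedback: str, n: int) -> List[str]:
    # A real proposer would send `feedback` to an LLM; this one ignores it
    # and emits random one-line "reward code" with a single weight.
    return [f"weight={_rng.uniform(0.0, 2.0):.2f}" for _ in range(n)]


def toy_evaluate(code: str) -> float:
    # A real evaluator would train an agent; this one prefers weights near 1.0.
    weight = float(code.split("=")[1])
    return -((weight - 1.0) ** 2)


best_code, best_score = evolve_reward(toy_propose, toy_evaluate)
print(best_code, best_score)
```

The loop itself never computes gradients with respect to the proposer: improvement comes only from selecting the best candidate and passing it back as text, which is the sense in which this style of search is gradient-free.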