openai/prm800k
A dataset of 800,000 step-level correctness labels on LLM-generated solutions to MATH problems for training and evaluating process reward models.

PRM800K is a process supervision dataset containing 800,000 step-level correctness labels: each label rates a single step of an LLM-generated solution to a MATH problem, rather than only the final answer. Human annotators produced the labels over multiple phases, with quality control mechanisms to check label reliability. The dataset was introduced in the paper "Let's Verify Step by Step" and supports research on improving LLM mathematical reasoning with process-based reward models.
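
To make "step-level correctness labels" concrete, here is a minimal Python sketch of how such labels could be turned into training examples for a process reward model. The record layout, the field names (`problem`, `steps`, `text`, `rating`), and the -1/0/+1 rating scale are assumptions made for illustration, not the dataset's confirmed schema, and the helper functions are hypothetical. Stopping at the first incorrect step is one possible design choice, not necessarily the one used by the dataset's authors.

```python
from typing import Iterator, Optional, Tuple

# Hypothetical record: field names and the -1/0/+1 rating scale are
# assumptions for illustration, not the dataset's confirmed schema.
example_solution = {
    "problem": "What is 2 + 3 * 4?",
    "steps": [
        {"text": "Multiplication comes first: 3 * 4 = 12.", "rating": 1},
        {"text": "Then add: 2 + 12 = 15.", "rating": -1},
        {"text": "So the answer is 15.", "rating": -1},
    ],
}


def first_incorrect_step(solution: dict) -> Optional[int]:
    """Return the index of the first negatively rated step, or None."""
    for index, step in enumerate(solution["steps"]):
        if step["rating"] < 0:
            return index
    return None


def step_training_pairs(solution: dict) -> Iterator[Tuple[str, int]]:
    """Yield (solution prefix, rating of its last step) pairs.

    Iteration stops after the first negatively rated step, so steps that
    follow an error are not used as training examples.
    """
    prefix_lines = []
    for step in solution["steps"]:
        prefix_lines.append(step["text"])
        yield "\n".join(prefix_lines), step["rating"]
        if step["rating"] < 0:
            break


if __name__ == "__main__":
    print("first incorrect step index:", first_incorrect_step(example_solution))
    for prefix, rating in step_training_pairs(example_solution):
        print(rating, repr(prefix))
```

Each pair couples a partial solution with the label of its most recent step, which is the kind of input a step-level reward model scores. An outcome-based reward model would instead see only the complete solution and a single label for the final answer.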