agentscope-ai/OpenJudge
A unified evaluation framework for assessing AI agent quality and converting grading results into RLHF reward signals.

OpenJudge is an open-source framework for evaluating AI applications, particularly AI agents and chatbots. It provides ready-to-use graders and can generate scenario-specific rubrics for assessing application quality. Grading results can also be converted into reward signals, which can then be used to fine-tune and optimize applications through RLHF workflows. The goal is to simplify the evaluation workflow, from data collection through weakness analysis to rapid iteration.
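To make the grade-to-reward idea concrete, here is a minimal sketch under stated assumptions. It is **not** OpenJudge's API. `GradeResult`, `keyword_rubric_grader`, and `to_reward` are hypothetical names. The keyword check is a toy stand-in for an LLM judge scoring each rubric criterion. Min-max normalization to [-1, 1] is one common way to turn a bounded score into a scalar reward, and it is assumed here rather than taken from the project.

```python
from dataclasses import dataclass


@dataclass
class GradeResult:
    """Hypothetical container for one grader's verdict (not OpenJudge's actual type)."""
    score: float       # raw score on the grader's own scale
    min_score: float   # lowest score the grader can assign
    max_score: float   # highest score the grader can assign
    reason: str = ""   # human-readable justification


def keyword_rubric_grader(response: str, rubric: dict[str, float]) -> GradeResult:
    """Toy rubric grader (hypothetical): each criterion is a keyword worth `weight` points.

    A real grader would typically ask an LLM judge whether each criterion is met;
    case-insensitive substring matching stands in for that here.
    """
    text = response.lower()
    met = [kw for kw in rubric if kw.lower() in text]
    score = sum(rubric[kw] for kw in met)
    return GradeResult(
        score=score,
        min_score=0.0,
        max_score=sum(rubric.values()),
        reason=f"met criteria: {met}",
    )


def to_reward(result: GradeResult) -> float:
    """Min-max normalize a grade into [-1.0, 1.0] so it can serve as a scalar RL reward."""
    span = result.max_score - result.min_score
    if span <= 0:
        raise ValueError("grader must have max_score > min_score")
    return 2.0 * (result.score - result.min_score) / span - 1.0


if __name__ == "__main__":
    rubric = {"refund": 2.0, "order number": 1.0, "apolog": 1.0}
    reply = "Sorry for the trouble! Please share your order number and we'll process the refund."
    grade = keyword_rubric_grader(reply, rubric)
    print(grade.reason, to_reward(grade))  # → met criteria: ['refund', 'order number'] 0.5
```

Keeping grading and reward conversion as separate steps means the same grade can feed an evaluation report, where the raw score and reason matter, and an RLHF loop, where only a normalized scalar is needed.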