Is OpenJudge open source?

Yes — agentscope-ai/OpenJudge is open source, released under the Apache-2.0 license.

What language is OpenJudge written in?

agentscope-ai/OpenJudge is primarily written in Python.

How popular is OpenJudge?

agentscope-ai/OpenJudge has 741 stars on GitHub.

Where can I find OpenJudge?

agentscope-ai/OpenJudge is on GitHub at https://github.com/agentscope-ai/OpenJudge.

agentscope-ai/OpenJudge

Fifty-plus ways to tell your AI agent it messed up

OpenJudge gives AI applications a systematic evaluation workflow, with 50+ graders, auto-generated rubrics, and reward signals for RL fine-tuning.

★741 stars Python LLMOps · Eval Agents

What it does

OpenJudge is an open-source evaluation framework for AI applications such as agents and chatbots. It ships with over 50 built-in graders covering general text quality, code syntax, agent lifecycle stages like tool selection and memory, and multimodal coherence. The framework also generates scenario-specific rubrics on demand and can convert grading results into reward signals for reinforcement learning workflows.
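To make the idea of turning grading results into a reward signal concrete, here is a minimal sketch in plain Python. It does not use OpenJudge's API: the GradeResult class, the to_reward function, the grader names, and the weighted-mean aggregation are all hypothetical and chosen only for illustration. It also assumes each grader score is already normalised to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradeResult:
    """Hypothetical container for one grader's verdict (not an OpenJudge type)."""
    name: str
    score: float         # assumed to be normalised to [0, 1]
    weight: float = 1.0  # relative importance of this grader


def to_reward(results: list[GradeResult]) -> float:
    """Collapse several grader scores into one scalar reward via a weighted mean."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("at least one grader must have positive weight")
    return sum(r.score * r.weight for r in results) / total_weight


# Illustrative scores only; the grader names are made up.
results = [
    GradeResult("text_quality", 0.8),
    GradeResult("tool_selection", 0.5, weight=2.0),
    GradeResult("code_syntax", 1.0),
]
reward = to_reward(results)
```

A weighted mean is only one possible aggregation. An RL setup might instead gate the reward on a must-pass grader or clip it to a fixed range, depending on what the training framework expects.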

The interesting bit

Instead of only checking final outputs, OpenJudge evaluates the entire agent lifecycle: planning, reflection, trajectory, and tool use. It also exposes everything through an online playground where you can test graders and spin up custom rubrics without writing code or installing anything.

Key highlights

50+ production-ready graders across general, agent, and multimodal domains, each backed by benchmark datasets and pytest validation
Flexible rubric generation: write custom Python graders, generate zero-shot rubrics from a task description, or derive data-driven criteria from annotated examples
Built-in integrations with observability platforms like LangSmith and Langfuse, plus RL training frameworks such as VERL
Online playground at openjudge.me/app for interactive grader testing, custom rubric building, and benchmark leaderboards
Recent additions include skill-specific graders for agent skill packages and a reference hallucination benchmark arena
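
As a rough illustration of what a custom Python grader for one agent lifecycle stage could look like, the sketch below scores tool selection by comparing the tools an agent called against the tools a reference answer expected. This is not OpenJudge's grader interface: the function name grade_tool_selection, its parameters, the returned dictionary keys, and the choice of F1 as the score are all assumptions made for the example.

```python
def grade_tool_selection(called: list[str], expected: list[str]) -> dict:
    """Hypothetical rule-based grader: F1 overlap between called and expected tools."""
    called_set, expected_set = set(called), set(expected)
    hits = len(called_set & expected_set)

    # Precision: share of called tools that were expected.
    precision = hits / len(called_set) if called_set else 0.0
    # Recall: share of expected tools that were actually called.
    recall = hits / len(expected_set) if expected_set else 0.0
    # F1 is defined as 0 when there is no overlap at all.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0

    return {
        "score": f1,
        "missing": sorted(expected_set - called_set),
        "unexpected": sorted(called_set - expected_set),
    }


# Made-up tool names for illustration.
verdict = grade_tool_selection(["search", "calculator"], ["search", "weather"])
```

Returning the missing and unexpected tools alongside the score keeps the verdict explainable, which tends to matter more than the number itself when debugging an agent.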

Caveats

The README advertises training dedicated judge models for “peak performance” but offers no accuracy benchmarks or comparisons against prompt-based grading in the source material
Several new features (Streamlit UI, paper review mode) are announced in the News section but lack substantive detail inside the README itself

Verdict

Teams running AI agents in production and needing systematic, ongoing evaluation across multiple dimensions should start here. If you only need an occasional one-off LLM-as-a-judge check, it is likely more framework than you need.
