JudgmentLabs/judgeval

LLM agent observability that grades its own homework

It closes the loop between shipping an agent and actually knowing why it misbehaves in production.

★1k stars Python LLMOps · Eval Agents

What it does

Judgeval is an open-source Python SDK that instruments LLM-powered applications with OpenTelemetry-based tracing and prompt-based evaluation. It captures inputs, outputs, and token usage via decorators, then lets you define automated scorers called judges that label and score agent behaviors. The output is a searchable history of agent actions, used to catch regressions and validate fixes against real production cases.
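To make the decorator idea concrete, here is a minimal sketch of how decorator-based capture can work in plain Python. It is not judgeval's API: the `traced` decorator, the `TRACE_LOG` list, and the `answer` function are all hypothetical names invented for illustration, and a real SDK would export OpenTelemetry spans to a backend rather than append to an in-memory list.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory store (assumption); a real SDK would export spans.
TRACE_LOG: list[dict[str, Any]] = []


def traced(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record each call's inputs, output, and duration (illustrative only)."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = func(*args, **kwargs)
        TRACE_LOG.append(
            {
                "name": func.__name__,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": result,
                "duration_s": time.perf_counter() - start,
            }
        )
        return result

    return wrapper


@traced
def answer(question: str) -> str:
    # Stand-in for an LLM call; no model is actually invoked here.
    return f"echo: {question}"


answer("What is 2 + 2?")
```

After the call, `TRACE_LOG` holds one record with the function name, its arguments, and its return value, which is the kind of raw material a judge could later score.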

The interesting bit

The replay mechanic is the standout: you can run judges against historical traces to verify that a fix actually resolves past failures, not just new traffic. Scoring happens server-side, which the project claims avoids adding latency on the client, and detected behaviors can feed directly into Slack alerts.
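One way to picture the replay idea is the sketch below. It is an assumption-laden illustration, not judgeval's implementation: `Trace`, `refusal_judge`, `replay`, and `fixed_agent` are hypothetical names, the judge is a simple string check rather than a prompt-based scorer, and everything runs locally instead of server-side.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trace:
    input: str
    output: str


def refusal_judge(trace: Trace) -> bool:
    """Hypothetical judge: flag outputs that refuse to answer."""
    return "cannot help" in trace.output.lower()


def replay(
    history: list[Trace],
    agent: Callable[[str], str],
    judge: Callable[[Trace], bool],
) -> list[Trace]:
    """Re-run the agent on past flagged inputs; return those still flagged."""
    still_failing = []
    for old in history:
        if not judge(old):
            continue  # only past failures are interesting for a fix check
        new = Trace(old.input, agent(old.input))
        if judge(new):
            still_failing.append(new)
    return still_failing


# Made-up historical traces: one past failure, one past success.
history = [
    Trace("reset my password", "Sorry, I cannot help with that."),
    Trace("what are your hours?", "We are open 9 to 5."),
]


def fixed_agent(prompt: str) -> str:
    # Stand-in for the patched agent.
    return f"Here is how to handle: {prompt}"


remaining = replay(history, fixed_agent, refusal_judge)
```

An empty `remaining` list means the patched agent no longer triggers the judge on any input that failed historically.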

Key highlights

The OpenTelemetry foundation should slot into existing observability stacks without a wholesale migration.
Auto-instrumentation covers OpenAI, Anthropic, Google GenAI, and Together AI, plus framework support for LangGraph and the Claude Agent SDK.
Judges produce structured, labeled behavior records that accumulate into a searchable archive over time.
A separate CLI and MCP server let you query traces, deploy judges, and surface failures inside an IDE or AI assistant.

Caveats

The service is cloud-backed and requires Judgment Labs API credentials; it is not a fully offline, self-hosted tool.

Verdict

Worth evaluating if you run agents in production and need structured scoring without building an evaluation pipeline from scratch. Look elsewhere if you require a fully self-hosted, offline observability stack.

Frequently asked

What is JudgmentLabs/judgeval? An open-source Python SDK for tracing and evaluating LLM agents. It closes the loop between shipping an agent and actually knowing why it misbehaves in production.
Is judgeval open source? Yes. JudgmentLabs/judgeval is open source, released under the Apache-2.0 license.
What language is judgeval written in? JudgmentLabs/judgeval is primarily written in Python.
How popular is judgeval? JudgmentLabs/judgeval has 1k stars on GitHub.
Where can I find judgeval? JudgmentLabs/judgeval is on GitHub at https://github.com/JudgmentLabs/judgeval.