JudgmentLabs/judgeval
An open-source Python SDK providing tracing and agent-judge evaluation for LLM-powered applications to detect failures and validate fixes.

Velocity (7d): +1.8 ★/day · Trend: steady
Judgeval instruments any function with OpenTelemetry-based tracing to capture inputs, outputs, and LLM token usage. It defines prompt-based scorers to evaluate agent behaviors at scale, producing scored and labeled outputs that describe how agents acted. The platform automatically scores live production traffic server-side and surfaces detected behaviors as structured signals for agent improvement and monitoring.
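
The tracing-then-scoring flow described above can be illustrated with a minimal, standard-library-only sketch. This is not judgeval's API: the names `Span`, `observe`, `RECORDED_SPANS`, and `score_span` are hypothetical, spans are kept in an in-memory list rather than exported through OpenTelemetry, and the "judge" is a plain callable standing in for an LLM call.

```python
import functools
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Span:
    """Hypothetical record of one traced function call."""
    name: str
    inputs: dict[str, Any]
    output: Any = None
    duration_s: float = 0.0


# In-memory stand-in for a trace exporter (an assumption for this sketch).
RECORDED_SPANS: list[Span] = []


def observe(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a function so each call records its inputs, output, and duration."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        span = Span(name=func.__name__, inputs={"args": args, "kwargs": kwargs})
        start = time.perf_counter()
        try:
            span.output = func(*args, **kwargs)
            return span.output
        finally:
            # Record the span even if the wrapped function raised.
            span.duration_s = time.perf_counter() - start
            RECORDED_SPANS.append(span)
    return wrapper


def score_span(span: Span, judge: Callable[[str], str]) -> dict[str, str]:
    """Build a judging prompt from a span and return the judge's label."""
    prompt = (
        "Did the agent answer the question it was asked?\n"
        f"Input: {span.inputs}\n"
        f"Output: {span.output}"
    )
    return {"span": span.name, "label": judge(prompt)}


@observe
def answer(question: str) -> str:
    # Placeholder for an LLM-backed agent step.
    return f"Echo: {question}"
```

Calling `answer("hi")` appends one `Span` to `RECORDED_SPANS`, and `score_span(RECORDED_SPANS[-1], judge)` turns it into a labeled result. In a real deployment the judge would be a model call and the spans would be exported to a tracing backend; both are left out here to keep the sketch self-contained.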