langchain-ai/openevals
A library of pre-built LLM-based evaluators for scoring the quality of outputs from LLM applications.

Velocity (7d): +2.2 ★/day · Trend: steady
OpenEvals provides pre-built evaluator prompts and LLM-as-judge pipelines for evaluating the outputs of LLM applications. Its evaluators cover criteria such as conciseness, correctness, and helpfulness, using an LLM judge (defaulting to GPT-4) to score each output automatically against the defined criterion. Available in both Python and TypeScript, it serves as a starting point for developers writing evals for their production LLM systems.
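The LLM-as-judge pattern described above can be illustrated with a minimal Python sketch. It does not use the OpenEvals API: the prompt template, the `make_judge_evaluator` factory, the `EvalResult` type, and the `fake_judge` stand-in are all hypothetical names chosen for illustration, and the judge is a stub so the example runs without a model or API key. In practice the `judge` callable would wrap a real model call.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical grading prompt; not taken from OpenEvals.
CONCISENESS_TEMPLATE = (
    "You are grading an answer for conciseness.\n"
    "Question: {inputs}\n"
    "Answer: {outputs}\n"
    "Reply with PASS or FAIL on the first line, then a one-sentence reason."
)


@dataclass
class EvalResult:
    key: str      # name of the criterion being scored
    score: bool   # True if the judge passed the output
    comment: str  # the judge's stated reason


def make_judge_evaluator(
    template: str, judge: Callable[[str], str], key: str
) -> Callable[[str, str], EvalResult]:
    """Build an evaluator that formats a prompt, asks the judge, and parses the verdict."""

    def evaluate(inputs: str, outputs: str) -> EvalResult:
        prompt = template.format(inputs=inputs, outputs=outputs)
        reply = judge(prompt).strip()
        # First line is the verdict, the remainder is the explanation.
        verdict, _, reason = reply.partition("\n")
        return EvalResult(
            key=key,
            score=verdict.strip().upper() == "PASS",
            comment=reason.strip(),
        )

    return evaluate


def fake_judge(prompt: str) -> str:
    """Stand-in for a real model call: passes answers shorter than 20 words."""
    answer = prompt.split("Answer: ", 1)[1].split("\n", 1)[0]
    if len(answer.split()) < 20:
        return "PASS\nThe answer is short and direct."
    return "FAIL\nThe answer contains unnecessary detail."


conciseness = make_judge_evaluator(CONCISENESS_TEMPLATE, fake_judge, key="conciseness")
result = conciseness(inputs="What is the capital of France?", outputs="Paris.")
print(result)
```

Keeping the judge behind a plain callable means the same evaluator can be exercised in tests with a stub and pointed at a real model in production; only the callable changes.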