confident-ai/deepeval
DeepEval is an open-source evaluation framework for testing and measuring the quality of LLM outputs.

Velocity (7d): +15 ★ / day
Trend: steady
DeepEval is a Python framework for evaluating large language model (LLM) outputs against configurable metrics. It ships with built-in evaluation criteria and a test-runner workflow for assessing LLM performance systematically. It also integrates with multiple LLM backends and gives developers tooling to benchmark and validate AI applications.
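The metric-plus-test-runner idea described above can be illustrated with a small, library-free sketch. This is not DeepEval's API: `EvalCase`, `Metric`, `keyword_coverage`, and `run_suite` are hypothetical names invented for illustration, and the keyword-matching score stands in for whatever real metric a framework would compute. It only shows the general pattern of scoring each output against a metric and comparing the score to a configurable threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalCase:
    """One LLM interaction to evaluate (hypothetical structure)."""
    prompt: str
    actual_output: str
    expected_keywords: List[str]


@dataclass
class Metric:
    """A named scoring function with a pass/fail threshold (hypothetical)."""
    name: str
    score_fn: Callable[[EvalCase], float]
    threshold: float


def keyword_coverage(case: EvalCase) -> float:
    """Toy metric: fraction of expected keywords found in the output."""
    if not case.expected_keywords:
        return 1.0
    text = case.actual_output.lower()
    hits = sum(1 for kw in case.expected_keywords if kw.lower() in text)
    return hits / len(case.expected_keywords)


def run_suite(
    cases: List[EvalCase], metrics: List[Metric]
) -> List[Tuple[str, str, float, bool]]:
    """Score every case against every metric.

    Returns (prompt, metric name, score, passed) tuples in input order.
    """
    results = []
    for case in cases:
        for metric in metrics:
            score = metric.score_fn(case)
            results.append((case.prompt, metric.name, score, score >= metric.threshold))
    return results


if __name__ == "__main__":
    cases = [
        EvalCase("What is the capital of France?",
                 "The capital of France is Paris.", ["Paris"]),
        EvalCase("Name two primary colors.",
                 "Red is a primary color.", ["red", "blue"]),
    ]
    metrics = [Metric("keyword_coverage", keyword_coverage, threshold=0.75)]
    for prompt, name, score, passed in run_suite(cases, metrics):
        status = "PASS" if passed else "FAIL"
        print(f"{status} {name}={score:.2f} :: {prompt}")
```

Keeping the threshold on the metric rather than in the runner means the same scoring function can be reused with stricter or looser pass criteria per project.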