openai/evals
OpenAI's open-source framework for evaluating large language models and systems against benchmarks.

Velocity (7d): +15 ★/day · Trend: steady
Evals is a framework for evaluating LLMs and LLM systems. It provides a registry of existing benchmarks and lets users create custom evaluations. Users can write evals for their own use cases, or build private evals that capture the LLM patterns common in their workflows without exposing any data publicly. Running the same evals across model versions helps developers see how each version affects their applications.
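To make the idea of comparing model versions concrete, here is a minimal sketch in plain Python. It does not use the Evals API. The sample format (`input`/`ideal` keys), the `exact_match_accuracy` helper, and the two stub "models" are all assumptions made up for illustration. A real setup would call an actual LLM and would likely use the framework's own registry and eval classes.

```python
from typing import Callable, Dict, List

# Hypothetical sample format for this sketch: a prompt and the expected answer.
Sample = Dict[str, str]


def exact_match_accuracy(model: Callable[[str], str], samples: List[Sample]) -> float:
    """Return the fraction of samples where the model output equals the ideal answer."""
    if not samples:
        raise ValueError("samples must not be empty")
    correct = sum(
        1 for s in samples if model(s["input"]).strip() == s["ideal"].strip()
    )
    return correct / len(samples)


# Stand-ins for two model versions; a real setup would call an LLM here.
def model_v1(prompt: str) -> str:
    answers = {"2 + 2 =": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "")


def model_v2(prompt: str) -> str:
    answers = {
        "2 + 2 =": "4",
        "Capital of France?": "Paris",
        "Opposite of hot?": "cold",
    }
    return answers.get(prompt, "")


# A tiny private dataset kept alongside the code rather than published.
SAMPLES: List[Sample] = [
    {"input": "2 + 2 =", "ideal": "4"},
    {"input": "Capital of France?", "ideal": "Paris"},
    {"input": "Opposite of hot?", "ideal": "cold"},
]

if __name__ == "__main__":
    # Score both versions on the same samples to see how they differ.
    for name, model in [("v1", model_v1), ("v2", model_v2)]:
        print(f"{name}: accuracy = {exact_match_accuracy(model, SAMPLES):.2f}")
```

Scoring every version against the same fixed sample set is what makes the numbers comparable. Exact match is only one possible grading rule, chosen here because it is the simplest to check.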