
openai/evals

OpenAI's open-source framework for evaluating large language models and systems against benchmarks.

18.6k stars · Python · LLMOps · Eval · Language Models
Velocity (7d): +15 ★ / day · Trend: steady

Evals is a framework for evaluating LLMs and LLM systems, offering a registry of existing benchmarks alongside the ability to create custom evaluations. Users can write their own evals for specific use cases or build private evals representing common LLM patterns in their workflows without exposing data publicly. The framework helps developers understand how different model versions affect their applications.
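
To make the idea of comparing model versions concrete, here is a minimal sketch of an exact-match evaluation in plain Python. It does not use the Evals library or its API. The names `model_v1`, `model_v2`, `SAMPLES`, `exact_match_accuracy`, and `compare` are hypothetical, and the two "models" are hard-coded stand-ins rather than real LLM calls.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for two model versions; a real setup would call an LLM.
def model_v1(prompt: str) -> str:
    answers = {"2 + 2 =": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "")


def model_v2(prompt: str) -> str:
    answers = {
        "2 + 2 =": "4",
        "Capital of France?": "Paris",
        "Opposite of hot?": "cold",
    }
    return answers.get(prompt, "")


# Hypothetical private dataset: (prompt, ideal answer) pairs kept local.
SAMPLES: List[Tuple[str, str]] = [
    ("2 + 2 =", "4"),
    ("Capital of France?", "Paris"),
    ("Opposite of hot?", "cold"),
]


def exact_match_accuracy(
    model: Callable[[str], str], samples: List[Tuple[str, str]]
) -> float:
    """Return the fraction of samples where the model output equals the ideal answer."""
    if not samples:
        return 0.0
    correct = sum(1 for prompt, ideal in samples if model(prompt).strip() == ideal)
    return correct / len(samples)


def compare(
    models: Dict[str, Callable[[str], str]], samples: List[Tuple[str, str]]
) -> Dict[str, float]:
    """Score every named model on the same samples so versions can be compared."""
    return {name: exact_match_accuracy(fn, samples) for name, fn in models.items()}


if __name__ == "__main__":
    scores = compare({"v1": model_v1, "v2": model_v2}, SAMPLES)
    for name, score in scores.items():
        print(f"{name}: {score:.2f}")
```

Running every version against the same fixed samples is what makes the scores comparable. Exact match is only one possible grading rule, chosen here for brevity.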
