Is evals open source?

Yes — openai/evals is an open-source project tracked on heatdrop.

What language is evals written in?

openai/evals is primarily written in Python.

How popular is evals?

openai/evals has 19k stars on GitHub and is currently cooling off.

Where can I find evals?

openai/evals is on GitHub at https://github.com/openai/evals.

← all repositories

openai/evals

OpenAI’s answer to 'did the new model break my prompt?'

It gives LLM builders a standardized way to prove that upgrading models won’t silently wreck their apps.

★19k stars Python LLMOps · Eval Language Models

View on GitHub ↗

Velocity · 7d

+5.7

★ / day

Trend

↘cooling

star history

What it does

Evals is OpenAI’s official framework for benchmarking large language models and the systems built on top of them. It ships with a registry of ready-made tests and lets you write private or public evals to see how model changes affect your specific use case. You can run it inside the OpenAI Dashboard or on your own machine.

The interesting bit

The framework is designed to let non-programmers contribute: you can define an eval with nothing but a JSON dataset and a YAML configuration file, no custom code required. OpenAI actively solicits community benchmarks via pull request, though it currently rejects any submission that includes custom Python logic.

Key highlights

Built-in registry of benchmarks covering multiple dimensions of model behavior.
Supports private evals using your own data without exposing it publicly.
Can log results to a Snowflake database for downstream analysis.
Contributions are reviewed by OpenAI staff and considered for future model improvements.
Requires an OpenAI API key and incurs the usual API costs.

Caveats

Runs sometimes hang at the very end after the final report; the README flags this as a known issue.
Benchmark data lives in Git-LFS, so cloning the full registry is heavier than a standard repository pull.
OpenAI is not currently accepting community evals that include custom code, limiting contributions to YAML/JSON configurations.

Verdict

Worth a look if you are shipping production prompts and need empirical evidence before swapping model versions. Skip it if you are looking for a model-agnostic, fully offline evaluation suite—it is tightly coupled to the OpenAI API.

Frequently asked

What is openai/evals?: It gives LLM builders a standardized way to prove that upgrading models won’t silently wreck their apps.
Is evals open source?: Yes — openai/evals is an open-source project tracked on heatdrop.
What language is evals written in?: openai/evals is primarily written in Python.
How popular is evals?: openai/evals has 19k stars on GitHub and is currently cooling off.
Where can I find evals?: openai/evals is on GitHub at https://github.com/openai/evals.