openai/frontier-evals
OpenAI's open-source framework for evaluating frontier AI model capabilities using structured benchmarks.

Velocity (7d): +2.8 ★/day · Trend: steady
Frontier Evals provides reproducible evaluation suites for assessing state-of-the-art AI models on complex tasks. It includes PaperBench for replicating AI research papers, SWE-Lancer for real freelance software engineering tasks, and EVMBench for smart contract security testing. Each benchmark runs models end-to-end against verifiable ground-truth outcomes, and the repository uses uv for environment management.