openai/SWELancer-Benchmark
A benchmark evaluating how well frontier LLMs perform on real-world freelance software engineering tasks sourced from freelancing platforms.

Velocity (7d): +3.0 ★/day · Trend: → steady
The SWE-Lancer benchmark evaluates frontier LLMs on real-world freelance software engineering work sourced from freelancing platforms, measuring how much revenue the models can earn compared with human freelancers. It provides a dataset of actual freelance tasks with their monetary outcomes, along with code for running evaluation experiments. The benchmark has since been merged into OpenAI's preparedness repository for ongoing use.
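Below is a minimal sketch of the revenue-based scoring idea. It assumes each task carries a fixed payout and a pass/fail outcome for the model's submission. The `TaskResult` record, the task IDs, and the dollar amounts are hypothetical illustrations and are not taken from the repository's code or dataset.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical record: one freelance task, its payout in USD,
    # and whether the model's submission passed the task's checks.
    task_id: str
    payout_usd: float
    passed: bool


def earned_revenue(results: list[TaskResult]) -> float:
    """Sum the payouts of the tasks the model solved."""
    return sum(r.payout_usd for r in results if r.passed)


def earnings_rate(results: list[TaskResult]) -> float:
    """Fraction of the total available payout that the model earned."""
    total = sum(r.payout_usd for r in results)
    return earned_revenue(results) / total if total else 0.0


# Hypothetical example data.
results = [
    TaskResult("task-a", 250.0, True),
    TaskResult("task-b", 1000.0, False),
    TaskResult("task-c", 500.0, True),
]
print(earned_revenue(results))          # 750.0
print(f"{earnings_rate(results):.2%}")  # 42.86%
```

Scoring by dollars rather than by task count weights harder, higher-paying tasks more heavily, which is the intuition behind framing the benchmark in terms of earnings.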