
openai/SWELancer-Benchmark

A benchmark evaluating how well frontier LLMs perform on real-world freelance software engineering tasks sourced from freelancing platforms.

Velocity (7d): +3.0 ★ / day · Trend: steady

The SWE-Lancer benchmark evaluates frontier LLMs on real-world freelance software engineering work, measuring how much revenue they can earn in the way human freelancers do. It provides a dataset of actual freelance tasks with monetary outcomes, along with code for running evaluation experiments. The benchmark was merged into OpenAI’s preparedness repository for ongoing use.
