suyoumo/ClawProBench
Live-first benchmark harness for evaluating LLM agents with deterministic grading and repeated-trial reliability.

ClawProBench is an evaluation framework for benchmarking LLM agents within the OpenClaw runtime environment. It provides structured scenario catalogs spanning core, intelligence, coverage, native, and full profiles, with 162 scenarios in total, 102 of which are active. The system emphasizes deterministic grading and repeated-trial reliability, generates structured reports, and maintains a public leaderboard for model comparison.
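To illustrate what "repeated-trial reliability" can mean in practice, here is a minimal sketch that aggregates deterministically graded trials per scenario. It assumes each trial yields a pass/fail outcome and reports two numbers: the mean per-scenario pass rate, and the fraction of scenarios that pass on every trial. The `TrialResult` type and `summarize` function are hypothetical and are not ClawProBench's actual API or scoring formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialResult:
    """Hypothetical record: one graded trial of one scenario."""
    scenario_id: str
    passed: bool  # outcome from a deterministic (rule-based) grader


def summarize(results: list[TrialResult]) -> dict[str, float]:
    """Aggregate repeated trials per scenario (illustrative metrics only)."""
    # Group trial outcomes by scenario.
    by_scenario: dict[str, list[bool]] = {}
    for r in results:
        by_scenario.setdefault(r.scenario_id, []).append(r.passed)

    if not by_scenario:
        return {"mean_pass_rate": 0.0, "all_trials_pass_rate": 0.0}

    # Average pass rate: each scenario weighted equally, regardless of trial count.
    per_scenario = [sum(v) / len(v) for v in by_scenario.values()]
    mean_pass_rate = sum(per_scenario) / len(per_scenario)

    # Reliability: a scenario counts only if it passed on every trial.
    all_trials = sum(all(v) for v in by_scenario.values()) / len(by_scenario)

    return {"mean_pass_rate": mean_pass_rate, "all_trials_pass_rate": all_trials}


if __name__ == "__main__":
    trials = [
        TrialResult("a", True), TrialResult("a", True), TrialResult("a", True),
        TrialResult("b", True), TrialResult("b", False), TrialResult("b", True),
    ]
    print(summarize(trials))
```

Separating the two numbers shows why repeated trials matter: an agent can score well on average yet still fail intermittently on the same scenario, which only the all-trials metric exposes.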