pinchbench/skill
A benchmarking system that evaluates how well LLMs perform as the brain of OpenClaw AI coding agents.

PinchBench measures LLM performance in agentic scenarios by testing real-world tasks, including scheduling meetings, writing code, triaging email, researching topics, and managing files. Unlike synthetic benchmarks, it evaluates practical outcomes: did the agent actually create the file, send the email, or complete the task? The system tests tool usage, multi-step reasoning, and the ability to handle ambiguous instructions. Results are published on a public leaderboard at pinchbench.com.
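
To make the idea of outcome-based evaluation concrete, here is a minimal sketch of what a "did the agent actually create the file" check could look like. It is not PinchBench's grading code: the function `grade_file_task`, the file name `notes.md`, and the required text are all hypothetical and chosen only for illustration.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def grade_file_task(workspace: Path, expected_name: str, required_text: str) -> bool:
    """Hypothetical outcome check: pass only if the file exists with the required text.

    The check inspects the end state of the workspace, not the agent's transcript.
    """
    target = workspace / expected_name
    if not target.is_file():
        return False
    return required_text in target.read_text(encoding="utf-8")


def demo() -> Tuple[bool, bool]:
    """Grade an empty workspace, then grade it again after the file is created."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        before = grade_file_task(workspace, "notes.md", "Agenda")
        # Stand-in for the agent's work: in a real run the agent would write this file.
        (workspace / "notes.md").write_text("# Agenda\n- item\n", encoding="utf-8")
        after = grade_file_task(workspace, "notes.md", "Agenda")
        return before, after


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the grade depends only on the resulting state of the workspace, so an agent that merely claims to have written the file would still fail.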