
pinchbench/skill

A benchmarking system that evaluates how well LLMs perform as the brain of OpenClaw AI coding agents.

Velocity (7d): +10 ★ / day · Trend: steady

PinchBench measures LLM performance in agentic scenarios by testing real-world tasks: scheduling meetings, writing code, triaging email, researching topics, and managing files. Unlike synthetic benchmarks, it evaluates practical outcomes: did the agent actually create the file, send the email, or complete the task? The system tests tool usage, multi-step reasoning, and the ability to handle ambiguous instructions. Results are published on a public leaderboard at pinchbench.com.
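To make the idea of outcome-based evaluation concrete, here is a minimal Python sketch of a check that passes only if a file actually exists with the expected content after the agent runs. It is not taken from PinchBench: the names `file_created`, `grade`, `writes_file`, and `only_claims_success` are hypothetical, and a plain Python callable stands in for the agent.

```python
import tempfile
from pathlib import Path
from typing import Callable


def file_created(workdir: Path, name: str, must_contain: str) -> bool:
    """Outcome check: pass only if the file exists and holds the expected text."""
    target = workdir / name
    return target.is_file() and must_contain in target.read_text()


def grade(agent: Callable[[Path], None], name: str, must_contain: str) -> bool:
    """Run a stand-in agent in a scratch directory, then inspect the result.

    The agent here is a hypothetical callable, not an OpenClaw agent.
    """
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        agent(workdir)
        return file_created(workdir, name, must_contain)


def writes_file(workdir: Path) -> None:
    # Stand-in agent that really performs the task.
    (workdir / "notes.txt").write_text("meeting at 10:00")


def only_claims_success(workdir: Path) -> None:
    # Stand-in agent that does nothing, however confident its reply might be.
    pass
```

Under these assumptions, `grade(writes_file, "notes.txt", "10:00")` passes and `grade(only_claims_success, "notes.txt", "10:00")` fails, because the score depends on the state of the working directory rather than on what the agent says it did.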
