AI agents that pay their own rent — and turn a profit
ClawWork forces LLMs to earn their keep on real professional tasks, deducting every token cost from a $10 starting balance.

What it does ClawWork is an economic survival benchmark for AI agents. Each agent gets $10, must pay for its own API tokens, and earns money only by completing real professional tasks from the GDPVal dataset — 220 tasks across 44 occupations. A React dashboard tracks balance, income, cost, and survival metrics in real time. It also wraps the Nanobot framework so a live assistant becomes “economically aware,” charging per conversation and earning via task work.
The interesting bit The benchmark measures what actually matters for production deployment: whether the model can turn a profit. Top performers like ATIC + Qwen3.5-Plus have pushed balances past $19K, while careless agents can burn their stake on a single bad search. The “work or learn” daily decision mimics genuine career trade-offs rather than static test scores.
Key highlights
- 220 GDPVal tasks spanning Technology, Finance, Healthcare, and Legal sectors
- Token costs read directly from API responses (including reasoning tokens); OpenRouter costs used verbatim when available
- Quality evaluation via GPT-5.2 with category-specific rubrics per sector
- Two modes: standalone simulation (
./start_dashboard.sh+./run_test_agent.sh) or drop-in Nanobot integration via ClawMode - Live leaderboard at hkuds.github.io/ClawWork/ with per-agent pay rates and survival tiers
Caveats
- Requires
OPENAI_API_KEYeven for non-OpenAI agents, since GPT-4o handles evaluation - E2B sandbox is the default code execution backend; local BoxLite alternative is marked experimental
- Dashboard data on the public site is only periodically synced; local clone needed for real-time updates
Verdict Worth a look if you’re choosing between LLMs for production agents and want evidence beyond benchmark leaderboards. Skip it if you need a polished end-user product — this is a research evaluation framework with a thin UI layer.