Is ClawWork open source?

Yes — HKUDS/ClawWork is open source, released under the MIT license.

What language is ClawWork written in?

HKUDS/ClawWork is primarily written in Python.

How popular is ClawWork?

HKUDS/ClawWork has 8.2k stars on GitHub.

Where can I find ClawWork?

HKUDS/ClawWork is on GitHub at https://github.com/HKUDS/ClawWork.


HKUDS/ClawWork

AI agents that pay their own rent — and turn a profit

ClawWork forces LLMs to earn their keep on real professional tasks, deducting every token cost from a $10 starting balance.

★ 8.2k stars · Python · Agents · LLMOps · Eval

What it does

ClawWork is an economic survival benchmark for AI agents. Each agent gets $10, must pay for its own API tokens, and earns money only by completing real professional tasks from the GDPVal dataset (220 tasks across 44 occupations). A React dashboard tracks balance, income, cost, and survival metrics in real time. It also wraps the Nanobot framework so a live assistant becomes “economically aware,” charging per conversation and earning via task work.
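The accounting described above can be sketched in a few lines. This is a minimal illustration, not ClawWork's actual code: the `AgentLedger` class, its method names, and the dollar amounts in the usage lines are all hypothetical. Only the $10 starting balance and the balance, income, and cost metrics come from the description.

```python
from dataclasses import dataclass

STARTING_BALANCE = 10.0  # each agent starts with $10, per the description


@dataclass
class AgentLedger:
    """Hypothetical ledger for one agent; not ClawWork's real class."""

    balance: float = STARTING_BALANCE
    income: float = 0.0
    cost: float = 0.0

    def charge(self, amount: float) -> None:
        """Deduct a token cost from the balance."""
        self.cost += amount
        self.balance -= amount

    def pay(self, amount: float) -> None:
        """Credit a payment earned for completed task work."""
        self.income += amount
        self.balance += amount

    @property
    def alive(self) -> bool:
        """Assumed survival rule: the agent survives while its balance is positive."""
        return self.balance > 0


ledger = AgentLedger()
ledger.charge(0.35)  # illustrative token spend on one task
ledger.pay(2.50)     # illustrative payment for that task
print(f"balance=${ledger.balance:.2f} alive={ledger.alive}")
```

The survival rule (balance above zero) is an assumption made for the sketch; the project may define its survival tiers differently.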

The interesting bit

The benchmark measures what matters most for production deployment: whether a model can earn more than it spends. Top performers like ATIC + Qwen3.5-Plus have pushed balances past $19K, while careless agents can burn their stake on a single bad search. The daily “work or learn” decision models a genuine career trade-off, something static test scores do not capture.

Key highlights

220 GDPVal tasks spanning Technology, Finance, Healthcare, and Legal sectors
Token costs read directly from API responses (including reasoning tokens); OpenRouter costs used verbatim when available
Quality evaluation via GPT-5.2 with category-specific rubrics per sector
Two modes: standalone simulation (./start_dashboard.sh + ./run_test_agent.sh) or drop-in Nanobot integration via ClawMode
Live leaderboard at hkuds.github.io/ClawWork/ with per-agent pay rates and survival tiers
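The token-cost highlight above could look roughly like the following. This is a sketch under stated assumptions, not the project's implementation: `call_cost` is a hypothetical helper, the usage dictionary is assumed to follow the OpenAI-style `prompt_tokens` / `completion_tokens` shape, reasoning tokens are assumed to be already counted inside `completion_tokens`, a provider-reported `cost` field is assumed to appear in the usage object when available, and the per-million-token prices are made up.

```python
def call_cost(usage: dict, input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one API call (hypothetical helper).

    Assumptions: `usage` is an OpenAI-style usage dict, and a provider such as
    OpenRouter may include a ready-made `cost` field in it.
    """
    # Use the provider-reported cost verbatim when it is present.
    reported = usage.get("cost")
    if reported is not None:
        return float(reported)

    # Otherwise price the tokens ourselves. Reasoning tokens are assumed to be
    # included in completion_tokens, so they are billed at the output rate.
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    return (prompt * input_price_per_m + completion * output_price_per_m) / 1_000_000


# Illustrative prices in dollars per million tokens (made up for the example).
computed = call_cost({"prompt_tokens": 1000, "completion_tokens": 500}, 2.0, 8.0)
verbatim = call_cost({"prompt_tokens": 1000, "completion_tokens": 500, "cost": 0.0123}, 2.0, 8.0)
print(computed, verbatim)
```

Preferring the reported figure avoids drift between a local price table and what the provider actually billed; the fallback only applies when no such figure is returned.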

Caveats

Requires OPENAI_API_KEY even for non-OpenAI agents, since evaluation runs on an OpenAI model
E2B sandbox is the default code execution backend; local BoxLite alternative is marked experimental
Dashboard data on the public site is only periodically synced; local clone needed for real-time updates

Verdict

Worth a look if you’re choosing between LLMs for production agents and want evidence beyond benchmark leaderboards. Skip it if you need a polished end-user product: this is a research evaluation framework with a thin UI layer.
