← all repositories
JuliusBrussee/caveman

A prompt-engineering gag that actually cuts your API bill

Claude Code skill makes the agent talk like a caveman and claims ~65% fewer output tokens, with benchmarks to back it up.

69.8k stars JavaScript Coding AssistantsLLMOps · Eval
caveman
Velocity · 7d
+1077
★ / day
Trend
steady
star history

What it does

caveman is a Claude Code skill (also works with Codex, Gemini, Cursor, Copilot, and 30+ others) that compresses the agent’s replies into terse, fragment-heavy “caveman” prose. You type /caveman, pick a level from lite to ultra to wenyan (classical Chinese), and the agent drops filler words while keeping technical accuracy. It also ships companion commands for terse commit messages, one-line PR reviews, session token stats, and even rewriting your CLAUDE.md memory files into the same compressed dialect to shrink input tokens.

The interesting bit

The README actually publishes real token counts from the Claude API across ten tasks, and the numbers are surprisingly consistent — the “before/after” table isn’t just a meme, it’s a benchmark. The author also cites a March 2026 arXiv paper finding that brevity constraints can improve model accuracy by 26 points on some benchmarks, which is either a clever justification or a very committed bit.

Key highlights

  • Average 65% output token reduction across 10 prompts (range 22–87%), with raw data and a reproduction script in benchmarks/
  • caveman-compress rewrites memory files and cuts ~46% of input tokens for every subsequent session
  • caveman-shrink is an MCP middleware that compresses tool descriptions from any MCP server
  • Statusline badge tracks lifetime tokens saved; auto-activates for Claude Code, Codex, Gemini
  • One-liner install via curl or irm, ~30 seconds, Node ≥18 required

Caveats

  • Only affects output tokens; “thinking/reasoning” tokens are untouched, so the cost savings ceiling is real
  • The 65% average includes one outlier at only 22% savings (“refactor callback to async/await”), so your mileage will vary by task type
  • Auto-activation requires --with-init for Cursor, Windsurf, Cline, Copilot; not all 30+ agents get the same seamless hook

Verdict

Worth trying if you pay per token and spend all day in Claude Code — the compression is real, the install is trivial, and the gimmick wears off fast enough that you just notice faster responses. Skip it if you need your AI to write your emails or documentation; this is strictly for coding sessions where you already know what useMemo does and just want the agent to stop explaining it.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.