A prompt-engineering gag that actually cuts your API bill
Claude Code skill makes the agent talk like a caveman and claims ~65% fewer output tokens, with benchmarks to back it up.
What it does
caveman is a Claude Code skill (also works with Codex, Gemini, Cursor, Copilot, and 30+ others) that compresses the agent’s replies into terse, fragment-heavy “caveman” prose. You type /caveman, pick a level from lite to ultra to wenyan (classical Chinese), and the agent drops filler words while keeping technical accuracy. It also ships companion commands for terse commit messages, one-line PR reviews, session token stats, and even rewriting your CLAUDE.md memory files into the same compressed dialect to shrink input tokens.
The interesting bit
The README actually publishes real token counts from the Claude API across ten tasks, and the numbers are surprisingly consistent — the “before/after” table isn’t just a meme, it’s a benchmark. The author also cites a March 2026 arXiv paper finding that brevity constraints can improve model accuracy by 26 points on some benchmarks, which is either a clever justification or a very committed bit.
Key highlights
- Average 65% output token reduction across 10 prompts (range 22–87%), with raw data and a reproduction script in benchmarks/
- caveman-compress rewrites memory files and cuts ~46% of input tokens for every subsequent session
- caveman-shrink is an MCP middleware that compresses tool descriptions from any MCP server
- Statusline badge tracks lifetime tokens saved; auto-activates for Claude Code, Codex, Gemini
- One-liner install via curl or irm, ~30 seconds, Node ≥18 required
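For a feel of how a headline figure like "average 65%, range 22–87%" comes out of a before/after table, here is a minimal Python sketch. The task names and token counts are made up for illustration and are not the README's data, and averaging the per-task percentages is an assumption about the method, not something the README is confirmed to do.

```python
# Hypothetical (before, after) output-token counts per task.
# These are illustrative numbers, NOT the benchmark data from the README.
samples = {
    "explain a hook": (400, 100),
    "debug a stack trace": (500, 150),
    "refactor to async/await": (300, 240),
}


def reduction(before: int, after: int) -> float:
    """Fraction of output tokens saved on a single task."""
    return 1 - after / before


# Per-task savings, then the mean and range across tasks
# (assumed method: average the per-task percentages).
per_task = {name: reduction(b, a) for name, (b, a) in samples.items()}
mean_reduction = sum(per_task.values()) / len(per_task)
lowest, highest = min(per_task.values()), max(per_task.values())

print(f"mean {mean_reduction:.0%}, range {lowest:.0%}-{highest:.0%}")
```

Averaging percentages weights every task equally, so a single low-savings task pulls the mean down no matter how few tokens it involves. Summing all tokens first would give a different number.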
Caveats
- Only affects visible output tokens; “thinking/reasoning” tokens are untouched, so there is a real ceiling on the cost savings
- The 65% average includes one outlier at only 22% savings (“refactor callback to async/await”), so your mileage will vary by task type
- Auto-activation requires --with-init for Cursor, Windsurf, Cline, Copilot; not all 30+ agents get the same seamless hook
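The savings ceiling from the first caveat is easier to see with numbers. The Python sketch below is a rough back-of-the-envelope model, not a billing calculator. The session sizes and per-million-token prices are hypothetical, and it assumes thinking tokens are billed at the output rate. Only the 65% reduction comes from the figures quoted above.

```python
def bill(input_tok: float, thinking_tok: float, visible_tok: float,
         in_price: float, out_price: float) -> float:
    """Session cost in dollars; prices are dollars per million tokens.

    Assumption: thinking tokens are billed at the output-token rate.
    """
    output_tok = thinking_tok + visible_tok
    return (input_tok * in_price + output_tok * out_price) / 1_000_000


# Hypothetical session and prices, chosen only for illustration.
INPUT, THINKING, VISIBLE = 200_000, 30_000, 20_000
IN_PRICE, OUT_PRICE = 3.0, 15.0
VISIBLE_REDUCTION = 0.65  # the average output reduction quoted above

before = bill(INPUT, THINKING, VISIBLE, IN_PRICE, OUT_PRICE)
after = bill(INPUT, THINKING, VISIBLE * (1 - VISIBLE_REDUCTION),
             IN_PRICE, OUT_PRICE)
saving = 1 - after / before

print(f"before ${before:.2f}, after ${after:.2f}, bill down {saving:.0%}")
```

With these made-up numbers, a 65% cut in visible output shaves roughly 14% off the whole bill, because input and thinking tokens dominate the session. The more of your spend goes to visible replies, the closer you get to the headline figure.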
Verdict
Worth trying if you pay per token and spend all day in Claude Code — the compression is real, the install is trivial, and the gimmick wears off fast enough that you just notice faster responses. Skip it if you need your AI to write your emails or documentation; this is strictly for coding sessions where you already know what useMemo does and just want the agent to stop explaining it.