chopratejas/headroom

Shrink your AI agent's reading list by 60–95% without changing the answers

A local compression layer that sits between your agent and the LLM, cutting token counts by routing tool outputs, logs, and code through type-specific compressors.

17k stars · Python · LLMOps · EvalAgents
Velocity (7d): +112 ★/day · Trend: steady

What it does

Headroom intercepts everything an AI agent consumes — tool outputs, logs, RAG chunks, files, conversation history — and compresses it before the LLM sees it. It runs as a Python/TypeScript library, a zero-code proxy, or an MCP server. The project claims 60–95% token reduction with preserved accuracy, backed by benchmark tables for GSM8K, TruthfulQA, SQuAD v2, and BFCL.
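The following is a minimal Python sketch of what structure-aware compression of a JSON tool output can look like. `crush_json`, `compact`, and `rough_tokens` are hypothetical helpers, not Headroom's API. The 4-characters-per-token estimate is a crude assumption rather than a real tokenizer count. The truncation is lossy, so it is only safe when the original can still be recovered.

```python
import json
from typing import Any


def crush_json(value: Any, max_items: int = 3) -> Any:
    """Hypothetical crusher: drop null fields and truncate long lists.

    Illustrates lossy, structure-aware compression; it is not Headroom's SmartCrusher.
    """
    if isinstance(value, dict):
        # Null fields carry no information for the model, so drop them.
        return {k: crush_json(v, max_items) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        kept = [crush_json(v, max_items) for v in value[:max_items]]
        omitted = len(value) - len(kept)
        if omitted > 0:
            # Leave a marker so the model knows items were elided.
            kept.append(f"... {omitted} more items omitted")
        return kept
    return value


def compact(value: Any) -> str:
    # Serialize without whitespace so only the crushing affects the size difference.
    return json.dumps(value, separators=(",", ":"))


def rough_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


tool_output = {
    "status": "ok",
    "error": None,
    "rows": [{"id": i, "name": f"user{i}", "deleted_at": None} for i in range(200)],
}

raw = compact(tool_output)
crushed = compact(crush_json(tool_output))
saving = 1 - rough_tokens(crushed) / rough_tokens(raw)
print(f"{rough_tokens(raw)} -> {rough_tokens(crushed)} tokens (~{saving:.0%} smaller)")
```

The savings come from repetitive structure: 200 near-identical rows collapse to three examples plus a count. That is why tool outputs and logs compress far better than dense prose.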

The interesting bit

The architecture treats compression as a routing problem. A ContentRouter detects whether incoming data is JSON, code (analyzed via its AST), or prose, then dispatches it to SmartCrusher, CodeCompressor, or a Hugging Face model called Kompress-base, respectively. A reversible layer (CCR) keeps the originals locally so the LLM can retrieve them on demand, which helps when the compressed version isn't enough.
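The route-then-keep-the-original pattern can be sketched in a few lines of Python. Every name here (`detect_kind`, `COMPRESSORS`, `ReversibleStore`, `route`, the `[kind ref=...]` tag format) is hypothetical, and the per-type compressors are deliberately simple stand-ins. The real SmartCrusher, CodeCompressor, and Kompress-base are not reproduced. The sketch shows only the shape of the design: classify the content, compress it with a type-specific function, and keep the original retrievable by key.

```python
import ast
import hashlib
import json
from typing import Callable, Dict

# Hypothetical names throughout: a sketch of "route by content type, keep the
# original recoverable", not Headroom's ContentRouter or CCR implementation.


def detect_kind(text: str) -> str:
    """Classify text as 'json', 'code' (Python), or 'prose'."""
    try:
        if isinstance(json.loads(text), (dict, list)):
            return "json"
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        pass
    try:
        tree = ast.parse(text)
        # A lone expression like "hello" also parses, so require a real statement.
        if any(not isinstance(node, ast.Expr) for node in tree.body):
            return "code"
    except (SyntaxError, ValueError):
        pass
    return "prose"


def minify_json(text: str) -> str:
    return json.dumps(json.loads(text), separators=(",", ":"))


def strip_comments(text: str) -> str:
    # Stand-in for an AST-aware compressor: drop blank and comment-only lines.
    lines = text.splitlines()
    return "\n".join(ln for ln in lines if ln.strip() and not ln.strip().startswith("#"))


def truncate_prose(text: str, max_words: int = 30) -> str:
    # Stand-in for a model-based text compressor.
    words = text.split()
    return text if len(words) <= max_words else " ".join(words[:max_words]) + " ..."


COMPRESSORS: Dict[str, Callable[[str], str]] = {
    "json": minify_json,
    "code": strip_comments,
    "prose": truncate_prose,
}


class ReversibleStore:
    """Keeps originals locally, keyed by a short content hash."""

    def __init__(self) -> None:
        self._originals: Dict[str, str] = {}

    def put(self, text: str) -> str:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = text
        return key

    def get(self, key: str) -> str:
        return self._originals[key]


def route(text: str, store: ReversibleStore) -> str:
    kind = detect_kind(text)
    key = store.put(text)
    # The ref tag lets the agent ask for the full original if the summary falls short.
    return f"[{kind} ref={key}] {COMPRESSORS[kind](text)}"


store = ReversibleStore()
original = '{"status": "ok",  "items": [1, 2, 3]}'
compressed = route(original, store)
print(compressed)
key = compressed.split("ref=")[1].split("]")[0]
print(store.get(key) == original)
```

Separating detection from compression keeps each compressor simple and lets new content types be added as one more dictionary entry. The store is what makes a lossy compressor acceptable, because nothing is thrown away for good.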

Key highlights

  • Six compression algorithms: JSON crusher, AST-aware code compressor, text model, image router, cache aligner, and reversible CCR storage
  • headroom wrap claude|codex|cursor|aider|copilot — one-command agent integration
  • Cross-agent memory with auto-dedup, so Claude and Codex can share context
  • headroom learn mines failed sessions and writes corrections to CLAUDE.md / AGENTS.md
  • Local-first: compression runs on your machine and the originals stay there; only the compressed context the agent sends reaches the LLM provider

Caveats

  • Requires Python 3.10+ and a local process — won’t work in fully sandboxed environments
  • The “60B+ tokens saved” community metric is displayed prominently but lacks methodology detail in the README
  • Some integrations (Cursor, Copilot CLI) are not fully automatic: they need a manually pasted config or a separately started proxy

Verdict

Worth a look if you’re burning through context windows with coding agents or multi-step RAG pipelines. Skip if you’re already happy with a single provider’s native compaction and don’t need cross-agent memory or reversible retrieval.
