chopratejas/headroom

Shrink your AI agent's reading list by 60–95% without changing the answers

A local compression layer that sits between your agent and the LLM, cutting token counts by routing tool outputs, logs, and code through type-specific compressors.

★17k stars · Python · LLMOps · Eval · Agents


Velocity (7d): +112 ★ / day · Trend: steady

What it does

Headroom intercepts everything an AI agent consumes — tool outputs, logs, RAG chunks, files, conversation history — and compresses it before the LLM sees it. It runs as a Python/TypeScript library, a zero-code proxy, or an MCP server. The project claims 60–95% token reduction with preserved accuracy, backed by benchmark tables for GSM8K, TruthfulQA, SQuAD v2, and BFCL.

The interesting bit

The architecture treats compression as a routing problem. A ContentRouter detects whether incoming data is JSON, code (parsed as an AST), or prose, then dispatches to SmartCrusher, CodeCompressor, or a Hugging Face model called Kompress-base, respectively. A reversible layer (CCR) keeps originals locally so the LLM can retrieve them on demand, which is useful when the compressed version isn’t enough.
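
To make the routing idea concrete, here is a minimal Python sketch of detect-then-dispatch. It is not Headroom's code: `detect_kind`, `route`, and the three stand-in compressors are hypothetical names, and the detection heuristic (try JSON, then try parsing as Python, else treat as prose) is an assumption made for illustration. The stand-ins only do trivial lossless cleanup, where the real project reportedly uses SmartCrusher, CodeCompressor, and a text model.

```python
import ast
import json

# Node types that suggest the text is real Python code rather than
# a stray word that happens to parse as an expression.
_CODE_NODES = (ast.FunctionDef, ast.ClassDef, ast.Import, ast.ImportFrom, ast.Assign)


def detect_kind(text: str) -> str:
    """Hypothetical detector: returns 'json', 'code', or 'prose'."""
    stripped = text.strip()
    if stripped[:1] in ("{", "["):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass  # fall through to the code/prose checks
    try:
        tree = ast.parse(text)
    except (SyntaxError, ValueError):
        return "prose"
    if any(isinstance(node, _CODE_NODES) for node in tree.body):
        return "code"
    return "prose"


def crush_json(text: str) -> str:
    """Stand-in JSON compressor: re-serialize without whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"))


def compress_code(text: str) -> str:
    """Stand-in code compressor: an AST round trip drops comments."""
    return ast.unparse(ast.parse(text))


def compress_prose(text: str) -> str:
    """Stand-in prose compressor: collapse runs of whitespace."""
    return " ".join(text.split())


COMPRESSORS = {"json": crush_json, "code": compress_code, "prose": compress_prose}


def route(text: str) -> tuple[str, str]:
    """Detect the content type, then dispatch to the matching compressor."""
    kind = detect_kind(text)
    return kind, COMPRESSORS[kind](text)
```

Calling `route('{"a": 1,   "b": [1, 2]}')` classifies the input as JSON and returns the compact form. The dictionary dispatch keeps detection and compression separate, so adding a new content type means adding one detector branch and one table entry.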

Key highlights

Six compression algorithms: JSON crusher, AST-aware code compressor, text model, image router, cache aligner, and reversible CCR storage
headroom wrap claude|codex|cursor|aider|copilot — one-command agent integration
Cross-agent memory with auto-dedup, so Claude and Codex can share context
headroom learn mines failed sessions and writes corrections to CLAUDE.md / AGENTS.md
Local-first: runs on your machine, and data doesn’t leave it
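
Two of the highlights above, reversible storage and auto-dedup, can be pictured with a small content-addressed store. The Python sketch below is illustrative only: `ReversibleStore` and `compress_with_handle` are hypothetical names, and hashing content with SHA-256 to get a retrieval key is an assumption, not something the README is known to describe.

```python
import hashlib


class ReversibleStore:
    """Hypothetical local store: keeps originals so they can be fetched later."""

    def __init__(self) -> None:
        self._originals: dict[str, str] = {}

    def put(self, original: str) -> str:
        # Key by content hash, so identical content is stored only once.
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
        self._originals.setdefault(key, original)
        return key

    def get(self, key: str) -> str:
        """Return the full original text for a previously issued key."""
        return self._originals[key]

    def __len__(self) -> int:
        return len(self._originals)


def compress_with_handle(store: ReversibleStore, original: str, max_chars: int = 40) -> str:
    """Replace long text with a short preview plus a key for retrieval."""
    key = store.put(original)
    preview = original[:max_chars]
    return f"{preview}... [full text: ref {key}]"
```

In this sketch the model sees only the preview and the key. If the preview is not enough, a tool call that maps the key back through `store.get` would return the original, and storing the same log twice costs nothing extra because both calls resolve to the same key.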

Caveats

Requires Python 3.10+ and a local process — won’t work in fully sandboxed environments
The “60B+ tokens saved” community metric is displayed prominently but lacks methodology detail in the README
Some integrations (Cursor, Copilot CLI) are not fully automatic: they need a manual config paste or a proxy started by hand

Verdict

Worth a look if you’re burning through context windows with coding agents or multi-step RAG pipelines. Skip it if you’re already happy with a single provider’s native compaction and don’t need cross-agent memory or reversible retrieval.