A 99% Token Cut: How One MCP Server Replaces Grep with Graphs

Senior Editor

codebase-memory-mcp builds a persistent knowledge graph inside a single static binary, letting AI coding agents query structure instead of reading files.

DeusData/codebase-memory-mcp

The Model Context Protocol is having its USB-C moment. Anthropic’s standard for connecting AI agents to external tools now logs more than twenty-five million downloads per week, and enterprises are adopting it as a de facto interface for coding assistants. Yet the ecosystem is starting to look like the Wild West. A recent large-scale study of 1,899 open-source MCP servers found that 7.2% contain general security vulnerabilities and 5.5% suffer from MCP-specific tool poisoning, while security researchers note that the protocol itself ships with no built-in security mechanisms. Most code-intelligence servers, meanwhile, arrive as sprawling stacks: they want Docker, Neo4j, Redis, an embeddings API, or a downloaded language model just to tell you who calls a function. Against that backdrop, a single static binary that indexes an entire codebase and answers structural questions in under a millisecond feels less like an incremental tool and more like a bet on an entirely different weight class.

That binary is codebase-memory-mcp, an open-source MCP server written in pure C with zero runtime dependencies. Its premise is simple. Instead of letting an AI agent burn tokens grepping through files, it pre-computes a knowledge graph of the repository—functions, classes, call chains, HTTP routes, cross-service links—and exposes it through fourteen MCP tools. The agent asks a structural question; the graph answers directly. The project’s own preprint reports that five structural queries consume roughly 3,400 tokens via the graph, versus roughly 412,000 tokens via file-by-file exploration—a 99.2% reduction. The trade-off is a modest dip in answer quality on open-ended reasoning tasks (83% versus 92% for a file-exploration baseline), but for graph-native questions such as hub detection, caller ranking, or impact analysis, it matches or exceeds the baseline across nineteen of thirty-one evaluated languages.
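The headline arithmetic is easy to recheck. The short Python sketch below uses only the figures quoted above; it also separates the nine-point quality gap, which is an absolute difference in percentage points, from the relative drop it represents.

```python
# Figures quoted from the project's preprint for five structural queries.
graph_tokens = 3_400
file_tokens = 412_000

saving = 1 - graph_tokens / file_tokens  # fraction of tokens avoided
ratio = file_tokens / graph_tokens       # how many times fewer tokens

# Answer quality (percent): graph agent vs. file-exploration baseline.
graph_quality, baseline_quality = 83, 92
gap_points = baseline_quality - graph_quality   # absolute gap, percentage points
relative_drop = gap_points / baseline_quality   # gap relative to the baseline

print(f"token saving {saving:.1%}, about {ratio:.0f}x fewer tokens")
print(f"quality gap {gap_points} points, a {relative_drop:.1%} relative drop")
```

Run as written, it reports a 99.2% saving (about 121 times fewer tokens) and a 9-point gap, which works out to a 9.8% relative drop from the baseline.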

A Graph Database Disguised as a Systems Utility

What makes the project technically unusual is not the concept of a code knowledge graph—academics and GitLab have explored that territory before—but the systems-level packaging. The entire engine is a single static binary for macOS, Linux, and Windows. It embeds 158 vendored tree-sitter grammars, an in-memory SQLite graph store, LZ4 compression, and a fused Aho-Corasick pattern matcher. There are no containers to run, no pip installs, no Node modules, and no API keys. On an Apple M3 Pro it indexes a 50,000-line repository in under a second and the full Linux kernel—28 million lines across 75,000 files—in three minutes, producing 2.1 million nodes and 4.9 million edges.
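To make querying structure instead of reading files concrete, here is a minimal sketch of a call graph held in an in-memory SQLite database, answering "what calls ProcessOrder?" and a simple hub-ranking question with plain SQL. The table layout, column names, and sample functions are assumptions for illustration; the article does not document the project’s actual schema.

```python
import sqlite3

# Hypothetical schema: the project's real tables are not described in the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT NOT NULL, kind TEXT NOT NULL);
CREATE TABLE edges (src INTEGER NOT NULL REFERENCES nodes(id),
                    dst INTEGER NOT NULL REFERENCES nodes(id),
                    type TEXT NOT NULL);
CREATE INDEX edges_by_dst ON edges(dst, type);
""")

functions = ["main", "HandleCheckout", "ProcessOrder", "ChargeCard", "SendReceipt", "RetryJob"]
conn.executemany("INSERT INTO nodes(name, kind) VALUES (?, 'Function')",
                 [(f,) for f in functions])

def node_id(name):
    return conn.execute("SELECT id FROM nodes WHERE name = ?", (name,)).fetchone()[0]

calls = [("main", "HandleCheckout"), ("HandleCheckout", "ProcessOrder"),
         ("RetryJob", "ProcessOrder"), ("ProcessOrder", "ChargeCard"),
         ("ProcessOrder", "SendReceipt"), ("RetryJob", "ChargeCard")]
conn.executemany("INSERT INTO edges(src, dst, type) VALUES (?, ?, 'CALLS')",
                 [(node_id(a), node_id(b)) for a, b in calls])

def callers(name):
    """Answer 'what calls <name>?' with one indexed join instead of a grep."""
    rows = conn.execute("""
        SELECT s.name FROM edges e
        JOIN nodes s ON s.id = e.src
        JOIN nodes d ON d.id = e.dst
        WHERE d.name = ? AND e.type = 'CALLS'
        ORDER BY s.name""", (name,))
    return [r[0] for r in rows]

def hubs(limit=3):
    """Rank functions by incoming CALLS edges, a simple hub heuristic."""
    rows = conn.execute("""
        SELECT d.name, COUNT(*) AS fan_in FROM edges e
        JOIN nodes d ON d.id = e.dst
        WHERE e.type = 'CALLS'
        GROUP BY d.id ORDER BY fan_in DESC, d.name LIMIT ?""", (limit,))
    return rows.fetchall()

print(callers("ProcessOrder"))  # ['HandleCheckout', 'RetryJob']
print(hubs())                   # [('ChargeCard', 2), ('ProcessOrder', 2), ('HandleCheckout', 1)]
```

Ranking by incoming CALLS edges is only one possible definition of a hub; the project may use a different measure.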

The indexing pipeline is RAM-first: files are parsed, compressed, and held in memory until a single SQLite dump is written to disk, after which the memory is released. Persistence lives in a local cache directory; a background watcher detects git changes and incrementally re-indexes. Teams can even commit a zstd-compressed snapshot of the graph (.codebase-memory/graph.db.zst) to version control so that teammates skip the initial indexing entirely. The artifact is compacted with VACUUM INTO, compressed at ratios of 8:1 to 13:1, and guarded by a merge=ours gitattribute, so a merge that touches the snapshot keeps the current branch’s copy instead of stopping on a binary conflict.
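The RAM-first flow and the committed snapshot can be approximated with Python’s standard library. This sketch builds a small graph in memory, writes one compacted copy with SQLite’s VACUUM INTO, closes the in-memory database, and compresses the result. The file names and the tiny schema are hypothetical, and zlib substitutes for zstd purely so the example runs without third-party packages.

```python
import sqlite3
import tempfile
import zlib
from pathlib import Path

def build_in_memory(rows):
    """Indexing phase: the whole graph lives in RAM until the final dump."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, kind TEXT)")
    conn.executemany("INSERT INTO nodes(name, kind) VALUES (?, ?)", rows)
    conn.commit()  # VACUUM refuses to run inside an open transaction
    return conn

def write_snapshot(conn, out_dir):
    """Dump one compacted file with VACUUM INTO, free the RAM copy, then compress."""
    db_path = Path(out_dir) / "graph.db"
    conn.execute("VACUUM INTO ?", (str(db_path),))  # requires SQLite 3.27 or newer
    conn.close()  # release the in-memory graph
    raw = db_path.read_bytes()
    # zlib stands in for zstd, which most Python versions lack in the stdlib.
    packed = zlib.compress(raw, level=9)
    packed_path = Path(out_dir) / "graph.db.zlib"
    packed_path.write_bytes(packed)
    return raw, packed_path

rows = [(f"func_{i}", "Function") for i in range(5_000)]
with tempfile.TemporaryDirectory() as tmp:
    raw, packed_path = write_snapshot(build_in_memory(rows), tmp)
    packed_size = packed_path.stat().st_size
    round_trip_ok = zlib.decompress(packed_path.read_bytes()) == raw
print(f"{len(raw)} -> {packed_size} bytes ({len(raw) / packed_size:.1f}:1)")
```

Compression on this synthetic table says nothing about the 8:1 to 13:1 ratios reported for real graphs. Separately, merge=ours in .gitattributes only takes effect once an ours merge driver is defined, commonly with git config merge.ours.driver true, because ours is not one of Git’s built-in merge drivers.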

Where the project pushes beyond typical static analysis is its Hybrid LSP layer. Tree-sitter alone yields a purely syntactic tree: it knows that a function is called, but not whether that call resolves to a generic implementation three modules away. Codebase-memory-mcp ships clean-room re-implementations of the type-resolution algorithms used by tsserver, pyright, gopls, intelephense, and Roslyn, embedded directly into the binary. For Python, TypeScript, JavaScript, PHP, C#, Go, C, and C++, the system performs a second pass over the tree-sitter output, refining CALLS and USAGE edges with import-aware, generic-aware, and inheritance-aware resolution. No language-server process is spawned, and no per-project LSP configuration is required. The result is a graph accurate enough to trace calls across packages and standard-library boundaries without the operational overhead of running a separate language server for every language.
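The import-aware part of that second pass can be illustrated with a toy resolver. A syntactic pass sees two calls spelled parse and cannot tell them apart; consulting the module’s imports and a table of known definitions can. The data structures and names here are hypothetical, and the sketch ignores generics and inheritance entirely, so it is not the project’s algorithm or a port of any named language server.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CallSite:
    caller: str  # qualified name of the enclosing function
    module: str  # module in which the call appears
    callee: str  # callee text exactly as written, e.g. "parse" or "fmt.parse"

# Hypothetical output of a syntactic (tree-sitter-style) first pass.
definitions = {"app.main", "app.parse", "lib.fmt.parse", "lib.fmt.render"}
imports = {  # per-module table: local alias -> imported qualified name
    "app": {"fmt": "lib.fmt", "render": "lib.fmt.render"},
}

def resolve(site):
    """Second pass: map callee text to a defined symbol using import scope."""
    head, _, rest = site.callee.partition(".")
    scope = imports.get(site.module, {})
    if head in scope:  # an imported alias wins: "fmt.parse" -> "lib.fmt.parse"
        target = scope[head] + ("." + rest if rest else "")
    else:              # otherwise assume a symbol from the current module
        target = f"{site.module}.{site.callee}"
    return target if target in definitions else None  # None = unresolved

sites = [CallSite("app.main", "app", "parse"),
         CallSite("app.main", "app", "fmt.parse"),
         CallSite("app.main", "app", "render"),
         CallSite("app.main", "app", "missing")]
edges = [(s.caller, resolve(s)) for s in sites]
print(edges)
```

The unresolved call comes back as None rather than a guessed edge, which is the conservative choice when accuracy matters more than coverage.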

The Efficiency Bargain

The project’s authors are candid about the trade-offs. In their evaluation across thirty-one real-world repositories, the graph-based agent used ten times fewer tokens and 2.1 times fewer tool calls, but its overall answer quality lagged behind a file-exploration agent by nine percentage points. The gap makes sense. A knowledge graph excels at relational questions such as “what calls ProcessOrder?”, “which routes depend on this service?”, or “what is the blast radius of this diff?”, but it cannot read comments, infer intent from variable names, or reason about code that exists outside the graph schema. For those tasks, the agent still needs to fall back to reading source text.
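Impact analysis shows why relational questions suit a graph. The blast radius of a change is every function that can transitively reach a changed one, which amounts to a breadth-first search over reversed CALLS edges rather than a chain of greps. A minimal sketch with hypothetical function names; the project’s own impact tooling may weight or limit the traversal differently.

```python
from collections import deque

def blast_radius(calls, changed):
    """Return every function that can transitively reach a changed function.

    calls:   iterable of (caller, callee) pairs, i.e. CALLS edges.
    changed: names of the functions touched by a diff.
    """
    callers_of = {}
    for caller, callee in calls:
        callers_of.setdefault(callee, set()).add(caller)  # reverse each edge
    seen = set(changed)
    queue = deque(changed)
    while queue:
        fn = queue.popleft()
        for caller in callers_of.get(fn, ()):
            if caller not in seen:  # 'seen' stops cycles from looping forever
                seen.add(caller)
                queue.append(caller)
    return seen - set(changed)

calls = [("main", "HandleCheckout"), ("HandleCheckout", "ProcessOrder"),
         ("RetryJob", "ProcessOrder"), ("ProcessOrder", "ChargeCard"),
         ("Cron", "RetryJob")]
print(sorted(blast_radius(calls, {"ChargeCard"})))
# ['Cron', 'HandleCheckout', 'ProcessOrder', 'RetryJob', 'main']
```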

The design implicitly accepts this division of labor. The MCP server is strictly a structural backend; it contains no LLM and performs no natural-language translation. The agent you are already using—Claude Code, Codex CLI, Gemini, or any of the other eleven supported clients—translates your question into a graph query. This keeps the binary small, the supply chain minimal, and the cost predictable. It also means the tool is not trying to be a chatbot; it is trying to be the memory that the chatbot was missing.
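To show how thin the server side can be, here is a toy JSON-RPC handler for the MCP tools/call method. The request and result shapes (params.name, params.arguments, and a content list of text items) follow the MCP specification, but the tool name find_callers, its argument, and the canned data are hypothetical, and a real server would also handle initialize and tools/list over a transport such as stdio.

```python
import json

# Toy structural backend: deterministic lookups, no model anywhere in the loop.
CALLERS = {"ProcessOrder": ["HandleCheckout", "RetryJob"]}

def find_callers(function):
    return CALLERS.get(function, [])

TOOLS = {"find_callers": find_callers}  # hypothetical tool name

def handle(request_json):
    """Answer a JSON-RPC 2.0 'tools/call' request the way an MCP server would."""
    req = json.loads(request_json)
    if req.get("method") != "tools/call":
        err = {"code": -32601, "message": "Method not found"}  # JSON-RPC standard code
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "error": err})
    params = req["params"]
    result = TOOLS[params["name"]](**params.get("arguments", {}))
    content = [{"type": "text", "text": json.dumps(result)}]
    return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                       "result": {"content": content, "isError": False}})

# The agent, not the server, turns "what calls ProcessOrder?" into this request.
request = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
                      "params": {"name": "find_callers",
                                 "arguments": {"function": "ProcessOrder"}}})
response = json.loads(handle(request))
print(response["result"]["content"][0]["text"])
```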

Security as Architecture, Not Afterthought

In an ecosystem where researchers have already deployed honeypots to watch threat actors scan for exposed MCP servers, codebase-memory-mcp’s security posture is arguably its most deliberate feature. The tool processes everything locally; your code never leaves the machine. Every release binary is signed, checksummed, scanned by seventy-two antivirus engines, and published with SLSA Level 3 build provenance and Sigstore cosign bundles. A release cannot ship while CodeQL static analysis reports open alerts. The authors publish SHA-256 hashes and invite source-level audits.
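Checking a downloaded binary against a published SHA-256 hash takes a few lines. The sketch below streams the file through hashlib and compares hex digests; the file name and contents are stand-ins. A matching checksum only shows that the download equals what was published, so it complements, rather than replaces, verifying the cosign signature and SLSA provenance.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Hash in 1 MiB chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, published_hex):
    """Compare against the published hash, ignoring case and stray whitespace."""
    return hmac.compare_digest(sha256_file(path), published_hex.strip().lower())

with tempfile.TemporaryDirectory() as tmp:
    binary = Path(tmp) / "codebase-memory-mcp"  # stand-in for a downloaded release
    binary.write_bytes(b"pretend this is the release binary")
    published = hashlib.sha256(b"pretend this is the release binary").hexdigest()
    ok = verify(binary, published.upper())
    tampered = verify(binary, "0" * 64)
print(ok, tampered)
```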

This matters because MCP servers are, by definition, code-execution environments. As one security analysis put it, MCP simply lets developers enable AI agents to execute code, and the protocol itself does not include security mechanisms. By stripping out network calls, external databases, and runtime dependencies, codebase-memory-mcp shrinks the attack surface to the binary itself. Whether that is enough in an enterprise setting remains an open question, since MCP-specific vulnerabilities such as tool poisoning or prompt hijacking are logic-level risks that no antivirus scan can fully eliminate. Still, the project treats supply-chain integrity as a first-class engineering constraint rather than a documentation footnote.

The Landscape and the Limits

Codebase-memory-mcp is not the only attempt to give AI agents a structural memory of code. GitLab’s Knowledge Graph (gkg) offers similar universal call-graph parsing and MCP integration, but it is currently in maintenance mode, succeeded by a new orbit project, and relies on the Kuzu graph database and external infrastructure. Academic frameworks such as the Quantiphi knowledge-graph approach use Neo4j and LLM-generated descriptions, yielding rich semantic context at the cost of significant setup. Codebase-memory-mcp is the minimalist counter-proposal: a systems tool that behaves more like ripgrep or git than like a microservices stack.

The rough edges are visible, too. Language support spans 158 grammars, but benchmarked quality varies from excellent (Lua, Kotlin, C++, Zig) down to merely functional (OCaml, Haskell). The Hybrid LSP layer is a reimplementation, not the canonical language server, so edge cases in complex generics or dynamic metaprogramming will inevitably slip through; an early LinkedIn comment noted that C# support was initially commented out and required a community fix. Runtime-dynamic route construction (think Express routes built by string concatenation) remains a hard problem for static analysis, and the project’s cross-service HTTP linking relies on confidence scoring that may struggle with indirection.
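Confidence scoring for cross-service links can be sketched as segment-by-segment matching between a client URL recovered by static analysis and a server route pattern. Everything here is invented for illustration: the '*' placeholder for an unresolved segment, the 0.9 and 0.5 weights, the 0.4 threshold, and the function names. The article does not describe the project’s actual scoring.

```python
def is_param(segment):
    """True for a route parameter such as '{id}' or Express-style ':id'."""
    return segment.startswith(":") or (segment.startswith("{") and segment.endswith("}"))

def link_confidence(client_url, route):
    """Score how well a statically extracted client URL matches a server route.

    '*' marks a segment static analysis could not resolve, for example a
    value spliced in by string concatenation at runtime.
    """
    client = client_url.strip("/").split("/")
    server = route.strip("/").split("/")
    if len(client) != len(server):
        return 0.0
    score = 1.0
    for c, s in zip(client, server):
        if c == "*":
            # Unknown value: plausible for a parameter, a guess for a literal.
            score *= 0.9 if is_param(s) else 0.5
        elif is_param(s):
            continue    # a concrete value fills the parameter
        elif c != s:
            return 0.0  # literal segments disagree
    return score

def candidates(client_url, routes, threshold=0.4):
    """All routes scoring at or above the threshold, best first."""
    scored = [(route, link_confidence(client_url, route)) for route in routes]
    return sorted([rs for rs in scored if rs[1] >= threshold],
                  key=lambda rs: (-rs[1], rs[0]))

routes = ["/orders/{id}", "/orders/{id}/items", "/users/{id}"]
print(candidates("/orders/42", routes))  # one confident link
print(candidates("/*/42", routes))       # indirection: two equally weak links
```

The second query shows the indirection problem in miniature: once the first segment is unknown, two routes are equally plausible, and any linker has to either report both or pick one on weak evidence.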

Outlook

The broader tension here is between richness and friction. AI coding agents are being asked to reason about ever-larger codebases, yet the dominant exploration strategy (read a file, grep, read another file) scales linearly in token cost and latency. Knowledge graphs offer a sub-linear alternative, but only if the graph is accurate, current, and cheap enough to maintain that developers actually keep it enabled. By collapsing the entire pipeline into a static binary that auto-detects agents, auto-indexes on change, and lets teams share compressed graph artifacts like lockfiles, codebase-memory-mcp is betting that the winning interface is invisible infrastructure. If the nine-point quality gap closes, or if agents learn to hybridize by querying the graph for navigation and reading files for nuance, the protocol’s “USB-C port” may finally have a memory worth plugging into.