raiyanyahya/recall · 27 Jun 2026 · Feature

Claude Code’s Cold-Start Fix Runs on TextRank, Not Tokens

Patrick Donovan
Staff Writer

Recall is a fully-local plugin that persists project context using classical extractive summarization, no API keys, and zero runtime dependencies.

raiyanyahya/recall
571 stars

The Cold-Start Tax

Every new Claude Code session begins with amnesia. The agent knows nothing of the architecture decisions you argued through yesterday, the bug that led to that temporary hack in utils.py, or the branch you swore you would refactor next. Anthropic provides built-in mitigations, but each is a compromise. CLAUDE.md and the # shortcut load hand-written rules—useful for coding standards, yet manual, static, and silent on what actually happened. --continue and --resume replay prior conversations in full fidelity, but they drag the entire transcript back into the context window, burning subscription credits and tying you to the session history on a single machine. Context compaction condenses dialogue, but only within a session; it does not survive from Monday to Thursday.

This friction has birthed a cottage industry of memory plugins. The most prominent, claude-mem, has accumulated 84.5k GitHub stars and a footprint that resembles a production SaaS: a Node.js worker on port 37777, a SQLite database, a vector database for semantic retrieval, an Express HTTP server, Server-Sent Events, a React web UI, and an MCP server [5]. Its own developers had to refactor a telemetry pipeline that was ingesting ~45 million events a month and costing roughly $7,700 on PostHog before they compressed it into rollups [5]. For a developer who simply wants to avoid re-explaining the project every morning, that is a staggering stack.

A Deterministic Memory Layer

Recall, an MIT-licensed plugin by Raiyan Yahya, is the architectural counter-argument. It hooks Claude Code’s SessionStart, Stop, and SessionEnd events to append session activity to .recall/history.md, an append-only log of prompts, replies, files touched, and commands run. When a session ends—or on demand via /recall:save—a local summarizer reads the log and overwrites .recall/context.md with a compact resume.
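The two-file lifecycle described above can be sketched in a few lines of stdlib Python. This is only an illustration of the append-versus-overwrite split, not Recall's code: the function names, the log-line format, and the temporary-file step are all assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_event(project_dir, kind, text):
    """Append one line to .recall/history.md (hypothetical entry format)."""
    recall_dir = Path(project_dir) / ".recall"
    recall_dir.mkdir(exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Mode "a" only ever adds to the end, so earlier entries are never rewritten.
    with open(recall_dir / "history.md", "a", encoding="utf-8") as log:
        log.write(f"- {stamp} [{kind}] {text}\n")


def write_context(project_dir, digest):
    """Overwrite .recall/context.md with the latest digest."""
    recall_dir = Path(project_dir) / ".recall"
    recall_dir.mkdir(exist_ok=True)
    # Write to a temporary file and swap it in, so a crash mid-write
    # does not leave a half-written context.md behind (an assumed nicety).
    tmp = recall_dir / "context.md.tmp"
    tmp.write_text(digest, encoding="utf-8")
    tmp.replace(recall_dir / "context.md")
```

The history file grows for as long as the project lives, while the context file always holds only the most recent digest.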

The summarizer is the project’s central insight, and it is aggressively old-school. It vectorizes sentences with TF-IDF, builds a cosine-similarity graph, and runs TextRank—essentially PageRank power iteration on that graph—to score centrality. The top N sentences are kept in original order, then wrapped with deterministic metadata scraped from the transcript and git: the session’s opening goal, files modified, shell commands executed, a git diff --stat, and where the user left off. The resulting digest is roughly 1–2K tokens. It costs zero model tokens to produce, makes zero API calls, and requires no network connection.
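For readers who have not met TextRank, the pipeline in the paragraph above fits in a short pure-Python sketch. It is a generic illustration of TF-IDF, cosine similarity, and power iteration, not the contents of Recall's summarizer.py; the sentence splitter, the tokenizer, the IDF smoothing, the damping factor of 0.85, and the fixed 50 iterations are all assumptions.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_vectors(sentences):
    # Each sentence becomes a sparse {term: weight} dict.
    docs = [Counter(re.findall(r"[a-z0-9_]+", s.lower())) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def textrank(sim, damping=0.85, iters=50):
    # PageRank-style power iteration over a weighted similarity graph.
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, top_n=3):
    sentences = split_sentences(text)
    if len(sentences) <= top_n:
        return sentences
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    scores = textrank(sim)
    # Pick the top_n by score (ties broken by position), then restore document order.
    best = sorted(range(n), key=lambda i: (-scores[i], i))[:top_n]
    return [sentences[i] for i in sorted(best)]
```

Nothing in the sketch is random, so the same transcript always yields the same sentences. A sentence that shares no vocabulary with the rest of the text receives no incoming weight and ends up with the minimum score.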

If numpy is importable, the math is vectorized. If not, an identical pure-Python implementation runs the same algorithm. The entire implementation is vendored in summarizer.py. There are no pip install rituals, no weight files to torrent, no local LLM to keep resident in VRAM, and no second subscription to monitor. It is deterministic, extractive summarization—the kind of NLP that predates the transformer—repurposed to solve a very modern context-window problem.
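The optional-accelerator pattern is a common one and might look roughly like the sketch below, shown here for the cosine-similarity step. The function names are hypothetical and the real summarizer.py may be organized quite differently; the point is only that both paths compute the same matrix.

```python
import math

try:
    import numpy as np  # optional accelerator
except ImportError:
    np = None


def cosine_matrix_pure(rows):
    """Pairwise cosine similarity with a zeroed diagonal, stdlib only."""
    norms = [math.sqrt(sum(x * x for x in r)) for r in rows]
    n = len(rows)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(rows[i], rows[j]))
                out[i][j] = dot / (norms[i] * norms[j])
    return out


def cosine_matrix_numpy(rows):
    """Same result as cosine_matrix_pure, computed with array operations."""
    m = np.asarray(rows, dtype=float)
    norms = np.linalg.norm(m, axis=1)
    # All-zero rows produce zero dot products, so any nonzero divisor is safe.
    norms[norms == 0] = 1.0
    unit = m / norms[:, None]
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    return sim.tolist()


def cosine_matrix(rows):
    # Dispatch once; callers never need to know which path ran.
    if np is not None:
        return cosine_matrix_numpy(rows)
    return cosine_matrix_pure(rows)
```

Because the two paths differ only in floating-point rounding, a parity check that compares their outputs within a small tolerance is cheap to write.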

The Anti-Stack

Where Recall really diverges from the pack is in what it refuses to build. Zilliz’s memsearch ccplugin layers persistent memory atop the Milvus vector database, indexing conversations into searchable semantic storage [11]. Oracle’s PicoOraClaw pairs a Go runtime with Oracle AI Database for ACID-backed persistence and vector search [12]. Even the lighter entrants tend to assume that memory implies embeddings, vector distance, and some form of database.

Recall assumes none of that. Its entire state is two Markdown files in a hidden directory. There is no SQLite, no vector store, no HTTP port to conflict with your dev server, no React frontend, and no telemetry pipeline to accidentally rack up four-figure analytics bills. The README is explicit about its positioning: CLAUDE.md is “how I want you to work”; Recall is “here is what we did last time and where we stopped.” At session start, the plugin surfaces context.md and fences it as untrusted reference data; Claude asks whether to resume from it. Memory becomes a diffable, git-ignorable artifact rather than a managed service.

This minimalism is also a pricing strategy. Claude Code subscribers already pay for access. Recall ensures they do not pay on a second meter to summarize their own transcripts, and it shrinks the context they must feed into each new session. By resuming from a 1–2K token digest instead of re-explaining the project from scratch, users spend fewer credits per session.

Privacy by Architecture

Most “local” tools still phone home for embeddings or summarization. Recall’s privacy claims are structural rather than aspirational. The plugin makes no network calls, references no API keys, and loads no third-party models. The hooks are written against Python’s stdlib. The only optional dependency is numpy, used strictly as an accelerator.

The engineering detail extends to paranoia about the repository itself. Git commands are executed with core.fsmonitor, diff.external, hooks, and the pager disabled, preventing a cloned, untrusted repo from executing code through its own git config when Recall queries ground truth. Writes are confined to the project directory; a malicious recall.config.json shipped with a repo cannot redirect output to an absolute path or parent directory. A best-effort redaction pass strips common secret shapes—API keys, .env assignments, PEM blocks—before writing to disk, on the assumption that .recall/ might be committed accidentally. The README even includes a troubleshooting note clarifying that an “Invalid API key” error comes from a stale ANTHROPIC_API_KEY environment variable shadowing the subscription login, not from Recall itself—a small signal of how carefully the author has drawn the trust boundary.
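The three defenses in this paragraph can each be approximated in a few lines. The sketch below is an assumption-laden illustration rather than Recall's implementation: the helper names are hypothetical, the git overrides are one plausible way to neutralize repo-controlled config for a diff, and the redaction patterns cover only a handful of shapes.

```python
import re
from pathlib import Path


def hardened_diff_stat_argv():
    """Build a `git diff --stat` command that ignores repo-supplied helpers."""
    return [
        "git",
        "--no-pager",                      # never launch a configured pager
        "-c", "core.fsmonitor=false",      # do not run an fsmonitor program
        "-c", "core.hooksPath=/dev/null",  # point hooks at an empty location
        "diff",
        "--no-ext-diff",                   # ignore any external diff driver
        "--stat",
    ]


def confined_path(project_dir, candidate):
    """Resolve candidate under project_dir, rejecting anything that escapes it."""
    root = Path(project_dir).resolve()
    target = (root / candidate).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"refusing to write outside project: {candidate}")
    return target


# Illustrative patterns only (assumptions); a real pass would cover more shapes.
PEM_BLOCK = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
    re.S,
)
TOKEN = re.compile(r"\bsk-[A-Za-z0-9_-]{16,}")
ENV_ASSIGNMENT = re.compile(
    r"^([A-Z][A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD))=.+$", re.M
)


def redact(text):
    """Best-effort scrub of common secret shapes before text reaches disk."""
    text = PEM_BLOCK.sub("[REDACTED PEM]", text)
    text = TOKEN.sub("[REDACTED]", text)
    return ENV_ASSIGNMENT.sub(r"\1=[REDACTED]", text)
```

The argument list would be passed to subprocess.run without a shell. The path check works because joining an absolute candidate onto the root discards the root, so both absolute paths and ../ traversal resolve to locations the parent test rejects.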

That boundary is not absolute. If a team commits .recall/ as shared memory, a malicious contributor could craft a context.md to attempt prompt injection. The plugin fences the content and labels it untrusted, but the README is honest: if you do not trust every writer with repo access, keep .recall/ git-ignored.

The Limits of Extractive Memory

Recall’s simplicity carries trade-offs. TextRank is extractive, not abstractive. It surfaces the most statistically central sentences in a transcript but cannot synthesize new meaning, infer implicit goals, or clean up a meandering debug session. If your last conversation was a three-hour spiral of false starts, the summary will faithfully preserve the spiral. It compresses, but it does not understand.

The redaction pass is also best-effort, not a guarantee: it targets common secret shapes, so a credential in an unfamiliar format can slip through. And although the pure-Python fallback keeps Recall portable, a 200,000-character cap on summarizer input means older turns are silently dropped on very long sessions. The README promises that context.md includes “next steps / open threads,” but how an extractive algorithm reliably distinguishes an open thread from a discarded idea is unclear. For users who need semantic search across months of conversations, or abstractive summaries that distill architectural rationale, Recall is deliberately the wrong tool. It targets a narrow, well-defined gap: the developer who needs yesterday’s context tomorrow, without installing a stack.
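To make the input cap concrete, here is a minimal sketch of how a character limit ends up discarding the oldest material. Only the 200,000 figure comes from the project; the function name, the choice to keep the tail of the log, and the trimming to a line boundary are assumptions.

```python
MAX_CHARS = 200_000  # cap cited above


def clip_to_recent(text, limit=MAX_CHARS):
    """Keep at most `limit` trailing characters, starting on a line boundary."""
    if len(text) <= limit:
        return text
    tail = text[-limit:]
    # Drop the partial first line so the summarizer never sees half a turn.
    newline = tail.find("\n")
    return tail[newline + 1:] if newline != -1 else tail
```

Under this scheme nothing signals that early turns were cut, which is why a decision made in the first hour of a very long session may never reach the digest.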

Outlook

Recall arrives at a moment when AI tooling is bifurcating. One path leads toward heavier infrastructure—vector databases, embedding APIs, telemetry dashboards, and multi-tier memory systems. The other, shaped by VRAM constraints, subscription token limits, and privacy concerns, asks how little infrastructure can suffice. A survey of local AI coding assistants notes that even a 21 GB model can demand 25–30 GB of VRAM, pushing consumer GPUs to their limits [3]. In that environment, a plugin that runs on stdlib Python and optional numpy is a statement.

The project backs that statement with unusual rigor for its size. Continuous integration runs linting, security static analysis, and the test suite across Python 3.9 through 3.13—both with and without numpy—plus a benchmark quality gate that asserts the numpy and pure-Python cores select the same sentences. It is a reminder that “lightweight” does not have to mean “sloppy.”

Whether that minimalism scales to teams, or whether the convenience of semantic retrieval eventually pulls even skeptics toward lightweight embeddings, remains unresolved. For now, Recall offers something rare: a plugin that costs nothing to run, sends nothing to the cloud, and persists its entire state in two Markdown files. In a landscape crowded with four-figure telemetry pipelines and vector-database backends, there is a quiet audacity to that.

Sources

  1. Recall.ai - The API for Meeting Recording
  2. claude-mem hits 46.1K stars as persistent memory plugin for Claude ...
  3. Running AI Coding Assistants Locally — Lessons Learned - Medium
  4. Recalls, Market Withdrawals, & Safety Alerts - FDA
  5. thedotmack/claude-mem: Persistent Context Across ... - GitHub
  6. Built a local-first document memory layer for AI agents that survives ...
  7. Look up Safety Recalls & Service Campaigns by VIN | Toyota Owners
  8. I Built a tool that gives Claude Code persistent memory and ... - Reddit
  9. The Ultimate Local AI Coding Guide For 2026 - YouTube
  10. Recalls & Product Safety Warnings | CPSC.gov
  11. Persistent Memory for Claude Code: memsearch ccplugin - Milvus
  12. Build an Ultra-Lightweight, Local AI Assistant with Persistent Memory
