MemTensor/MemOS

An OS layer that stops LLMs from forgetting everything

MemOS gives AI agents persistent, inspectable, graph-structured memory and claims about 35% token savings.

9.7k stars · TypeScript · Tags: Agents, RAG · Search, LLMOps · Eval
Velocity (7d): +29 ★ / day · Trend: steady

What it does

MemOS is a memory backend for LLMs and agents that stores, retrieves, and manages long-term context through a unified API. It treats memory as an inspectable graph rather than a black-box embedding store, and supports text, images, tool traces, and personas in one system. You can self-host it with Docker or use a managed cloud API.

The interesting bit

The “self-evolving” angle: memories are promoted from raw traces (L1) to policies (L2) to world models (L3), with “crystallized skills” extracted from feedback. It’s a tidy conceptual framework, though the README says little about how the promotion actually works.

Key highlights

  • Graph-structured memory with explicit add / retrieve / edit / delete operations
  • Hybrid retrieval: FTS5 full-text search combined with vector search
  • Multi-modal support: text, images, tool traces, and personas stored together
  • “Memory cubes” for isolating knowledge bases across users, projects, or agents
  • Async ingestion via Redis Streams, with claimed millisecond-level latency
  • Natural-language feedback to correct or replace existing memories
  • Broad model provider support: OpenAI, Azure, DeepSeek, Qwen, Ollama, vLLM, etc.
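
To make the graph, hybrid-retrieval, and memory-cube highlights concrete, here is a toy Python sketch. It is not MemOS's API: `MemoryCube`, `MemoryNode`, and every method name are hypothetical, a plain keyword-overlap count stands in for FTS5, hand-written 2-D vectors stand in for real embeddings, and reciprocal rank fusion (with the conventional constant 60) is only one common way to merge two rankings. The README does not say how MemOS fuses results.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class MemoryNode:
    node_id: int
    text: str
    vector: list                              # toy embedding (list of floats)
    links: set = field(default_factory=set)   # ids of related nodes (graph edges)


class MemoryCube:
    """Hypothetical isolated store: one cube per user, project, or agent."""

    def __init__(self, name):
        self.name = name
        self.nodes = {}      # node_id -> MemoryNode
        self._next_id = 1

    def add(self, text, vector, links=()):
        node = MemoryNode(self._next_id, text, list(vector), set(links))
        self.nodes[node.node_id] = node
        self._next_id += 1
        return node.node_id

    def edit(self, node_id, text, vector):
        node = self.nodes[node_id]
        node.text, node.vector = text, list(vector)

    def delete(self, node_id):
        del self.nodes[node_id]
        for node in self.nodes.values():      # drop dangling edges
            node.links.discard(node_id)

    def retrieve(self, query_text, query_vector, k=3):
        """Merge a keyword ranking and a vector ranking via reciprocal rank fusion."""
        terms = set(query_text.lower().split())
        nodes = list(self.nodes.values())
        keyword_rank = sorted(
            nodes, key=lambda n: -len(terms & set(n.text.lower().split())))
        vector_rank = sorted(
            nodes, key=lambda n: -cosine(query_vector, n.vector))
        scores = {}
        for ranking in (keyword_rank, vector_rank):
            for position, node in enumerate(ranking, start=1):
                scores[node.node_id] = scores.get(node.node_id, 0.0) + 1.0 / (60 + position)
        best = sorted(scores, key=lambda i: (-scores[i], i))[:k]
        return [self.nodes[i] for i in best]


# Two cubes keep Alice's and Bob's memories apart.
alice = MemoryCube("user:alice")
bob = MemoryCube("user:bob")
a1 = alice.add("prefers dark mode in the editor", [1.0, 0.0])
a2 = alice.add("deploys with Docker on Fridays", [0.0, 1.0], links={a1})
bob.add("prefers light mode", [1.0, 0.1])

top = alice.retrieve("dark mode", [0.9, 0.1], k=1)
print(top[0].text)
```

Each cube owns its own node dictionary, so a query against `alice` can never surface Bob's memories. In a real deployment the keyword ranking would presumably come from the full-text index and the vector ranking from the vector store, rather than from in-process sorting.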

Caveats

  • The 35.24% token savings and +43.70% accuracy vs. OpenAI Memory figures are presented without methodology detail in the README; the arXiv paper would be the place to verify
  • Self-hosted setup requires Neo4j and Qdrant running before the API starts, plus multiple API keys for embedding and reranking services
  • The TypeScript repo label is misleading; the core appears to be Python (uvicorn server, pip requirements)

Verdict

Worth a look if you’re building agents that need to remember users across sessions and you’re tired of stuffing context windows. Skip if you want a drop-in, zero-dependency memory layer: this is infrastructure, not a library.
