An OS layer that stops LLMs from forgetting everything
MemOS gives AI agents persistent, inspectable memory with a graph structure and claims 35% token savings.

What it does
MemOS is a memory backend for LLMs and agents that stores, retrieves, and manages long-term context through a unified API. It treats memory as an inspectable graph rather than a black-box embedding store, and supports text, images, tool traces, and personas in one system. You can self-host it with Docker or use a managed cloud API.
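To make "inspectable graph" concrete, here is a minimal sketch of a graph-shaped memory store with explicit add, retrieve, edit, and delete operations. Everything in it (ToyMemoryGraph, MemoryNode, the method names, the substring matching) is a hypothetical stand-in written for illustration; it is not the MemOS API and says nothing about how MemOS stores data internally.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MemoryNode:
    node_id: int
    kind: str          # e.g. "text", "image", "tool_trace", "persona"
    content: str
    links: set = field(default_factory=set)  # ids of related nodes


class ToyMemoryGraph:
    """Illustrative in-memory graph store; NOT the MemOS API."""

    def __init__(self):
        self._nodes = {}
        self._ids = count(1)

    def add(self, kind, content, related_to=()):
        node = MemoryNode(next(self._ids), kind, content)
        self._nodes[node.node_id] = node
        for other_id in related_to:
            # Store the edge in both directions so either end can be inspected.
            node.links.add(other_id)
            self._nodes[other_id].links.add(node.node_id)
        return node.node_id

    def retrieve(self, keyword):
        # Naive case-insensitive substring match, standing in for real search.
        needle = keyword.lower()
        return [n for n in self._nodes.values() if needle in n.content.lower()]

    def edit(self, node_id, content):
        self._nodes[node_id].content = content

    def delete(self, node_id):
        node = self._nodes.pop(node_id)
        for other_id in node.links:
            self._nodes[other_id].links.discard(node_id)

    def inspect(self):
        # Adjacency view: node id -> sorted ids of linked nodes.
        return {n.node_id: sorted(n.links) for n in self._nodes.values()}


graph = ToyMemoryGraph()
persona = graph.add("persona", "Alice prefers concise answers")
trace = graph.add("tool_trace", "Looked up flights for Alice", related_to=(persona,))
graph.edit(persona, "Alice prefers detailed answers")
```

The point of the graph shape is that a stored memory can be listed, followed to its neighbours, corrected, or removed individually, which an opaque embedding index does not make easy.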
The interesting bit
The “self-evolving” angle: memories tier up from raw traces (L1) to policies (L2) to world models (L3), with “crystallized skills” extracted from feedback. It’s a tidy conceptual framework, though the README says little about how promotion between tiers actually works.
Key highlights
- Graph-structured memory with explicit add / retrieve / edit / delete operations
- Hybrid retrieval: FTS5 full-text search combined with vector search
- Multi-modal support: text, images, tool traces, and personas stored together
- “Memory cubes” for isolating knowledge bases across users, projects, or agents
- Async ingestion via Redis Streams, with claimed millisecond-level latency
- Natural-language feedback to correct or replace existing memories
- Broad model provider support: OpenAI, Azure, DeepSeek, Qwen, Ollama, vLLM, etc.
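The hybrid retrieval item is the easiest of these to picture in code. The sketch below runs a keyword query through SQLite's FTS5 and a similarity query over toy vectors, then merges the two rankings with reciprocal rank fusion. The fusion method is an assumption chosen for illustration, since the README does not say how MemOS combines results; the table name, the sample memories, and the hand-made 3-d "embeddings" are likewise hypothetical. It also assumes your Python's sqlite3 module was built with FTS5 enabled, which is common but not guaranteed.

```python
import math
import sqlite3

# Toy corpus: id -> (text, hand-made 3-d vector standing in for a real embedding).
DOCS = {
    1: ("User prefers window seats on long flights", [0.9, 0.1, 0.0]),
    2: ("User is allergic to peanuts", [0.0, 0.9, 0.1]),
    3: ("Booked a flight to Tokyo in March", [0.8, 0.0, 0.2]),
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def fts_ranking(conn, query):
    # FTS5's bm25() returns lower values for better matches, so ascending order is best-first.
    rows = conn.execute(
        "SELECT rowid FROM memories WHERE memories MATCH ? ORDER BY bm25(memories)",
        (query,),
    ).fetchall()
    return [row[0] for row in rows]


def vector_ranking(query_vec):
    # Brute-force cosine similarity; a vector database would do this at scale.
    return sorted(DOCS, key=lambda d: cosine(DOCS[d][1], query_vec), reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    # Each ranking contributes 1 / (k + rank) per document; sums are sorted descending.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(body)")
conn.executemany(
    "INSERT INTO memories(rowid, body) VALUES (?, ?)",
    [(doc_id, text) for doc_id, (text, _) in DOCS.items()],
)

keyword_hits = fts_ranking(conn, "flight")       # exact-token match only
semantic_hits = vector_ranking([1.0, 0.0, 0.0])  # toy "travel-like" query vector
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

With the default tokenizer, the keyword query "flight" matches only memory 3, because memory 1 contains "flights". The vector side still surfaces memory 1, and the fused list puts the memory found by both methods first. That complementarity is the usual argument for pairing full-text and vector search.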
Caveats
- The 35.24% token savings and +43.70% accuracy gain over OpenAI Memory are stated in the README without methodology; the arXiv paper is the place to verify them
- Self-hosted setup requires Neo4j and Qdrant running before the API starts, plus multiple API keys for embedding and reranking services
- The TypeScript repo label is misleading; the core appears to be Python (uvicorn server, pip requirements)
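Because the self-hosted API expects its backing stores to be up first, a small preflight check can save a confusing startup failure. The sketch below only tests whether something is accepting TCP connections on the stock ports (7687 for Neo4j's Bolt protocol, 6333 for Qdrant's HTTP API). The host names, the helper names, and the assumption that your deployment uses default ports are all illustrative; none of this ships with MemOS.

```python
import socket

# Default ports: Neo4j Bolt listens on 7687, Qdrant's HTTP API on 6333.
# Hosts are assumed to be local; adjust for your own deployment.
REQUIRED_SERVICES = {
    "neo4j": ("localhost", 7687),
    "qdrant": ("localhost", 6333),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_services(services=REQUIRED_SERVICES):
    """Return the names of services that are not accepting connections."""
    return [
        name
        for name, (host, port) in services.items()
        if not is_listening(host, port)
    ]
```

Calling missing_services() before launching the server and aborting when the returned list is non-empty is enough to catch a forgotten container. An open port does not prove the service is healthy or that credentials are valid, so treat this as a first-pass check only.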
Verdict
Worth a look if you’re building agents that need to remember users across sessions and you’re tired of stuffing context windows. Skip it if you want a drop-in, zero-dependency memory layer: this is infrastructure, not a library.