The API layer local LLMs were missing
PrivateGPT turns a raw local model server into a Claude-compatible application backend with RAG, tools, and citations.

What it does
PrivateGPT sits between your app and any OpenAI-compatible local inference server (Ollama, vLLM, etc.) to provide the higher-level primitives actual products need: document ingestion, retrieval with citations, tool use, database querying, CSV analysis, web search, MCP connectors, and async workflows. It exposes a Claude API-compatible interface so you can build private AI features without touching cloud APIs.
The interesting bit
The project explicitly refuses to be “yet another chat UI.” It is API-first, with a lightweight workbench UI included only for testing. The architecture is deliberately decoupled: FastAPI routers call services that depend on LlamaIndex abstractions, not concrete implementations, so swapping LLMs, embeddings, or vector stores is meant to be mechanical.
Key highlights
- Claude API-compatible messages, streaming, token counting, and tool use
- Built-in RAG pipeline: ingestion, chunking, embedding, retrieval with citations
- Database querying and CSV/tabular analysis without custom tool wiring
- MCP support for connecting external agent capabilities
- Primordial branch preserves the original 2023 educational version for reference
Caveats
- Prompt caching and OAuth/organizations are explicitly not supported per the compatibility matrix
- “Skills” support is marked basic/early; vision and structured outputs depend on your chosen model
- The README warns it is updated less frequently than the external documentation site
Verdict
Worth a look if you are building a private AI backend and want to skip reimplementing RAG, citations, and tool plumbing. Skip it if you just need a drop-in chat interface—Open WebUI or Onyx already solve that.