← all repositories
zylon-ai/private-gpt

The API layer local LLMs were missing

PrivateGPT turns a raw local model server into a Claude-compatible application backend with RAG, tools, and citations.

57.2k stars Python AgentsRAG · SearchLLMOps · Eval
private-gpt
Velocity · 7d
+50
★ / day
Trend
steady
star history

What it does

PrivateGPT sits between your app and any OpenAI-compatible local inference server (Ollama, vLLM, etc.) to provide the higher-level primitives actual products need: document ingestion, retrieval with citations, tool use, database querying, CSV analysis, web search, MCP connectors, and async workflows. It exposes a Claude API-compatible interface so you can build private AI features without touching cloud APIs.

The interesting bit

The project explicitly refuses to be “yet another chat UI.” It is API-first, with a lightweight workbench UI included only for testing. The architecture is deliberately decoupled: FastAPI routers call services that depend on LlamaIndex abstractions, not concrete implementations, so swapping LLMs, embeddings, or vector stores is meant to be mechanical.

Key highlights

  • Claude API-compatible messages, streaming, token counting, and tool use
  • Built-in RAG pipeline: ingestion, chunking, embedding, retrieval with citations
  • Database querying and CSV/tabular analysis without custom tool wiring
  • MCP support for connecting external agent capabilities
  • Primordial branch preserves the original 2023 educational version for reference

Caveats

  • Prompt caching and OAuth/organizations are explicitly not supported per the compatibility matrix
  • “Skills” support is marked basic/early; vision and structured outputs depend on your chosen model
  • The README warns it is updated less frequently than the external documentation site

Verdict

Worth a look if you are building a private AI backend and want to skip reimplementing RAG, citations, and tool plumbing. Skip it if you just need a drop-in chat interface—Open WebUI or Onyx already solve that.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.