PromtEngineer/localGPT

A RAG stack that actually stays on your laptop

LocalGPT wires Ollama, LanceDB, and a smart query router into a private document-chat system.

Velocity (7d): +20 ★ / day · Trend: steady

What it does

LocalGPT is a self-hosted document-QA stack. You upload files, it builds a search index, and you chat with the contents through a web UI or REST API. Everything runs locally via Ollama: no API keys, no data egress. The system is split into four services (Ollama, a RAG API, a backend server, and a React frontend), all managed by a single Python launcher.

The interesting bit

The RAG pipeline is more opinionated than most. It combines semantic search, BM25 keyword matching, and “Late Chunking” for long-context embeddings, then routes each query either through RAG or straight to the LLM based on its own internal routing logic. There is also a verification pass and sentence-level context pruning. Whether this complexity beats a simpler setup is left as an exercise for the user.
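
To make the hybrid-retrieval idea concrete, here is a minimal sketch of merging a semantic ranking with a BM25 ranking. It uses reciprocal rank fusion (RRF), a common fusion technique chosen purely for illustration; nothing here says how LocalGPT actually combines the two result lists. The function name, the chunk IDs, and the k=60 constant are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one ranked list.

    Hypothetical helper for illustration; not LocalGPT's actual code.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up result lists standing in for vector search and BM25 output.
semantic_hits = ["chunk_a", "chunk_b", "chunk_c"]
keyword_hits = ["chunk_c", "chunk_a", "chunk_d"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Chunks that rank well in both lists rise to the top, while chunks found by only one retriever still make the cut. That is the usual argument for pairing keyword and vector search.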

Key highlights

  • Pure-Python RAG core with LanceDB for vectors
  • Supports CUDA, CPU, Intel Gaudi (HPU), and Apple Silicon (MPS)
  • Pluggable models via Ollama; defaults to Qwen3 family for generation and embeddings
  • Semantic caching with TTL to avoid repeated similar queries
  • Session-aware chat history and source attribution on answers
  • Docker and bare-metal install paths, plus a --no-frontend API-only mode
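
The semantic-caching highlight can be sketched roughly as follows: store each answer alongside its query embedding, and reuse it when a later query is similar enough and the entry has not expired. This is a toy under stated assumptions, not LocalGPT's implementation. The class name, the similarity threshold, the TTL values, and the 2-dimensional "embeddings" are all hypothetical.

```python
import math
import time


class SemanticCache:
    """Toy semantic cache with TTL (hypothetical; not LocalGPT's code)."""

    def __init__(self, ttl_seconds=300.0, threshold=0.9, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.threshold = threshold
        self.clock = clock      # injectable so expiry can be demonstrated
        self.entries = []       # list of (embedding, answer, stored_at)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def put(self, embedding, answer):
        self.entries.append((embedding, answer, self.clock()))

    def get(self, embedding):
        now = self.clock()
        # Drop expired entries before searching.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl]
        best = max(self.entries,
                   key=lambda e: self._cosine(embedding, e[0]),
                   default=None)
        if best is not None and self._cosine(embedding, best[0]) >= self.threshold:
            return best[1]
        return None


# Fake clock so the example does not need to sleep.
fake_now = [0.0]
cache = SemanticCache(ttl_seconds=60, threshold=0.95, clock=lambda: fake_now[0])
cache.put([1.0, 0.0], "cached answer")

hit = cache.get([0.99, 0.05])   # near-duplicate query: served from cache
miss = cache.get([0.0, 1.0])    # unrelated query: falls through
fake_now[0] = 120.0             # jump past the TTL
expired = cache.get([1.0, 0.0])
```

In a real setup the embeddings would come from the embedding model, and a cache miss would fall through to the full retrieval pipeline.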

Caveats

  • Installation is currently only tested on macOS; Windows and Linux paths exist in docs but are untested
  • “Multi-format support” currently means PDF only—DOCX, TXT, and Markdown are listed but not working yet
  • The v2 branch is the one to clone; main is behind, which suggests the project is mid-transition

Verdict

Worth a look if you need a fully air-gapped RAG setup and don’t mind some assembly. Skip it if you want battle-tested cross-platform stability or production-grade document format support.
