A Swiss Army LLM that runs in your basement
h2oGPT wraps local models, document search, voice, and image generation into one privacy-first package.

What it does
h2oGPT is a self-hosted chat interface that lets you query local LLMs and your own documents without sending data to third-party APIs. It ingests PDFs, spreadsheets, images, video frames, even YouTube audio into a persistent vector database (Chroma, Weaviate, or FAISS), then answers questions using retrieval-augmented generation. The project also bundles voice chat, image generation via Stable Diffusion/Flux, and vision model support.
The interesting bit
The breadth is almost comical: it speaks OpenAI’s API dialect, so existing clients drop in without changes; it runs on CPU via llama.cpp or GPU via vLLM/ExLLaMa; it even does “attention sinks” for arbitrarily long generation. The README claims 80 tokens/sec on a 13B Llama 2 model with parallel summarization. That’s a lot of knobs for one codebase.
Key highlights
- Model agnostic: oLLaMa, Mixtral, llama.cpp GGUF, GPT4All, plus remote APIs (OpenAI, Anthropic, Groq, etc.)
- Document ingestion: 1000+ unit tests, HYDE retrieval, semantic chunking, OCR via DocTR
- Multimodal extras: Whisper STT, TTS with voice cloning, LLaVA/Claude-3/GPT-4-Vision, SDXL/Flux image gen
- Deployment flexibility: Docker recommended for full features; Linux native works; Windows/macOS scripts exist but with “less capabilities”
- OpenAI proxy mode: h2oGPT can impersonate an OpenAI server, including streaming, embeddings, function calling, and JSON schema enforcement
Caveats
- The feature surface is vast and the docs are fragmented across a dozen README files (Linux, macOS, Windows, Docker, GPU, CPU, CLI, UI, client, inference servers…)
- “Windows and MAC scripts have less capabilities than using Docker” — the README’s own words
- Agents and some advanced features are API-only, no UI
Verdict
Worth a look if you need a private, air-gappable alternative to ChatGPT with document Q&A and don’t mind some assembly. Skip it if you want a polished SaaS experience; the enterprise version (h2oGPTe) exists for a reason.