khoj-ai/khoj

Self-hosted AI that actually talks to your docs

Khoj wires any LLM into your notes, files, and messaging apps so you can query your own data instead of begging ChatGPT to remember context.

Velocity (7d): +20 ★/day · Trend: steady

What it does

Khoj is a personal AI layer that sits between you and any local or online LLM. It ingests your documents (PDFs, Markdown, Notion, Word, org-mode, images) and lets you query them via chat, semantic search, or scheduled automations. You reach it through a browser, Obsidian, Emacs, desktop, phone, or WhatsApp. Self-hosting is the default posture; a managed cloud version exists if you’d rather not.
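To make the ingest-then-query flow concrete, here is a minimal sketch of the general shape: index documents as vectors, then rank them against a query by cosine similarity. This is not Khoj's code or API. The function names (`embed`, `build_index`, `search`) and the sample notes are hypothetical, and plain word counts stand in for the learned embedding model a real semantic search would use, so this toy only matches on shared words.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption): lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    # "Ingest" step: store one vector per document body.
    return {name: embed(body) for name, body in docs.items()}


def search(index: dict[str, Counter], query: str, top_k: int = 2):
    # "Query" step: score every document and return the best matches.
    q = embed(query)
    scored = [(name, cosine(q, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical notes, invented for illustration only.
notes = {
    "meeting.md": "Quarterly budget review with the finance team",
    "recipes.org": "Sourdough bread recipe with rye flour",
    "travel.md": "Flight and hotel booking for the Berlin conference",
}
index = build_index(notes)
results = search(index, "bread recipe", top_k=1)
print(results[0][0])  # prints "recipes.org"
```

Swapping `embed` for a real embedding model is what would turn this word-overlap ranking into semantic search; the index and ranking steps would keep the same shape.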

The interesting bit

The Emacs and Obsidian integrations are unusual. Most “AI second brain” projects chase the Notion or Roam crowd; Khoj also speaks fluent org-mode and will ping you on WhatsApp. The agent builder lets you pin custom knowledge, personas, and tools to specific tasks: less “chatbot,” more “intern with a reading list.”

Key highlights

  • Plugs into local models (llama3, qwen, mistral, etc.) or API ones (GPT, Claude, Gemini, DeepSeek)
  • Reads PDFs, images, Markdown, Word docs, Notion exports, and org-mode files
  • Access via browser, Obsidian plugin, Emacs package, desktop app, phone, or WhatsApp
  • Custom agents with their own knowledge bases, personas, and tool access
  • Scheduled research automations: newsletters, notifications, recurring queries
  • Semantic search across your ingested documents
  • Image generation, voice I/O, and audio playback
  • Dockerized self-hosting with docs; cloud option at app.khoj.dev

Caveats

  • The README is heavy on feature lists and light on architecture or performance specifics; “excellent performance on modern benchmarks” is claimed but no numbers are shown
  • Enterprise and hybrid deployment details live off-repo on a marketing page
  • The project also promotes Pipali, a separate “AI coworker” repo, which suggests the boundary between Khoj and its sibling products is still being defined

Verdict

Good fit if you want a single self-hosted backend that feeds LLM access into Emacs, Obsidian, and WhatsApp without writing glue code. Skip if you need transparent benchmarking data or a minimal, single-purpose tool; this is a broad kitchen-sink system.
