nduckmink/arkon

Your company's brain, with an editor-in-chief

Arkon is a self-hosted knowledge hub that doesn't just chunk documents—it compiles them into a reviewed, versioned wiki and serves it to Claude through MCP.

964 stars · Python · RAG · Search · LLMOps · Eval
Velocity (7d): +25 ★/day · Trend: steady

What it does

Arkon ingests your org’s PDFs, policies, and docs through a multi-stage “MRP pipeline” (Map → Reduce → Plan-review → Refine → Verify → Commit) that builds a structured, interlinked wiki rather than a loose bag of vector chunks. It then exposes that wiki to Claude and other LLMs via an MCP server with OAuth 2.1 + PKCE login, scoped by department and role.
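
The stage order above can be sketched as a small state machine. This is only an illustration of the sequence, not Arkon's actual code: `Stage`, `next_stage`, `run_pipeline`, and the `handlers` mapping are all hypothetical names.

```python
from enum import Enum


class Stage(Enum):
    # Stage names follow the order given in the description above.
    MAP = "map"
    REDUCE = "reduce"
    PLAN_REVIEW = "plan_review"
    REFINE = "refine"
    VERIFY = "verify"
    COMMIT = "commit"


# Enum members iterate in definition order, so this is the pipeline order.
STAGE_ORDER = list(Stage)


def next_stage(current):
    """Return the stage that follows `current`, or None after COMMIT."""
    idx = STAGE_ORDER.index(current)
    return STAGE_ORDER[idx + 1] if idx + 1 < len(STAGE_ORDER) else None


def run_pipeline(handlers, state, start=Stage.MAP):
    """Run each stage handler in order, threading `state` through.

    `handlers` maps every Stage to a function state -> state (hypothetical).
    """
    stage = start
    while stage is not None:
        state = handlers[stage](state)
        stage = next_stage(stage)
    return state
```

Passing a later `start` value, such as `Stage.VERIFY`, is one simple way to picture picking a run back up partway through, assuming the intermediate state was saved somewhere.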

The interesting bit

The plan-review step is the unusual part: before any wiki page gets written, the system generates a human-reviewable plan showing exactly which pages will be created or updated. Editors can reject and regenerate. When sources overlap with existing pages, content gets LLM-merged rather than overwritten, so knowledge accumulates instead of colliding. The whole pipeline is resumable if a worker crashes mid-compilation.
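
A minimal sketch of the plan-then-write idea, assuming a wiki that is just a dict of title to text. None of these names come from Arkon: `PlannedChange`, `build_plan`, and `apply_plan` are hypothetical, and `merge` is a plain callable standing in for the LLM merge step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedChange:
    title: str
    action: str  # "create" or "update"


def build_plan(proposed_titles, existing_titles):
    """Label each proposed page as a create or an update for human review."""
    existing = set(existing_titles)
    return [
        PlannedChange(title, "update" if title in existing else "create")
        for title in proposed_titles
    ]


def apply_plan(plan, approved, wiki, drafts, merge):
    """Write pages only after approval; updates are merged, not overwritten."""
    if not approved:
        # A rejected plan writes nothing; the caller regenerates it instead.
        raise PermissionError("plan rejected; regenerate before writing")
    for change in plan:
        if change.action == "update":
            wiki[change.title] = merge(wiki[change.title], drafts[change.title])
        else:
            wiki[change.title] = drafts[change.title]
    return wiki
```

The point of the gate is that nothing touches the wiki until a reviewer has seen the full list of creates and updates.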

Key highlights

  • Department and global scopes with hard isolation—HR sees HR, Engineering sees Engineering, everyone sees company-wide SOPs.
  • RBAC (Viewer → Contributor → Editor → Admin) with built-in audit logging, plus per-token scope enforcement at the MCP tool layer.
  • Online embedding migration—switch embedding models atomically without a zero-result search window.
  • 7 Docker containers (FastAPI, Next.js, PostgreSQL+pgvector, Redis, MinIO, 2 ARQ workers); no GPU needed since inference is external.
  • PolyForm Internal Use License—free for internal ops, no SaaS resale.
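
The scope and role rules in the list above might look roughly like this. It is a sketch under two assumptions that the description does not confirm: that the four roles form a strict hierarchy, and that company-wide content lives in a scope called `"global"`. `Role`, `can_read`, and `can_perform` are hypothetical names.

```python
from enum import IntEnum


class Role(IntEnum):
    # Assumed hierarchy: each role includes the permissions of those below it.
    VIEWER = 1
    CONTRIBUTOR = 2
    EDITOR = 3
    ADMIN = 4


GLOBAL_SCOPE = "global"  # assumed name for company-wide content


def can_read(token_scopes, page_scope):
    """A token reads global pages plus pages in its own department scopes."""
    return page_scope == GLOBAL_SCOPE or page_scope in token_scopes


def can_perform(role, required):
    """True when `role` is at least as privileged as `required`."""
    return role >= required
```

With these rules, a token scoped to `{"hr"}` can read `"hr"` and `"global"` pages but not `"engineering"` ones, which matches the isolation described in the first bullet.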

Caveats

  • Not for individuals. The README explicitly points personal users to Obsidian + Claude Skills.
  • RAM is the bottleneck. The MRP workers load large LLM context windows; budget 4 GB for a starter deployment, 8 GB for teams, and 16+ GB for enterprise.
  • Several roadmap items are unchecked: rich media ingestion, external data connectors (SharePoint, Google Drive, Notion), CLI setup, and notifications are all pending.

Verdict

Worth evaluating if you’re a 20+ person org already paying for Claude/GPT/Gemini and tired of employees copy-pasting inconsistent context into chatbots. Skip it if you’re a solo developer or want turnkey cloud SaaS: this is deliberately self-hosted, Docker-heavy, and team-oriented.
