onyx-dot-app/onyx

Self-hosted AI platform that actually ships the boring parts

Onyx wraps chat, RAG, agents, and enterprise governance into one deployable stack with a one-line installer.

Velocity (7d): +26 ★ / day · Trend: steady

What it does

Onyx is a self-hostable “application layer for LLMs”: essentially a full-stack AI workbench you run yourself. It bundles chat, retrieval-augmented generation, custom agents, web search, code execution, voice, and image generation behind a single interface. The project also offers a slimmed-down “Lite” mode (under 1 GB of RAM) for teams that just want chat and agents without the indexing infrastructure.

The interesting bit

The deployment story is unusually practical. A single curl | bash command gets you running, and the stack scales up to Kubernetes/Helm/Terraform with Redis, MinIO, and background job workers. The README is refreshingly direct about the split: Community Edition is MIT-licensed and covers core features, while Enterprise Edition adds SSO, SCIM, RBAC, analytics, and whitelabeling. There is no hand-waving about “open core”; the boundary is explicit.

Key highlights

  • 50+ built-in connectors for indexing external data, plus MCP support for custom integrations
  • “Agentic RAG” with hybrid vector + keyword indexing; claims top spot on a deep-research benchmark as of Feb 2026 (benchmark repo linked, though methodology not detailed)
  • Supports a broad range of LLM backends, including Ollama, vLLM, LiteLLM, Anthropic, OpenAI, and Gemini
  • Code execution in sandboxed environments, artifact generation, and web crawling via in-house crawler or Firecrawl/Exa
  • Enterprise features: SAML/OIDC SSO, SCIM provisioning, query audit history, and custom PII-filtering code hooks
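
For readers unfamiliar with hybrid indexing, here is a minimal sketch of how vector and keyword results can be merged into one ranking. It uses reciprocal rank fusion, a common generic technique; the README does not say which fusion method Onyx uses, so treat this as an illustration and not as Onyx's implementation. The function name, the document IDs, and the constant k=60 are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from two retrievers answering the same query.
vector_hits = ["doc_a", "doc_b", "doc_c"]   # nearest neighbours by embedding
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # best matches by keyword score

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is the practical appeal of hybrid retrieval: exact-term matches and semantic matches each cover the other's blind spots.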

Caveats

  • The “best in class” RAG claim references a benchmark “to release soon” — currently unverified
  • Deep research benchmark leadership is self-reported from Feb 2026; no independent confirmation in the README
  • Standard mode requires significant infrastructure (vector DB, job queues, inference servers) — the Lite/Standard split is real and not just marketing

Verdict

Teams that need a governed, self-hosted alternative to ChatGPT Enterprise or Copilot should evaluate this seriously. Solo developers or those already happy with a simple Ollama + Open WebUI setup will find the full stack overkill unless they need the connector ecosystem.
